Researchers in the social sciences and beyond increasingly face massive quantities of text data requiring analysis, from historical letters to the constant stream of social media content. Traditional texts on statistical analysis have focused on numbers; this book instead provides a practical introduction to the quantitative analysis of textual data. Using up-to-date R methods, it takes readers through the text analysis process, from text mining and pre-processing to final analysis. It includes two major case studies, using historical and more contemporary text data, to demonstrate the practical applications of these methods. Currently, there is no introductory how-to book on textual data analysis with R that is up-to-date and applicable across the social sciences. Code and a variety of additional resources are available on the book's accompanying website.

Modeling Text Data: Topic Models

9.1 Topic Models

The fundamental starting point for our text analysis has consistently been the frequency distribution of different words from all documents in the text corpus. We began the analysis in Chapter 4 by visualizing this overall word distribution. Next, in Chapter 5, we looked for useful stratifications of this word distribution and assessed whether word distributions change across documents or with the document’s metainformation (that is, time, place, speaker, author, and recipient). Let us now suppose that you have a very big text database but very little idea what the text is all about and not enough time to read enough of the documents to get more than a hunch. You probably have a ...
