Researchers in the social sciences and beyond increasingly face massive quantities of text data requiring analysis, from historical letters to the constant stream of social media content. Traditional texts on statistical analysis have focused on numbers; this book provides a practical introduction to the quantitative analysis of textual data. Using up-to-date R methods, it takes readers through the text analysis process, from text mining and pre-processing to final analysis. It includes two major case studies that use historical and more contemporary text data to demonstrate the practical applications of these methods. Currently, there is no introductory how-to book on textual data analysis with R that is up-to-date and applicable across the social sciences. Code and a variety of additional resources are available on the book's accompanying website.

A Description of the Studied Text Corpora and a Discussion of Our Modeling Strategy

2.1 Introduction to the Corpora: Selecting the Texts

For our study, we selected two particularly large corpora that document significant legal and political discursive processes in American history. We chose them because they promise multiple avenues for research. These corpora had never before been fully digitized, and we expected to find treasures within these troves merely by digitizing them and making them searchable. The extremely laborious process of digitization and then cleaning, described in more detail in the next chapter, would pay off if the effort could be expected to yield several more avenues to ...
