Researchers in the social sciences and beyond increasingly deal with massive quantities of text data requiring analysis, from historical letters to the constant stream of social media content. Traditional texts on statistical analysis have focused on numbers; this book instead provides a practical introduction to the quantitative analysis of textual data. Using up-to-date R methods, it takes readers through the text analysis process, from text mining and pre-processing to final analysis. It includes two major case studies using historical and more contemporary text data to demonstrate the practical applications of these methods. Currently, there is no introductory how-to book on textual data analysis with R that is both up to date and applicable across the social sciences. Code and a variety of additional resources are available on the book's accompanying website.
1.1 Text Data
Text data is, at its core, verbal expression and communication. Words are how most humans form thoughts and communicate them. Text analysis is particularly useful for big data: thousands of documents, thousands of words in each document, or both. When the mass of information exceeds what a reader can reasonably read and comprehend, computer-based, algorithmic text analysis using the internet and other technologies is particularly valuable. Such big data records, comprising thousands of documents and millions of words, are now more common than ever before. In this book, we study the corpus of 8,500 letters of the Territorial Papers and another corpus consisting of more than 100,000 speeches given by members of the 39th United States Congress. Constructing larger and still larger ...