Researchers in the social sciences and beyond are dealing more and more with massive quantities of text data requiring analysis, from historical letters to the constant stream of content in social media. Traditional texts on statistical analysis have focused on numbers, but this book will provide a practical introduction to the quantitative analysis of textual data. Using up-to-date R methods, this book will take readers through the text analysis process, from text mining and pre-processing the text to final analysis. It includes two major case studies using historical and more contemporary text data to demonstrate the practical applications of these methods. Currently, there is no introductory how-to book on textual data analysis with R that is up-to-date and applicable across the social sciences. Code and a variety of additional resources are available on an accompanying website for the book.
6.1 Lexicons of Sentiment-Charged Words
Sentiment analysis is the process of assessing the subjective opinion expressed in a data set by contextually mining its text. It is typically carried out by reference to lexicons: comprehensive lists of words in which each word has been assigned a positive or a negative sentiment. The Harvard General Inquirer (HGI) website provides several such lexicons. A document’s words can be matched against the sentiment-charged words in a lexicon, and you can then count the words that carry a positive sentiment and the words that carry a negative sentiment, either for the entire corpus or for each document separately. This method is the predominant method in that it ...
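The lexicon-matching step described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the book's own code. The two word lists are tiny made-up stand-ins for a real lexicon such as those from HGI, the two-document corpus is hypothetical, and the helper names `tokenize` and `count_sentiment` are introduced here for illustration only.

```r
# Toy lexicon: illustrative assumption, NOT the actual HGI word lists.
positive_words <- c("good", "happy", "excellent", "love")
negative_words <- c("bad", "sad", "terrible", "hate")

# Hypothetical two-document corpus, named so results can be read per document.
docs <- c(
  doc1 = "I love this excellent book, it is good.",
  doc2 = "A terrible plot and a sad, bad ending."
)

# Lowercase the text, replace everything except letters, apostrophes,
# and spaces with a space, then split on whitespace.
tokenize <- function(text) {
  cleaned <- gsub("[^a-z' ]", " ", tolower(text))
  tokens <- strsplit(cleaned, "\\s+")[[1]]
  tokens[nzchar(tokens)]  # drop empty strings left by leading spaces
}

# Count how many tokens of one document appear in each word list.
count_sentiment <- function(text) {
  tokens <- tokenize(text)
  c(positive = sum(tokens %in% positive_words),
    negative = sum(tokens %in% negative_words))
}

# Per-document counts: one row per document, one column per polarity.
per_doc <- t(sapply(docs, count_sentiment))

# Corpus-wide totals: sum the per-document counts.
corpus_totals <- colSums(per_doc)

print(per_doc)
print(corpus_totals)
```

With a real lexicon, only the two word vectors would change. The matching logic with `%in%` stays the same whether the counts are reported per document or for the whole corpus.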