Researchers in the social sciences and beyond increasingly face massive quantities of text data requiring analysis, from historical letters to the constant stream of content on social media. Traditional texts on statistical analysis have focused on numbers, but this book provides a practical introduction to the quantitative analysis of textual data. Using up-to-date R methods, it takes readers through the text analysis process, from text mining and pre-processing to final analysis. It includes two major case studies, one using historical and one using more contemporary text data, to demonstrate the practical applications of these methods. Currently, there is no introductory how-to book on textual data analysis with R that is up-to-date and applicable across the social sciences. Code and a variety of additional resources are available on an accompanying website for the book.

n-Grams and Other Ways of Analyzing Adjacent Words

10.1 Analysis of Bigrams

So far, we have focused our analysis on individual words. We used individual words to assess frequencies for interpretation and modeling, for the classification of documents in Chapter 8, and for fitting topic models in Chapter 9. We did not consider adjacent words or word combinations. That is exactly what we will do next. Single words combined in sequences add considerable depth to the meaning of the communication. Two adjacent words arranged in a certain order constitute a bigram. Consider, for example, word pairs such as “indian war”, “war veteran”, or “post office”. Note that word order is important; “post office” and “office post” are different bigrams. ...
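
To make the idea of a bigram concrete, here is a minimal sketch in base R. It is not necessarily the approach the chapter goes on to use; the toy sentence and the variable names (`text`, `words`, `bigrams`, `bigram_counts`) are made up for illustration, and tokenization is simplified to splitting on whitespace.

```r
# Toy sentence (an assumption for illustration, not the book's data)
text <- "the post office hired a war veteran after the indian war"

# Split into lower-case word tokens on whitespace
words <- strsplit(tolower(text), "\\s+")[[1]]

# Pair each word with the word that follows it to form bigrams
bigrams <- paste(head(words, -1), tail(words, -1))

# Count how often each bigram occurs
bigram_counts <- sort(table(bigrams), decreasing = TRUE)
print(bigram_counts)

# Word order matters: check each ordering separately
"post office" %in% bigrams
"office post" %in% bigrams
```

A text of n words yields n - 1 bigrams, because every word except the last starts one. The two membership checks at the end illustrate that "post office" and "office post" are treated as distinct bigrams.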
