Teach students how to construct a viable research project based on online sources. Gabe Ignatow and Rada Mihalcea’s An Introduction to Text Mining: Research Design, Data Collection, and Analysis provides a foundation for readers seeking a solid introduction to mining text data. The book covers the most critical issues to consider in a research project, including web scraping and crawling, strategic data selection, data sampling, use of specific text analysis methods, and report writing. In addition to the technical aspects of various approaches to contemporary text mining and analysis, it addresses the ethical and philosophical dimensions of text-based research and social science research design.
Chapter 13: Text Classification
The goals of Chapter 13 are to help you to do the following:
- Describe the task of text classification, its history, and applications.
- Follow the main steps involved in a text classification approach: feature representation, feature weighting, and the use of a text classification algorithm.
- Analyze the inner workings of two classification algorithms: Naive Bayes and the Rocchio classifier.
- Explore available data sets and software packages for text classification.
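As a rough preview of the steps named in these goals, the sketch below represents each text as a bag of word counts and then classifies it two ways: with a multinomial Naive Bayes model using add-one smoothing, and with a Rocchio-style classifier that assigns a text to the class whose centroid is closest by cosine similarity. It is a minimal illustration under stated assumptions, not the chapter's own implementation: the toy data, the whitespace tokenizer, the use of raw counts instead of a weighting scheme such as tf-idf, and all function names are invented for this example.

```python
import math
from collections import Counter, defaultdict

# Toy labeled data, invented purely for illustration.
TRAIN = [
    ("win cash prize now", "spam"),
    ("cheap prize offer win", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project meeting notes attached", "ham"),
]


def tokenize(text):
    """Feature representation: lowercase, whitespace-separated words."""
    return text.lower().split()


def train_naive_bayes(examples):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_docs[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_docs, word_counts, vocab


def predict_naive_bayes(model, text):
    """Return the class with the highest log prior plus log likelihood."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def train_rocchio(examples):
    """Build one centroid per class: the mean word-count vector."""
    sums = defaultdict(Counter)
    docs = Counter()
    for text, label in examples:
        sums[label].update(tokenize(text))
        docs[label] += 1
    return {
        label: {word: count / docs[label] for word, count in vec.items()}
        for label, vec in sums.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_rocchio(centroids, text):
    """Assign the text to the class with the most similar centroid."""
    vec = Counter(tokenize(text))
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


nb_model = train_naive_bayes(TRAIN)
centroids = train_rocchio(TRAIN)

for message in ["win a cash prize", "monday project meeting"]:
    print(message, "->", predict_naive_bayes(nb_model, message),
          "/", predict_rocchio(centroids, message))
```

Both classifiers share the same feature representation, so swapping the raw counts for weighted features would change only `tokenize` and the vector-building steps, not the decision rules.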
Whether you know it or not, you are likely reaping the benefits of text classification several times a day. Consider, for instance, your e-mail: More than half of the e-mails being sent at any given time are spam, yet your inbox probably sees very little of it. The reason for that gap is a spam ...