Teach students how to construct a viable research project based on online sources. Gabe Ignatow and Rada Mihalcea’s An Introduction to Text Mining: Research Design, Data Collection, and Analysis provides a foundation for readers seeking a solid introduction to mining text data. The book covers the most critical issues that must be taken into consideration for research projects, including web scraping and crawling, strategic data selection, data sampling, use of specific text analysis methods, and report writing. In addition to covering technical aspects of various approaches to contemporary text mining and analysis, the book covers ethical and philosophical dimensions of text-based research and social science research design.

Web Scraping and Crawling

Learning Objectives

The goals of Chapter 6 are to help you to do the following:

  • Define the main techniques for web crawling.
  • Explore available software packages for automatically collecting textual data from webpages.
  • Compare web crawling and web scraping techniques.
  • Compare tools and supporting material available for web crawling and scraping techniques.

Introduction

In this chapter, we survey two categories of tools that are critically important for acquiring large amounts of digital textual data: web scraping and web crawling. Both can be accomplished with off-the-shelf software or in programming environments like Python or R. Crawling and scraping can save huge amounts of time when the alternative is copying data from webpages by hand. But they require an investment of time to learn how to use them, ...
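To make the distinction concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not the chapter's own method: the names PageParser, scrape, crawl, and PAGES are hypothetical, and a small in-memory dictionary of pages stands in for the live web so the example runs without a network connection. Scraping is the step that pulls text out of one page; crawling is the loop that follows links from page to page.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects visible text (for scraping) and outgoing links (for crawling)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.text_parts = []
        self.links = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the URL of the current page.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def scrape(html, base_url):
    """Scraping: return (visible text, absolute links) for a single page."""
    parser = PageParser(base_url)
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links


def crawl(pages, start_url, max_pages=10):
    """Crawling: breadth-first walk over `pages`, a dict mapping URL to HTML.

    `pages` is a hypothetical stand-in for fetching each URL over HTTP.
    """
    seen = {start_url}
    queue = deque([start_url])
    texts = {}
    while queue and len(texts) < max_pages:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:
            continue  # unknown URL: nothing to scrape
        text, links = scrape(html, url)
        texts[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return texts


# Assumed sample data: three tiny pages that link to one another.
PAGES = {
    "https://example.org/": (
        '<html><body><p>Home page.</p>'
        '<a href="/a">A</a><a href="/b">B</a></body></html>'
    ),
    "https://example.org/a": (
        '<html><body><p>Page A.</p><a href="/">Home</a></body></html>'
    ),
    "https://example.org/b": (
        '<html><head><style>p{}</style></head>'
        '<body><p>Page B.</p></body></html>'
    ),
}

if __name__ == "__main__":
    for url, text in crawl(PAGES, "https://example.org/").items():
        print(url, "->", text)
```

In a real project the dictionary lookup would be replaced by an HTTP request, and the crawler would also need to respect each site's robots.txt file and terms of use.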
