Online communities generate massive volumes of natural language data, and the social sciences continue to learn how best to use this new information and the technology available for analyzing it. Text Mining brings together a broad range of contemporary qualitative and quantitative methods to provide strategic and practical guidance on analyzing large text collections. This accessible book, written by a sociologist and a computer scientist, surveys the fast-changing landscape of data sources, programming languages, software packages, and methods of analysis available today. Suitable for novice and experienced researchers alike, the book will help readers use text mining techniques more efficiently and productively.
Web Crawling and Scraping
Learning Objectives
The goals of Chapter 3 are to help readers do the following:
- Understand the basic organization of the web and learn about estimates of its size.
- Learn about the main techniques for web crawling and scraping.
- Learn about available software packages for automatically collecting textual data from webpages.
The web, a common abbreviation for the World Wide Web, consists of billions of interlinked hypertext pages. These pages contain text, images, videos, or sounds and are usually viewed using web browsers such as Firefox or Internet Explorer. Users can navigate the web either by typing the address of a webpage (the URL) directly into a browser or by following the links that connect webpages to one another.
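To make the idea of interlinked pages concrete, the following is a minimal sketch in Python of how a program might find the links on a single page, which is the basic step a crawler repeats. It uses only the standard library. The class name `LinkExtractor`, the function `extract_links`, the sample HTML, and the `example.com` address are all hypothetical and chosen purely for illustration; they are not taken from the chapter.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the absolute URLs of all <a href="..."> links in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the list of link targets found in an HTML string."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page content and address, used only for illustration.
SAMPLE_HTML = """
<html><body>
  <p>See the <a href="/about.html">about page</a> and
  <a href="https://example.org/data">an external site</a>.</p>
</body></html>
"""

links = extract_links(SAMPLE_HTML, "https://example.com/index.html")
print(links)
```

The sketch parses a string already held in memory rather than downloading anything, so it can be run without network access. A real crawler would fetch each discovered URL in turn and repeat the same extraction on the pages it retrieves.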
In this chapter, we review Internet-based methods for crawling and ...