This case study draws on a study of fan blog commentary to explain how researchers decide precisely what text to gather from the Internet, how they locate that content and choose which websites to gather it from, and how they move the data from the Internet into text files for analysis. When the data set thus gathered is too large for the chosen method of analysis, we describe in detail how to apply random sampling to data gathered from multiple websites to ensure representativeness, and how to use random selection to assign chunks of the sampled data to multiple coders. Next, we explain how to incorporate tests for intercoder reliability into the research design. Finally, we explain one method of analysis: thematic analysis using a grounded theory approach with axial coding.
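The sampling, coder-assignment, and reliability steps outlined above could be sketched as follows. Everything here is invented for illustration: the corpus is a placeholder standing in for scraped blog comments, the coder names and sample size are arbitrary, and Cohen's kappa is used as one common reliability statistic rather than the specific test the study employed.

```python
import random

# Hypothetical stand-in for comments scraped from three fan-blog sites.
random.seed(42)  # fixed seed so the sketch is reproducible
corpus = [f"comment_{site}_{i}"
          for site in ("siteA", "siteB", "siteC")
          for i in range(400)]

# Step 1: simple random sample to reduce the corpus to a codable size.
sample = random.sample(corpus, k=120)

# Step 2: randomly assign equal-sized chunks of the sample to coders.
coders = ["coder1", "coder2", "coder3"]
random.shuffle(sample)
chunk = len(sample) // len(coders)
assignments = {c: sample[i * chunk:(i + 1) * chunk]
               for i, c in enumerate(coders)}

# Step 3: intercoder reliability on a shared subset, here via Cohen's kappa
# (observed agreement corrected for agreement expected by chance).
def cohens_kappa(labels_a, labels_b):
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    expected = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
                   for c in categories)
    return (observed - expected) / (1 - expected)
```

In practice the overlap subset (the items both coders label) is drawn randomly as well, and kappa is computed on those duplicate codings before coders proceed independently.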
Selecting, Scraping, and Sampling Big Data Sets from the Internet: Fan Blogs as Exemplar