Online Data, Documentation of

Since the 1990s, researchers have increasingly relied on data publicly available through the Internet. With the growth of social media, opportunities have expanded to collect and document massive quantities of information produced by and about people, things, and their interactions. Many communication studies now include content analyses of personal and political blogs, tweets, and posts on online discussion boards and forums. The advantage of online data is that they are often “naturally” created by Internet users. Online data can reveal users’ attitudes and behaviors more accurately because users are under less pressure to give socially desirable answers than they may be when participating in traditional focus groups or surveys. This entry discusses advantages and challenges of online data documentation, as well as social media mining, one of the most popular ways of documenting online data.

Advantages of Online Data Documentation

Data collected through online media offer multiple advantages. First, digital media data meet the criteria for ecological validity because they represent the everyday behavior of users. This gives researchers access not only to what people say they do (e.g., in interviews, focus groups, and surveys) but also to what they actually do. For example, in traditional focus groups and interviews, participants respond to directed questions, whereas users of Facebook and Twitter post whatever is on their minds. This represents a great opportunity to learn about people’s motives, needs, emotions, and behaviors. In addition, online data can take many different forms, including photos, artwork, videos, and audio recordings. Questions can be arranged, and often customized, based on the answers provided. Because data are most often collected into one database, the time needed for analysis is reduced. Traditional methods often required paper and pencil, and all data had to be entered into a database manually. Second, such behavior can be studied as it occurs, without obtrusive methods that often distort human interaction. Third, in many cases, data already exist, and researchers do not have to collect them. For example, researchers have studied data available on Twitter to learn about a variety of patterns. Fourth, data about groups that are hard to reach, rare, or scattered can be collected relatively easily. Overall, online data documentation is more convenient, less costly, and more practical than traditional methods.

Challenges of Online Data Documentation

A number of basic steps of traditional content analysis can be generalized to online data. However, online data differ from the text generated by interviews, focus groups, and diaries. First, social media offer much more data than traditional interviews or focus groups. Therefore, a researcher needs to place limits on his or her search efforts. Because online data are abundant, there is much more “noise,” or excess data, in the text. Therefore, many posts and comments on social network sites will be useless. For example, an analysis of a large number of “tweets” will show what users do, but not why they do it. In addition, people purposefully change who they are when using online identities. Therefore, it is important to understand that online data do not represent attitudes but only traces of behavior. In addition, many people do not have Twitter or Facebook accounts, and therefore, one cannot generalize about the population based on online data only. Some users may have an account but never log in, while others might be there only to lurk. Another challenge is availability bias: Researchers use data when data are available, sometimes accepting them at face value. They also decide, for example, in the case of social media, which attributes will be counted and which will be ignored. In addition, researchers have the final say in the interpretation of data. This can lead to issues of privacy and ethical standards, as users do not always distinguish between private and public messages. Researchers have to be concerned with protecting the privacy of research subjects who disclose information online but may be apprehensive about how that information will be used. Just because content is publicly available does not mean that it was meant to be consumed. For example, if a researcher quotes someone’s blog post but uses a pseudonym for that author, anybody can run an Internet search on the quoted text to determine the author’s actual identity.
In addition, online data are often difficult for humans to classify, so different computer programs are used to code the data. Some studies require the use of a particular technology and therefore require the researcher to possess advanced knowledge of and expertise in that technology. Overall, online data are easily available but hard to classify, and their interpretation depends on the researcher. Taken out of context, online data might be misleading. Just because data are accessible does not mean that it is necessarily ethical to use them for research purposes.
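
As a rough illustration of what it means for a computer program to code online text, the sketch below assigns posts to categories by keyword matching. It is a minimal example under stated assumptions, not a method described in this entry: the codebook, its categories and keywords, the sample posts, and the function names code_post and code_corpus are all hypothetical. Coding programs used in actual studies are typically far more elaborate.

```python
import re
from collections import Counter

# Hypothetical codebook (category -> keywords); illustrative only,
# not taken from the entry or from any published coding scheme.
CODEBOOK = {
    "positive": {"love", "great", "happy"},
    "negative": {"hate", "awful", "angry"},
}


def code_post(text):
    """Return the set of codebook categories whose keywords appear in the post."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return {category for category, keywords in CODEBOOK.items() if tokens & keywords}


def code_corpus(posts):
    """Count posts per category; posts matching no category are tallied as 'uncoded'."""
    counts = Counter()
    for post in posts:
        categories = code_post(post)
        if categories:
            counts.update(categories)
        else:
            counts["uncoded"] += 1  # "noise": posts the codebook cannot classify
    return counts


# Made-up sample posts, used only to show the call pattern.
posts = [
    "I love this new phone, great camera",
    "Awful service, I hate waiting",
    "Heading to the store later",
]
print(code_corpus(posts))
```

A keyword approach like this counts what users write but cannot recover why they wrote it, and it misreads sarcasm and context, which is one reason the interpretation of coded online data still depends on the researcher.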

...
