We conducted a content analysis of 63 Facebook accounts in 2007 to find out how, to whom, and what identity claims were made by college students on the institutionally anchored non-anonymous social networking site. We found that the identities produced in this social environment differ from those constructed in anonymous online settings. Facebook users tend to claim their identities implicitly rather than explicitly; they ‘show’ rather than ‘tell’ and stress group identities over personal ones. In this case study, I unpack the methodology of content analysis and describe in detail the major steps we took in applying this method to the study of Facebook pages. I also reflect on the methodological issues we encountered in our study and offer some advice for those interested in using this method.
- To understand the aim and basic procedures of content analysis
- To understand some of the methodological issues related to content analysis
- To see an example of how content analysis was used in the study of Facebook pages
- To think about how content analysis can be used to study other social phenomena
In the spring of 2007, we conducted a content analysis of 63 Facebook accounts to find out how, to whom, and what identity claims were made by college students on the institutionally anchored non-anonymous social networking site. Our findings were published the following year in the journal Computers in Human Behavior (Zhao, Grasmuck, & Martin, 2008), and in 2012, our article was ranked among the most cited articles published in that journal between 2007 and 2011.
Content analysis is a method of both data collection and analysis commonly used in social research. According to Robert P. Weber (1990), content analysis is a research technique employed to make inferences about the sender of the message, the message itself, or the audience of the message based on the analysis of the text. Two important points are being made in this succinct definition. First, texts and messages are not the same thing: texts are vehicles that carry and express messages, and messages need to be inferred from the text. Second, content analysis of a text can help reveal not only the embedded messages but also certain attributes about the sender and the intended audience. In practice, the method of content analysis has also been used to analyze other forms of message carriers, such as photos, music, TV, and radio shows.
This case study is a methodological examination of the content analysis of Facebook pages we conducted in 2007. Facebook is an emergent interactive medium of online communication that combines text, voice, photos, and music. Our success with the study of the 63 Facebook accounts demonstrated the strengths and broad applicability of the method of content analysis, but we also encountered methodological challenges in studying the new medium. In this case study, I go over and reflect on the steps we took in our analysis of the Facebook accounts: I first describe in detail the process of the content analysis we conducted and then offer some comments and advice on the use of content analysis for beginners.
Research methods of empirical studies are typically divided into two types: data collection and data analysis. Data collection methods are techniques that researchers use to gather information from the empirical world, and examples of such techniques are telephone surveys and face-to-face interviews; data analysis methods, on the other hand, are techniques that researchers use to analyze the gathered information, either quantitatively or qualitatively. Content analysis is a research method that includes both data collection and data analysis. Specifically, the process of content analysis comprises three main phases: problem formulation, data attainment, and message extraction. I will go over these three phases in the content analysis of Facebook pages we conducted in 2007.
Scholarly studies are carried out to solve research problems. Research problems are issues and arguments that emerge in the pursuit of scientific knowledge. However, not all problems are worth researching, as some can be trivial and insignificant, and not all worthy problems are researchable at a given time, owing to the lack of resources and technologies. The identification of a proper problem is therefore an important first step in research.
A good place to start in the search for a research problem is the existing literature on the topic of interest. In our case, we were interested in the way people present their selves to others, particularly on the Internet. There was a large body of literature on this topic. Researchers had found that people tend to engage in ‘dramaturgical performances’ (Goffman, 1959) to make a favorable impression on others in face-to-face interaction. However, their performances are invariably constrained in that setting by their corporeal presence and other visible personal characteristics associated with the body. In comparison, the ‘true self’ of an individual comes out of the closet when the interaction takes place in an anonymous and disembodied environment, such as an online chat room. Under the protection of anonymity, individuals tend to play with their identities freely and pretend to be whoever they want to be.
The question we raised was as follows: What kind of selves would people present to others in an environment that is neither face-to-face nor anonymous? Because the interaction is not face-to-face, self-presentation is unconstrained by the individual's actual corporeal characteristics; but because it is not anonymous, self-presentation may be affected by the personal characteristics that are known, or can potentially become known, to others. We were interested in finding out how and what type of selves individuals tend to present in such a social setting. There had not been much research on this issue. Based on the sociological theory that human behavior is influenced by the characteristics of the social environment, we hypothesized that the selves presented in a disembodied but non-anonymous environment would fall into the category of what had been called ‘hoped-for-possible-selves’. To empirically test our hypothesis, we chose to examine self-presentations on institutionally anchored campus Facebook, an environment that closely resembles the settings we specified.
Data for content analysis usually comes from archival records—documents that are already gathered and archived by others. Examples of such archival records include government documents, meeting minutes, newspaper articles, films, diaries, and letters. Some of these kinds of records are available to the public, and others are kept private. Facebook pages, like other online social media, are public displays of private records, and they pose a tricky question as to how to legitimately obtain such information for scientific research. Being a Facebook friend to someone else, one is able to access a lot of private information about others, but does that ‘friendship’ status give a researcher the green light for gathering data?
Another issue to consider in data gathering is the representativeness of the sample. A sample is a portion of data from the larger world a researcher is interested in, and it is through the study of this portion of data that the researcher seeks to gain knowledge of the larger world. The question is, how can we be sure that the sample we look at is actually representative of the larger world we are interested in so that the findings based on the sample can be generalized to the world out there? In general, there are two types of sampling methods that produce a representative sample: random sampling and purposive sampling. Random sampling is a procedure that gives each element in the sampling universe an equal chance of being selected, and purposive sampling, on the other hand, is a procedure that selects elements of only certain types according to the specified research needs. While random samples seek to represent the entirety of a universe, purposive samples seek to represent certain dimensions of a universe that are of interest to the researcher.
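The contrast between the two sampling procedures can be illustrated with a short sketch. This is a hypothetical example, assuming a simple roster of (student, group) pairs; it is not the procedure we actually used.

```python
import random

random.seed(42)  # fixed seed so this illustration is reproducible

# A hypothetical sampling universe: 1,000 students tagged with a group label.
groups = ["African American", "Vietnamese American", "Indian American",
          "Latino-Caribbean", "White"]
roster = [(student_id, random.choice(groups)) for student_id in range(1000)]

# Random sampling: every student has an equal chance of being selected.
random_sample = random.sample(roster, 60)

# Purposive sampling: select only students of the types the study calls for.
target_groups = set(groups[:4])  # the four minority groups of interest
purposive_sample = [s for s in roster if s[1] in target_groups]
```

A real purposive design would add further within-group criteria (gender, sorority/fraternity membership, major), but the underlying contrast between equal-chance and criterion-based selection is the same.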
The data for our Facebook analysis came from a larger study that used the method of purposive sampling. The larger study was conducted at a Northeastern university in the United States to examine issues related to ethnic identity, friendship, and sexuality among college students. For the purpose of the study, only students from the following four minority groups were to be included in the sample: African Americans, Vietnamese Americans, Indian Americans, and Latino-Caribbean. To ensure the representation of distinct social clusters on campus, within-group sampling also considered the following dimensions: male/female, sorority/fraternity, majors/colleges, among others. The resulting sample consisted of 63 students who were interviewed face to face and asked permission for access to their Facebook accounts. For comparative purposes, we supplemented this initial sample with 20 White students who agreed to participate in our Facebook study, and they were randomly selected from among those who had responded to a National Student Survey on campus. This yielded an enlarged sample of 83 students for our Facebook analysis.
In March 2007, we began to download the Facebook pages of the 83 students. However, 11 students either did not have Facebook accounts or blocked us from accessing their accounts, 7 students allowed us to view only their profile cover pictures, and 2 other students had only group accounts. As a result, we ended up with a total of 63 analyzable Facebook accounts for our study. The final sample included 15 White students, 12 Black students, 14 Latino students, 13 Indian students, and 9 Vietnamese students, with a balanced ratio of gender and other attributes of interest.
With the data in hand, our next task was to extract relevant information from the 63 downloaded Facebook pages. A Facebook account consists of a rich array of multimedia information—texts, photos, music, and videos—organized into multiple domains of self-presentation: personal profile, ‘about me’ description, albums, list of friends, groups joined, interests and hobbies, favorite music/TV/movies/books/quotes, and wall posts. How were we to analyze this dazzling amount of information? What counted as relevant information? In other words, how could we go beneath the surface of the data and reveal the true messages being expressed?
At this point, it is necessary to introduce a subtle distinction made by Goffman (1959) between expressions given and expressions given off. ‘Expressions given’ refers to the meanings commonly associated with the symbols—verbal or nonverbal—used in communication; ‘expressions given off’, on the other hand, refers to the meanings that are indirectly expressed by those symbols. For example, the expression given by the statement, ‘I had lunch with the mayor yesterday’, is exactly what the statement says, while the expression given off by the statement may include, ‘I'm well-connected’. Expressions given off are, therefore, more contextual, theatrical, and nuanced. According to Goffman, self-presentations are performed mostly through the exchange of expressions given off rather than expressions given. This subtle but important distinction poses challenges to the extraction of messages from Facebook pages, which are imbued with expressions given off.
Our analysis of Facebook pages involved three major steps: first, the construction of a codebook specifying the rules for classifying information; second, the use of the codebook to gather relevant information from downloaded pages; and third, the drawing of inferences based on the gathered information about the expressions given off by the individuals. I will discuss each step in turn.
A codebook is a guide for gathering relevant information. It consists of explicit rules for selecting data and for grouping, naming, and numbering the gathered data. The essence of codebook construction is the creation of categories for organizing data. Categories are concepts used to classify and contain relevant information; they tell the researcher what information to look for in the vast ocean of raw data. Category construction needs to be theory-driven, geared toward solving the formulated research problems. Specifically, there are two different levels of categories to be created: object-level and attribute-level. Object-level categories are concepts that divide phenomena into distinct classes; they are analogous to questions in survey instruments and are called ‘variables’ in quantitative analysis. Attribute-level categories, on the other hand, are subdimensions of object-level categories; they are analogous to response options in survey instruments and are called ‘values’ in quantitative analysis. For example, ‘gender’ and ‘race’ are object-level categories, and ‘male/female’ and ‘White/minorities’ are attribute-level categories. A basic rule for constructing subcategories is that they must be mutually exclusive, that is, a given case can be classified into only one subcategory; moreover, they must be exhaustive, that is, every case of interest can be classified into some subcategory. For ease of quantitative analysis, each subcategory is often assigned a numeric value, such as ‘1 = male’ and ‘2 = female’. Sometimes, however, the data are too complicated for exhaustive and mutually exclusive subcategories to be constructed, in which case the data are recorded verbatim or as is.
Most of the object-level and attribute-level categories we included in our codebook came from the items on Facebook pages (an abbreviated version of our codebook can be found in the appendix to our 2008 article). However, we excluded many Facebook items, either because they were not central to our research problems or because analyzing them would be too difficult or time-consuming. For example, we decided not to systematically go through the content of albums and wall posts, as that would take a tremendous amount of time. For the same reason, in our analysis we decided to focus only on the quantity of items users provided under the selected categories. For example, with regard to the favorite music, TV, movies, books, and quotes a user listed on his or her Facebook account, we would record only the number of the listed items, not their actual content.
Of course, we did decide to look into the content of certain Facebook items. Facebook allows users to provide a profile cover picture which others will first see when visiting their accounts. Users can upload whatever photo they like to generate a desired ‘first impression’ on their visitors, but they also have the option of not providing any photo at all. This is an important moment for self-presentation. We created a ‘profile cover picture’ category in our codebook and included the following subcategories for coding the content of the cover pictures: ‘0 = blank’, ‘1 = self’, ‘2 = with others’ and ‘3 = avatar’. These attribute-level categories were created to capture possible variations in the presentation of a ‘first impression’. In addition to a profile cover picture, Facebook allows users to provide an explicit ‘about me’ narrative to verbally introduce themselves to the visitors. This is another important moment for self-presentation. In this case, however, we decided to focus on the length, rather than the content, of the narratives provided by the users. We included in the codebook the following subcategories for coding the ‘about me’ narratives: ‘0 = missing’, ‘1 = one or two short sentences’, ‘2 = one or two short paragraphs’ and ‘3 = long paragraphs’.
Decisions on category construction in codebook creation have profound and often irreversible consequences for message extraction at a later stage. Codebook categories are the ‘fishing nets’ of data attainment: they determine how much, and what type of, information will be obtained for analysis, so they should be constructed carefully. It is important to keep in mind that more information is not always better—the key is to strike a proper balance between the relevance of the information and the resources available for attaining it.
Coding refers to the process by which the raw data are classified into meaningful categories based on the codebook. The classification is usually recorded on a coding spreadsheet. Each observation is placed in a row, each category in a column, and each subcategory in a cell—the intersection of a row and a column. The coder goes through each case and enters the content (numeric or narrative) of a given subcategory in the proper cell on the coding sheet until all the categories in the codebook are covered for all the cases. However, this coding process is often not as straightforward as it may seem. No matter how explicit the coding rules are, there are always cases that do not exactly fit the specifications, in which situation the coder needs to make ‘on the spot’ coding decisions. While these kinds of decisions are unavoidable, they can lead to coding inconsistencies and cause error.
In our study, coding was done by a specially trained graduate research assistant, who met regularly with the two faculty members on the team to resolve coding issues as they arose. All aberrant cases were discussed by the three-member research team, and any ad hoc coding rules were recorded as amendments to the codebook to be followed in subsequent coding.
Analysis is the process of extracting messages from the obtained data. As has been pointed out earlier, data and messages are not the same thing. In the study of human interaction, data consist of all forms of sensory expressions performed by the actors, and messages are what the sensory expressions were meant to convey. Depending on the context in which the interaction takes place, sometimes messages are explicitly conveyed by the expressions given, and other times, the messages are implicitly or indirectly conveyed by the overt expressions. A major challenge in content analysis of human expressions is to discern the expressions given off from the expressions given.
The extraction of messages begins with the analysis of the expressions given. With a sufficiently large sample of subjects, the goal of the researcher is to look for patterns of expressions that convey specific meanings within identifiable contexts. Two types of analysis are commonly performed at this stage: qualitative and quantitative. Qualitative analysis involves the examination of the content of expressions (verbal or nonverbal) to discern clusters of meanings, and quantitative analysis involves the tabulation of certain attributes of expressions to discern patterns of behavior. In our study, we focused on quantitative analysis as most of our variables had to do with the presence or absence of certain attributes. For example, we conducted a frequency analysis of all variables we had collected and calculated the mean values for all continuous variables. We also cross-tabulated a number of variables to see how certain categories were correlated with one another. After rounds of such analyses, we had a good sense of the general patterns of self-presentations on campus Facebook.
Our next step was to go from expressions given to expressions given off. However, there was no sure passage between the two and no analysis software we could use for this endeavor, for it would involve making inferences, a conceptual leap based on both intuition and theory. In a sense, this is true of all types of scientific analysis, whose goal is to penetrate the surface of data to reveal their underlying structures; this penetration requires a conceptual jump beyond the data as given. Let me illustrate this ‘jump’ with some examples from our study.
A question we sought to answer with our study was how individuals would present their selves to others in a non-anonymous online environment. We could have directly asked Facebook users this question and let them tell us the answer. However, not all Facebook users are necessarily able to articulate the ways in which they construct their identities, and those who are able to may not always tell us the truth. Our statistical analyses of the data from the 63 Facebook accounts revealed some intriguing patterns. For example, while only 7.9% of users wrote long paragraphs to introduce themselves in the ‘About Me’ section, between 65.1% and 73% of users named on average 8.3 pieces of their favorite music and 8.1 favorite movies, and listed on average 4.3 favorite quotes and 4.9 personal interests; furthermore, over 90% of users shared their personal albums, displaying an average of 88.4 photos. This striking variation in the way users presented themselves on Facebook indicated to us that people tend to engage in what we came to call a ‘showing without telling’ mode of self-presentation in non-anonymous online environments. As can be seen, this finding was an inference we drew based on the observed patterns, and a conceptual leap we made from the expressions given to the expressions given off.
Another question we sought to answer through our study was what types of identity claims individuals would tend to make in the non-anonymous online environment. Perhaps in no better place than the profile cover pictures users provided on Facebook can we find an answer to this question. As mentioned earlier, the profile cover picture is a gateway to a user's Facebook account, a place where the ‘first impression’ of an individual is produced. Therefore, the picture an individual chooses to be his or her profile cover picture indicates the kind of impression the individual wants others to have of him or her. Much to our surprise, only 42.9% of the 63 users displayed a photo of just themselves, 38.1% showed a group photo, 14.2% a picture of an avatar, and 4.8% no photo at all. The fact that the majority of the users avoided displaying a single-person picture and chose instead to show no faces at all or to show their faces along with the faces of others in their profile cover picture was very revealing, indicating, among other things, an effort to construct a group-oriented identity. Here again, the indication was an inference we made based on the observed patterns, which also appeared in other areas of the Facebook accounts we examined, including the number of on-campus friends that users claimed to have (mean = 150.2) and the number of groups that users said they had joined (mean = 24.9).
Besides this impression of ‘being popular among friends’, we discerned two other impressions given off by the users in their identity construction on Facebook: being well-rounded and being thoughtful. It should be stressed that none of the 63 users ever explicitly claimed on Facebook that they were popular, well-rounded, and thoughtful; however, based on our analyses of the expressions they gave, we were convinced that these were the impressions they intended to give off.
To conclude, through content analysis of 63 campus Facebook accounts, we found answers to our original research questions: in a non-anonymous online environment, people tend to engage in a ‘showing without telling’ mode of expression, presenting to others a ‘hoped-for-possible-self’ that is socially desirable and better than the individual's ‘actual self’ but not entirely fictional.
There are many different types of content analysis. The one we used in analyzing Facebook pages focuses on quantifiable attributes, but content analysis can also be used to identify themes and arguments in texts which are qualitative in nature. Specially designed computer software has been developed to aid in the analysis of texts. However, all content analyses share the same fundamental goal: to go underneath the surface of data—textual or otherwise—to find the underlying patterns and the associated meanings.
Facebook pages represent a new form of data, somewhere in between archived records and live human interactions. Like archived records, Facebook pages can be studied unobtrusively, that is, without the knowledge of the actors being observed; however, like live human interactions, Facebook pages are constantly revised and updated. To ‘freeze’ them in time, it is necessary to download Facebook pages before analyzing them. It is also important to download all the pages during the same period of time so that they are temporally comparable. Another point is that Facebook pages are both private and public: they are private records displayed in public—among a circle of friends. Thus, from an ethical standpoint, it is questionable to gather research data from others' Facebook accounts without informed consent.
In our study, message extraction began with the construction of a codebook. The codebook laid out the rules for classifying data, and these rules were followed in coding. In other studies, coding began without a codebook. This is especially true in coding ‘thick’ textual data, for which predetermined coding rules are not very useful. Such coding is called ‘open coding’ (Strauss, 1987), where the coder goes through the materials back and forth to identify parameters of interest, and to develop provisional categories for classifying themes and arguments. This process continues until the emerging categories become conceptually saturated. In this case, coding is a form of analysis, a way of breaking the data apart analytically to reveal the embedded messages and decode the expressions given off.
Objectivity is a thorny issue in content analysis. Broadly speaking, objectivity means that the same results can be obtained from the same data by different researchers following the same procedures. Objectivity can more or less be secured in dealing with data at the level of ‘expressions given’ by making coding rules explicit so that they can be consistently followed by different coders. A common measure of coding consistency is the inter-coder reliability coefficient which is calculated by examining the correlation between the results of coding the same data by different coders. However, inter-coder reliability is difficult to obtain in open coding where there is no preestablished coding rule to follow; moreover, researchers with different levels of intuition and theoretical backgrounds may uncover different messages from the same data source. Here, the question is how we can know for sure whose finding is correct, but this is in fact an issue of validity rather than objectivity, and the examination of this issue is beyond the scope of this case study.
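Inter-coder reliability can be estimated in several ways: percent agreement is the simplest, and Cohen's kappa corrects that agreement for chance. The sketch below uses invented codes from two hypothetical coders; the kappa formula is standard, but the choice of coefficient is my own illustration, not necessarily what a given study would report.

```python
from collections import Counter

# Codes assigned to the same ten cases by two coders (invented data).
coder_a = [1, 2, 2, 1, 3, 1, 2, 2, 1, 1]
coder_b = [1, 2, 1, 1, 3, 1, 2, 2, 2, 1]

n = len(coder_a)

# Percent agreement: the share of cases both coders coded identically.
observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

# Cohen's kappa: agreement corrected for chance, based on each coder's
# marginal distribution of codes.
freq_a, freq_b = Counter(coder_a), Counter(coder_b)
expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
kappa = (observed - expected) / (1 - expected)

print(round(observed, 2), round(kappa, 3))  # 0.8 0.655
```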
- What are the conditions under which content analysis can be used? Discuss the potential ethical issues in studying Facebook pages without informed consent.
- Propose a research problem related to Facebook, describe the kind of data needed to study the problem, and specify a procedure for obtaining such data.
- Take a look at the first 50 wall posts in your Facebook account and develop some categories to code these wall posts. Discuss the coding difficulties you encountered and how you managed to resolve them.
- Take a look at the first 50 pictures in your Facebook album and develop some categories to code these pictures. Discuss the coding difficulties you encountered and how you managed to resolve them.
- Discuss the issue of objectivity and reliability in your coding of the wall posts and albums.