  • 00:00

    [MUSIC PLAYING] [An Introduction to the Benefits and Risks of Twitter for Social Science Research]

  • 00:10

    GRANT BLANK: I'm Grant Blank. [Grant Blank, PhD, Survey Research Fellow, Oxford Internet Institute] I'm a survey research fellow at the Oxford Internet Institute at the University of Oxford. I'm a sociologist by background and training. I've been working primarily on digital inequality in its various forms and some of the implications of that, not only for Britain as a whole but also for social research.

  • 00:34

    GRANT BLANK [continued]: [What implications does data from Twitter have for social research?] Twitter data has become one of the ubiquitous data sources in the social sciences. Well over 1,000 papers have been published using data from Twitter. It's easy to collect.

  • 00:55

    GRANT BLANK [continued]: It's plentiful. It's free in some forms. So it's a very attractive idea for solving a data collection problem. The question is, who are these people who use Twitter? This is a hard question to answer because Twitter data themselves won't tell you very

  • 01:16

    GRANT BLANK [continued]: much about who the people are. What I did was take data from a representative sample of British and American internet users and look at the people in that sample who use Twitter, to see how representative the Twitter users are.

  • 01:36

    GRANT BLANK [continued]: And the story in both the United States and the United Kingdom is very similar: Twitter users are disproportionately wealthy. They are disproportionately well-educated. They have high incomes. So they're an unusual subset. They are not representative of the general population.

  • 01:58

    GRANT BLANK [continued]: And there is no easy way to make them representative of the general population. Twitter use in both countries is relatively limited: the proportion of the population using Twitter is relatively small in both countries. What this means is that Twitter users are not representative

  • 02:22

    GRANT BLANK [continued]: of any interesting group in the population, except other Twitter users. This explains why, for example, attempts to predict voting using Twitter data have uniformly been failures, and attempts to predict other things that require representative samples have uniformly failed to work

  • 02:45

    GRANT BLANK [continued]: in the case of Twitter data. It's useful to keep in mind the distinction between using Twitter data for commercial purposes and using it for academic purposes. If you're a company, the fact that Twitter users are wealthy and well-educated may not actually be a bias.

  • 03:06

    GRANT BLANK [continued]: It may actually be appealing to you if most of your customers are wealthy and well-educated. So for certain corporate purposes, Twitter data are great because they tell you a lot about the people who buy your product. But they're still not going to be representative of the population as a whole.

  • 03:27

    GRANT BLANK [continued]: And what that means is that Twitter data are probably much more useful for private firms that are trying to sell you products than they are for academic researchers. [What are some common problems with using Twitter data?]

  • 03:47

    GRANT BLANK [continued]: Twitter data are part of a class of data that have an interesting collection of biases. And I think many of the problems that Twitter data have are going to be familiar to anyone who's worked with social media data, because many of the problems with Twitter data apply to other forms of social media.

  • 04:10

    GRANT BLANK [continued]: The easiest way to see this is to compare Twitter data with other ways that social scientists collect data. For example, social science surveys are deliberately designed to be representative of populations. Furthermore, the social scientist doing the survey

  • 04:31

    GRANT BLANK [continued]: has a great deal of control not only over the individuals who answer the survey, to make sure that they're representative, but also over the questions in the survey and the response options. So a great deal of effort goes into designing the questions so that they have a single, stable meaning to everyone in the population, so that

  • 04:53

    GRANT BLANK [continued]: the 80-year-old grandmother and the 18-year-old high school dropout understand the question and the responses in the same way. This is hard. It takes a lot of work, and it's expensive work. That's why designing surveys is a specialist job and why there are people who spend their entire careers

  • 05:14

    GRANT BLANK [continued]: working in surveys. But it does mean that you have a great deal of control over the process that generated the data. You have control not only over the people but also over the questions and the responses. You also have a great deal of control over what questions you ask. So you can ask questions on the topics that

  • 05:36

    GRANT BLANK [continued]: are important to you, the topics that your theory says are important. You can gather data on gender. You can gather data on race. You can gather data on whatever else is important in the population you're concerned with. Contrast that with Twitter. The scientist working with Twitter data

  • 05:58

    GRANT BLANK [continued]: has no control over the data-generating process. They're forced to accept whatever text the author wrote in the tweet, complete with misspellings, typos, lack of punctuation, odd abbreviations, and so on.

  • 06:19

    GRANT BLANK [continued]: It's very low-quality data. And it's very hard to be convinced that the meaning you've extracted from that data is consistently the meaning that every single author intended. Now, there are shortcuts, I know, like hashtags, that help you along toward the meaning.

  • 06:40

    GRANT BLANK [continued]: But ultimately, you're still depending on the text in many cases. You can't always depend on the shortcuts. This is a generic problem with all kinds of found data that are not actually generated by a social scientist. The data are collected by someone for their own purposes.

  • 07:04

    GRANT BLANK [continued]: They're not collected for scientific purposes. They're not going to have key variables in them. They're not going to have crucial information that you want. They're not theory-driven. So they're inevitably going to be incomplete, and it's going to be hard to use them.

  • 07:27

    GRANT BLANK [continued]: Important questions can't be asked. Important answers can't be found in those sources of data. So the underlying point, I think, is that the more control a social scientist has over their data-generating process, the better the data

  • 07:48

    GRANT BLANK [continued]: they have, and the easier it is for them to address important problems. [What are some of the issues to be aware of when collecting data from Twitter?] Twitter data are available in various forms. The easiest form, which is to collect all tweets,

  • 08:12

    GRANT BLANK [continued]: is usually unavailable to social scientists because it requires paying fairly large amounts of money. Instead, most social scientists rely on the free versions of Twitter data, which provide much smaller quantities of data. Now, this has several problems. The biggest is that these tweets are not

  • 08:37

    GRANT BLANK [continued]: a random sample of tweets. The free tweets are not a random sample of all of the available tweets. It's not clear how they're selected, but several studies confirm that, in fact, they're not random. So if you're getting free data, you're getting what you pay for, because it's not going to be representative even of Twitter data, let

  • 09:00

    GRANT BLANK [continued]: alone anything else. Now, the second problem is that a lot of people who use Twitter don't actually tweet very much, and the people who tweet a lot make up the bulk of all tweets. So what you're getting when you sample on the tweet is the people who are noisiest, the people

  • 09:21

    GRANT BLANK [continued]: who spend all of their time tweeting. And this, of course, means you're getting a biased sample even among Twitter users. You're getting mostly the noisy people. A related problem, of course, is bots. Twitter now claims to be working to eliminate bots on Twitter.

  • 09:47

    GRANT BLANK [continued]: And they have probably done a good job with the obvious ones. But it's clear that you can make a bot tweet like a person in many ways. You can set up a cycle where it appears that the bot sleeps. Many of the other characteristics that are used to identify robotic tweets can also be mimicked.

  • 10:11

    GRANT BLANK [continued]: You can limit their number of tweets per hour to something that a person might plausibly do, and so on. So there are still going to be bots somewhere in your tweets, and you won't be able to find them. So the bottom line here is that Twitter data are hard to use as a representative sample, not just because

  • 10:34

    GRANT BLANK [continued]: Twitter users are not representative of any population, but also because tweets themselves are not representative of Twitter users. [What are the affordances of Twitter?] It's interesting to look at the affordances of Twitter. Twitter is a very lightweight platform,

  • 10:55

    GRANT BLANK [continued]: which is very easy to use in situations where you have a restricted device like a mobile phone. More complex platforms like Facebook and many other social media platforms, where you have not only text but also pictures and audio

  • 11:16

    GRANT BLANK [continued]: and things like that to display on each individual's home page, are much harder to deal with on small screens and much easier to deal with on larger screens like laptops or tablets. So Twitter is often used in environments where you have a small, restricted screen.

  • 11:39

    GRANT BLANK [continued]: It's also used in settings where there's a lot of breaking news, where it's easy to use your phone but you might not easily be able to find a laptop or a tablet to use. So Twitter is often used in emergency situations or situations where there's breaking news of some kind.

  • 12:00

    GRANT BLANK [continued]: Often, that news appears on Twitter before it's available in other media. But all this really says is that the affordances of Twitter push people to use Twitter for certain topics and not necessarily for others, which is where other social media come in.

  • 12:22

    GRANT BLANK [continued]: So this is another situation where Twitter itself is an exceedingly biased representation of the social world. You cannot treat tweets as a representative sample of the social environment.

  • 12:43

    GRANT BLANK [continued]: [What contextual considerations are there for researchers using Twitter data?] One of the interesting characteristics of Twitter data is that if you look at the political campaigns on Twitter, like those around Brexit, the campaigners will tell you,

  • 13:04

    GRANT BLANK [continued]: and there's research on this point, that their Twitter campaign was directed toward politicians and news media people. It was directed toward elites. The Twitter campaigns were not designed to influence ordinary voters. They were designed to influence elites.

  • 13:24

    GRANT BLANK [continued]: So that suggests that even the people who use Twitter purposefully understand it as a mechanism for communication among elites. It's a way in which elite influence is transmitted. So if that's what you're studying, that's great.

  • 13:44

    GRANT BLANK [continued]: That's what you should be working with: you need to work with Twitter data in some form. But one of the problems Twitter has is representativeness, even among elites. So you can't use Twitter data alone. You're going to need to combine it with things like interview data or data

  • 14:06

    GRANT BLANK [continued]: from other media, both online and offline, to understand the media landscape. One of the ideas people had when the internet first became prominent in the 1990s was that this was a wonderful new world where all sorts of media

  • 14:28

    GRANT BLANK [continued]: were going to be available. People were going to have a lot more access to all kinds of news and information than they had before. And, in fact, if you look at the surveys, people in Britain and the US on average consume political information every day from

  • 14:50

    GRANT BLANK [continued]: between four and five different media sources, counting both online and offline sources, currently about half online and half offline. So if we believe the idea that the internet has opened up media, then what we need to do is actually study all of these media simultaneously.

  • 15:12

    GRANT BLANK [continued]: We need to think about the entire media environment, the media ecosystem, which includes not just social media but also online media like political websites, commercial websites, and the websites of offline publications, and then offline media like television, newspapers,

  • 15:35

    GRANT BLANK [continued]: magazines, radio, and so on. This entire media landscape is what real people confront. They don't just confront Twitter data. They don't just confront social media. And so the time of single-medium studies

  • 15:56

    GRANT BLANK [continued]: is over. Unless they can be theoretically justified, we need to stop doing studies of single media. We need to be studying the entire media landscape and trying to understand how people are influenced across that entire landscape.

  • 16:19

    GRANT BLANK [continued]: [MUSIC PLAYING]


Grant Blank, PhD, survey research fellow, Oxford Internet Institute at the University of Oxford, discusses the benefits and risks of Twitter data for social science research, including implications of using Twitter data, common problems with its use, issues to be aware of in its collection, affordances of Twitter, and contextual considerations for researchers using Twitter data.


