  • 00:05

    [Digital Social Research, Centre for the Analysis of Social Media (CASM)]

  • 00:11

    CARL MILLER: I'm Carl Miller. And I'm the research director at the Centre for the Analysis of Social Media at the think tank DEMOS. CASM, the Centre for the Analysis of Social Media, is about four and a half years old now. It was created as the first think tank unit dedicated to studying the digital world. Four and a half years ago, we knew that something really important was happening. Increasingly, people were transferring more and more

  • 00:33

    CARL MILLER [continued]: and more of their moral, political, social, intellectual lives online. This was a place which was changing what people thought. It was changing who people thought they were. It was changing who people knew. And it was changing how they went out into society to try and solve problems. It was changing work. It was changing how people fell in love. All of that we thought was absolutely vital. It was vital to understand how the digital world,

  • 00:54

    CARL MILLER [continued]: and especially social media, was both changing society. But also, we saw there was a massive research opportunity. It was an amazing new window into society. This transfer in the digital world was essentially the datafication of social life. It was a transfer of all these things which, over centuries, have normally been lost to history into formats

  • 01:15

    CARL MILLER [continued]: which were inherently amenable to collection and analysis. So we saw this basically to be the most exciting research challenge for a generation. The data sets which we could work on to understand society, to understand people, were larger than ever before. They're more real time. They're more linked. So CASM was set up essentially to create new methods, new ways of trying to study society and then

  • 01:35

    CARL MILLER [continued]: use them to try and answer important, urgent, pressing social and political problems. Society is kind of creating an instant history of itself all the time. Rather than going back and spending weeks, if not months, doing research, we now have the very real possibility of doing real time research, to be able to actually understand phenomena as they are unfolding and be able to inform interventions into those very phenomena

  • 01:57

    CARL MILLER [continued]: themselves. Due to that very size and complexity, it has exploded conventional methodologies for understanding the social world. You can't use a poll or a survey in order to understand the million tweets or Facebook posts. So fundamentally, across the whole social media research landscape, a really important transition is happening.

  • 02:17

    CARL MILLER [continued]: More and more, people are pulling in these new quite obscure technologies from big data analytics into social science. These new technologies are capable of handling social media datasets, which are very large and very complex and linked. But on the other hand, they are very unfamiliar and new and often untested or even not understood within social science.

  • 02:38

    CARL MILLER [continued]: And their importation into social sciences is really challenging kind of conventional social science method-- how you use them, how you make sure they're accurate, how you create replicable results, and really how you best express all the standards and evidentiary principles of social science, which have animated that field for over 100 years-- you know,

  • 03:01

    CARL MILLER [continued]: issues like sampling bias, issues like research validity, research representivity. All these issues now, we're having to kind of recapture and re-understand how we can get them with the use of all these new technologies. Sometimes, we have to understand datasets which are far too large for us to ever manually read

  • 03:21

    CARL MILLER [continued]: ourselves. And the main technology which we use in order to do that is called natural language processing. It kind of began in obscurity in the 1970s as a form of mathematical linguistics, a way of turning language into mathematical expressions. But since we've had this challenge of trying to understand social media datasets which are too large to manually read, natural language processing

  • 03:43

    CARL MILLER [continued]: has really exploded. It's become the primary way in which we can teach computers to automatically try to understand the meaning of natural language-- that is, people just talking to other people. That's the fundamental way in which we try and understand datasets which are too large to manually read. And we've built a platform here at CASM, at DEMOS and the University of Sussex, called Method 52, in order

  • 04:04

    CARL MILLER [continued]: to do that. Welcome to Method 52. It actually stands for methodologically sound sociological research on Twitter. This is the major platform, major tool that we have to allow non-technical people like me to be able to do big data analysis, primarily on social media datasets. Method 52 allows us to both collect, scoop up,

  • 04:24

    CARL MILLER [continued]: and manage the river of data which social media throws out at us. So we can see here that on each of these different screens, I've built a bespoke analytical pipeline. There are components running down the side here, everything from annotators to classifiers to consumers of data, each one doing a different and often important

  • 04:45

    CARL MILLER [continued]: task to allow me to kind of marshal and manage this deluge of data, and eventually turn it into all the things that you saw, the patterns and the insights and the realizations of what all of this actually means. We needed to find a way of allowing non-technical people, subject matter experts, analysts, social scientists, people that wanted to study a particular group

  • 05:07

    CARL MILLER [continued]: and knew about that group, knew about that phenomenon, to be able to do it. And Method 52 was the technology we created in order to allow that to happen. It allows non-technical analysts to essentially build data pipelines with lots of different kinds of interventions-- components, we call them-- which can help them try and break apart, and split,

  • 05:29

    CARL MILLER [continued]: and organize data in ways which actually make sense to them. And the major way in which we do this, the kind of, the business end of Method 52, is the classifier. This is a natural language processing algorithm which non-technical people can train which allows them to break apart and understand very large datasets.

  • 05:50

    CARL MILLER [continued]: The best way of thinking of the classifier is like a sluice gate. Got this big river of data rushing at you. You've got a deluge of tweets, a deluge of posts. And this algorithm is what you build, this model that you train, in order to turn that big, gushing river into a series of more manageable rivulets. You train this algorithm. You feed it data. You give it examples. And it begins to learn the words and the phrases

  • 06:12

    CARL MILLER [continued]: and the kind of language which correlate with one category and with another. So you can see here, there's a big list, a randomly sampled list at this point, of different tweets from that major dataset, 1,500 in general. And as I've logged into the system, I am annotating each tweet. I am telling the algorithm underlying

  • 06:32

    CARL MILLER [continued]: all of this technology what a boo or what a cheer sounds like. As we do more and more of this-- we call it machine learning-- the algorithm gets better and better at understanding which tweet goes into which. Now, after we are confident that our algorithm is working properly, we then push all the tweets through that classifier

  • 06:55

    CARL MILLER [continued]: and then begin to present it to us in ways which we can use to spot patterns in order to turn big data datasets into new kinds of meaning and new kinds of insights into digital politics. During the 2015 general election, we saw the emergence of a new and very important

  • 07:17

    CARL MILLER [continued]: political battleground. Over 80% of MPs were, sometimes reluctantly, being dragged onto Twitter, and a majority on Facebook as well. They knew this was a new territory that they had to dominate. They knew this was a new direct link between them and constituents. So we knew also, of course, that we, as people who want to understand society and politics, had to watch and study this new world.
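
The training loop Miller describes between 05:50 and 06:55 (annotate a sample of tweets as boos or cheers, let the algorithm learn which words go with which category, then push the full stream through it) can be sketched roughly as follows. This is an illustration only: the transcript does not say which algorithm Method 52 uses, so a small naive Bayes model stands in for it here, and the class name `TinyNaiveBayes`, its methods, and every example tweet are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case and split on whitespace; a real system would tokenize more carefully."""
    return text.lower().split()


class TinyNaiveBayes:
    """Hypothetical stand-in for a trainable tweet classifier (not Method 52's code)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of annotated tweets
        self.vocab = set()

    def annotate(self, text, label):
        """Record one human-labelled example, e.g. a tweet marked 'boo' or 'cheer'."""
        self.label_counts[label] += 1
        for word in tokenize(text):
            self.word_counts[label][word] += 1
            self.vocab.add(word)

    def classify(self, text):
        """Return the label with the highest log-probability (add-one smoothing)."""
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            score = math.log(n / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Step 1: a person annotates a small sample (invented examples).
clf = TinyNaiveBayes()
clf.annotate("great answer well said", "cheer")
clf.annotate("brilliant point well done", "cheer")
clf.annotate("what a terrible answer", "boo")
clf.annotate("rubbish terrible dodging the question", "boo")

# Step 2: the rest of the stream is routed automatically, the "sluice gate".
stream = ["well said brilliant", "terrible rubbish"]
rivulets = defaultdict(list)
for tweet in stream:
    rivulets[clf.classify(tweet)].append(tweet)
```

In practice the annotated sample would be far larger (the transcript mentions a random sample of 1,500 tweets), and the classifier's accuracy would be checked on held-out examples before the whole dataset is pushed through it.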

  • 07:40

    CARL MILLER [continued]: So we were desperately doing what we could to follow the campaigns, to collect as many tweets as possible, to look at what's happening on Facebook, to talk to as many people as we could within the campaigns in order to understand how the digital world fitted into the kind of campaigning, I suppose, in the 21st century-- what it meant for the politicians,

  • 08:01

    CARL MILLER [continued]: how it influenced people, how it affected people, and how it didn't. We were sucking in over 10 million tweets during the course of the campaign. We were sucking in every tweet from every MP and most PPCs-- that's Prospective Parliamentary Candidates-- every tweet about every MP and most PPCs, and every tweet to them. And in doing so, we were really trying to understand what this new landscape meant,

  • 08:23

    CARL MILLER [continued]: how it was changing how politicians communicated with people, and also, and probably more importantly, how it was allowing people to speak back. Of course, social media platforms are two-way channels. And we saw hundreds and hundreds and hundreds of thousands of people take to this new kind of rolling public digital commons to kind of join

  • 08:43

    CARL MILLER [continued]: the debate about the future of the UK.

  • 08:45

    ALEX KRASODOMSKI-JONES: Interrogating data on this scale requires a very specific set of skills. Frequently, social scientists aren't, haven't learned and haven't been equipped with the skills to be able to interrogate computer-- well, the data that we're talking about here. There's an enormous amount of-- in the Sunday Times election set, for example, we were looking at about seven

  • 09:07

    ALEX KRASODOMSKI-JONES [continued]: and a half million tweets over the period of two months. Now, that overwhelms traditional social science methods. It's impossible to read that number of tweets. And so this process has to be automated. We'll look briefly at the dataset that we collected in the run-up to the general election, which is about seven and a half million tweets. I'll throw a few components onto the screen.

  • 09:29

    ALEX KRASODOMSKI-JONES [continued]: And we can look at the different types of data that we had available to us. And this is a really important part of where the work we do at CASM is quite unique. And it's not just social media data that we are interested in. Where social media data is perhaps at its most interesting is where it intersects with other offline forms of data.

  • 09:50

    ALEX KRASODOMSKI-JONES [continued]: And the general election is a really easy way to understand that.

  • 09:53

    CARL MILLER: Social science has often, typically, suffered from a lag. It takes days, weeks, even months to put together a research intervention, to design the questionnaire, to get it out into the field, to get the results back, to analyze them and interpret them and produce an end result. Now, thanks to social media,

  • 10:15

    CARL MILLER [continued]: a kind of new prospect is opening of actually being able to understand society, including politics, as it is unfolding. So as soon as that tweet is sent, it's on Twitter. As soon as that tweet is on Twitter, it's in our system. As soon as it's in our system, it's being algorithmically analyzed. As soon as it's being algorithmically analyzed, it's on a dashboard and people can see it. And that allows us to produce things like a real time

  • 10:36

    CARL MILLER [continued]: analysis of all the boos and cheers and jeers and digital catcalls of the arena of Twitter during a debate as it happens. So we produced a live analysis, a new window into Twitter for a newspaper, which hundreds of thousands of people turned to to watch as they were watching the debate.

  • 10:58

    CARL MILLER [continued]: So at the same time as watching all the seven party leaders stand there in front of the glare of the television cameras, they're also hearing what these hundreds of thousands of digital jeers and boos were saying at the same time.

  • 11:13

    ALEX KRASODOMSKI-JONES: The first thing we might want to know is how many tweets we're dealing with in this dataset. And so we begin by creating just statistical windows into the data. And this allows a researcher to very quickly be

  • 11:34

    ALEX KRASODOMSKI-JONES [continued]: able to reference a single point on the dashboard or ask and answer questions about it. So we are looking at about seven and a half million tweets. We might also be interested in how many users we are looking at, and so how many of the seven and a half million tweets,

  • 11:54

    ALEX KRASODOMSKI-JONES [continued]: how many users sent those tweets? It's about 3/4 of a million. And that, the nature of a dashboard means that once we have decided that that is a valuable question to ask, that information will always be available to us. So one of the things that we might be interested in

  • 12:16

    ALEX KRASODOMSKI-JONES [continued]: is how many-- what hashtags were being used during the set. And hopefully, that will give us some indication of the kinds of perhaps issues that are being raised, all the parties that have created hashtags. And those hashtags have been successful, in terms of party political campaigning on social media. And so we already have some idea of which kind of parties are getting a lot of coverage on social media

  • 12:38

    ALEX KRASODOMSKI-JONES [continued]: and what offline events and what other media events are driving conversation on Twitter. One of the tables that we've just built here, one of the charts, should I say, looks at, by party, who is being talked about. And so we can assign a party to a user screen name on Twitter. Say, for example, David_Cameron would

  • 12:59

    ALEX KRASODOMSKI-JONES [continued]: be a Conservative account. And then just work out the percentages. And Labour come in first, as you can see here, with around 40% of the total tweet volume. And then the Conservatives, somewhere behind on 27%.
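
The dashboard figures walked through from 11:13 to 13:13 (tweet count, unique users, hashtag frequencies, and share of mentions by party) are simple aggregations, and a rough sketch of them might look like the code below. The sample tweets, the `user` and `text` keys, the `example_labour_mp` account, and the `ACCOUNT_PARTY` lookup are all assumptions made for illustration; only the idea of assigning David_Cameron to the Conservatives comes from the transcript.

```python
import re
from collections import Counter

# Invented sample standing in for the collected dataset.
tweets = [
    {"user": "voter_a", "text": "Good answer from @David_Cameron #debate"},
    {"user": "voter_b", "text": "@example_labour_mp made the key point #debate #nhs"},
    {"user": "voter_a", "text": "Not convinced by @example_labour_mp #nhs"},
    {"user": "voter_c", "text": "Watching with friends #debate"},
]

# Hypothetical lookup from screen name (lower-cased) to party.
ACCOUNT_PARTY = {
    "david_cameron": "Conservative",
    "example_labour_mp": "Labour",
}

# How many tweets, and how many distinct users sent them?
total_tweets = len(tweets)
unique_users = len({t["user"] for t in tweets})

# Which hashtags were used, and how often?
hashtag_counts = Counter(
    tag.lower() for t in tweets for tag in re.findall(r"#(\w+)", t["text"])
)

# Which party is being talked about? Count mentions of known accounts.
party_mentions = Counter()
for t in tweets:
    for name in re.findall(r"@(\w+)", t["text"]):
        party = ACCOUNT_PARTY.get(name.lower())
        if party is not None:
            party_mentions[party] += 1

# Turn the mention counts into percentages of all party mentions.
mention_total = sum(party_mentions.values())
party_share = {p: 100 * n / mention_total for p, n in party_mentions.items()}
```

Once a question like this is written down as a query, it can be re-run as new tweets arrive, which is the point made about a dashboard keeping an answer permanently available.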

  • 13:13

    CARL MILLER: We weren't just listening to politicians. We weren't just listening to what was being said in the lobbies of the Palace of Westminster. We were able to pretty immediately understand how hundreds of thousands of normal people across the UK actually reacted to political messages. We were trying to understand what they thought were important. We were trying to understand the issues which they thought were priorities.

  • 13:33

    CARL MILLER [continued]: We were trying to understand the moment during the debates where not the professional political commentators, but the many hundreds of thousands of amateur digital commentators, the moments which they thought were key. Now, I think that's important. Because I think that begins to actually democratize how we understand politics. I think suddenly we don't just have

  • 13:54

    CARL MILLER [continued]: to listen to a very small, select group of professional watchers and commentators to actually begin to interpret events. We can actually turn this window into something which we can present people, saying, this is what you thought. This is what all of you-- anyway, the unrepresentative slice of the digital you-- thought about this event or this message.

  • 14:15

    CARL MILLER [continued]: And I think that's really important. But of course, it comes with many challenges. And one you just heard in my response was the unrepresentivity. So it's very important, of course-- and this goes for all research-- when we use it for journalism, or when we use it for social science more academically, is that it is robust and trustworthy.

  • 14:36

    CARL MILLER [continued]: And there are many different challenges within doing social media research around both the robustness, the representivity, the validity of that research. We're still trying to make sure, we're still trying to work out how we can link this new, quite unfamiliar form of doing research in with the principles of social science. And beyond that, we're still trying to work out how we can actually present the outcomes of it

  • 14:57

    CARL MILLER [continued]: in ways which are clear, and intuitive, and immediately recognizable to people. So these are, as we call, front end platforms, which take all that data and present them in ways in graphs, in pie charts, in line charts, in colors, and everything else which makes sense to people, where you can actually see the patterns.

  • 15:17

    ALEX KRASODOMSKI-JONES: This was the dashboard that we built internally to help us understand the questions that we wanted to answer about the general election. One of the things we were interested in quite late on, as you can see, was how different politicians were behaving on Twitter. And so we would create a window in our dashboard

  • 15:41

    ALEX KRASODOMSKI-JONES [continued]: that tried to answer that question. So for example, was a tweet a broadcast tweet, something just sort of shouted out into the ether? Was it a re-tweet of somebody else? Or was it a reply, a direct reply? Each week, we tried to create a new window into the data to provide some new insight into social media and the way it

  • 16:03

    ALEX KRASODOMSKI-JONES [continued]: was being used by politicians. So for example, we could look at an MP analysis and look at the number of tweets by party. We could look at the average followership, which is displayed on the left. And obviously, you can see George Galloway has a massive Twitter followership, and so his party is highest. I think one of the most important things to remember

  • 16:23

    ALEX KRASODOMSKI-JONES [continued]: is the role of the social scientist in all this. Although a lot of the analysis can be automated and needs to be automated, asking the right questions requires a skilled and dedicated social scientist sitting behind the keyboard.
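
The split described at 15:41 between broadcast tweets, re-tweets, and direct replies could be approximated with a few text rules, as in the sketch below. The rules themselves (a leading `RT @` for a re-tweet, a leading `@` or a reply field for a reply), the `party`, `text`, and `in_reply_to` keys, and the sample tweets are assumptions for illustration; the transcript does not say how the dashboard made this distinction.

```python
from collections import Counter


def tweet_type(tweet):
    """Label a tweet as 'retweet', 'reply', or 'broadcast' using simple heuristics."""
    text = tweet["text"]
    if text.startswith("RT @"):
        return "retweet"
    # 'in_reply_to' is a hypothetical field naming the account being answered.
    if tweet.get("in_reply_to") is not None or text.startswith("@"):
        return "reply"
    return "broadcast"


# Invented sample of politicians' tweets, already tagged with a party.
tweets = [
    {"party": "Party A", "text": "Out canvassing this morning"},
    {"party": "Party A", "text": "RT @someone: Great turnout today"},
    {"party": "Party B", "text": "@constituent Thanks, I will look into it"},
    {"party": "Party B", "text": "Here is my reply", "in_reply_to": "constituent"},
]

# Tally behaviour per party, ready to be charted.
behaviour_by_party = {}
for t in tweets:
    behaviour_by_party.setdefault(t["party"], Counter())[tweet_type(t)] += 1
```

Deciding that this three-way split is the question worth asking is the social scientist's contribution; the code only does the counting.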

  • 16:40

    CARL MILLER: So there's a new and growing toolbox of technologies and platforms which are being used to research social media. Some are available for free. Others are available at a price and others are only available to a research group or institution. Now, my advice would be that there's a lot of different ways of actually getting involved in doing social media research.

  • 17:02

    CARL MILLER [continued]: But everyone that does, I think needs to have an eye on the method and on the technology. Up to now, we've had both technologists on the one hand and social scientists on the other. They speak different languages. They have different concerns. They write in different journals. They have different ways of looking at the world. And that has proved to be really, really difficult and problematic when it actually

  • 17:23

    CARL MILLER [continued]: comes down to trying to glue big data technology together with social science so that you can actually use big data technology to study society. But I think the major opportunity for students now is that you can develop those hybridized skill sets. You can both develop the knowledge of the technology, some feeling and understanding how it works, and also subject matter expertise, so

  • 17:43

    CARL MILLER [continued]: all the things which are needed to go out and actually study society. And I think in doing that, you will be the first kind of research generation coming through which will be able to hold both the tech and the social science in your head at the same time. And that will make you a much better social media researcher than I ever could be.

Abstract

Carl Miller and Alex Krasodomski-Jones explain how the Centre for the Analysis of Social Media is developing new tools and technology to help social scientists work with big data. As one of its first big projects, CASM examined political communication and commentary via Twitter during the 2015 U.K. elections.
