  • 00:00


  • 00:09

    SAIPH SAVAGE: I'm Saiph Savage. I'm an assistant professor at West Virginia University. And I'm also a research collaborator with the National Autonomous University of Mexico, UNAM. So I currently work in West Virginia, and I also collaborate in Mexico City in Mexico.

  • 00:33

    SAIPH SAVAGE [continued]: My research currently combines two areas. On one hand, I do large-scale data analysis, where I study how do people produce collective action online. And this has led me to study, for instance, how do political trolls organize and mobilize people for their efforts. How do they persuade people to suddenly start to participate in online harassment?

  • 00:54

    SAIPH SAVAGE [continued]: And then I use that knowledge to build large-scale collective action systems where we can mobilize citizens to take action for their communities. We're considering collective action as a community that comes together and is working

  • 01:16

    SAIPH SAVAGE [continued]: together towards a particular goal that might be for the community. So we could think about it-- for instance, people are coming together maybe to build infrastructure for the blind. So we have studied systems to mobilize citizens to build infrastructure inside buildings to help blind people navigate better indoors.

  • 01:36

    SAIPH SAVAGE [continued]: Another type of collective action is mobilizing citizens to collectively fight corruption. And so basically, it's getting citizens to do tasks together for a particular goal. That goal can be to fight corruption, help blind people, maybe build a rural school together.

  • 02:03

    SAIPH SAVAGE [continued]: I got interested in this area of collective action when I started-- I've always been very passionate about what has been going on in my country, which I'm originally from Mexico. And so I started thinking a lot about, well, wouldn't it be neat if we could use technology for social good. Well, how would we start to use technology for social good?

  • 02:26

    SAIPH SAVAGE [continued]: And I started to realize, well, maybe it's helping citizens to be able to define what problems do they have. And then, once citizens are able to define what problems do they have, helping citizens to devise a plan for overcoming those problems. And then helping citizens to mobilize and take action for their communities to really create change. So I would argue that a big part of my passion

  • 02:47

    SAIPH SAVAGE [continued]: is using technology, creating systems that can empower citizens themselves to create change and transform their communities into what they would like to see them. The way in which data is informing these systems

  • 03:08

    SAIPH SAVAGE [continued]: is, on one hand, we can think about it as, OK, you have citizens. Maybe they have different interests. They have different expertise. A big problem that currently exists, for instance, with protests is that you are not necessarily leveraging the interest or expertise of each individual. So I started thinking about, well, wouldn't it be really neat if we could mobilize citizens? If we could tap into each citizen's expertise

  • 03:31

    SAIPH SAVAGE [continued]: and use that for social good, and to mobilize them to take action for their community, where they are really using what they care about. And so how do we figure out what citizens care about? What expertise do they have? Well, you can use the data that people are producing, you just need to start to figure out. Oh, well, you know, this person is an engineer. He might be able to help, maybe, build this infrastructure

  • 03:55

    SAIPH SAVAGE [continued]: for the blind that we have in this building that he's passing by every single morning. And we detected, a, that he's passing by this building every single morning. And maybe he has 10 to 15 minutes free in his day. We've also detected that, and we've also detected that he is interested in social good activities. So why don't we recruit him, and we get him involved in this effort?

  • 04:16

    SAIPH SAVAGE [continued]: So on one hand, my research is using large-scale data to understand people's interest, to understand their expertise, and then use that knowledge to mobilize them for social good. On the other hand, I'm also very interested in using data to currently understand-- well, how are people currently mobilizing? And what limitations currently exist in how they're mobilizing?

  • 04:36

    SAIPH SAVAGE [continued]: And can we then create systems that overcome those current limitations? So that's why I also conduct a lot of data analysis. To understand, for instance, well, how are citizens currently using, for instance, Reddit, Facebook, to organize collective action? And what problems do they have? And where can we build better technology to overcome those problems?

  • 05:05

    SAIPH SAVAGE [continued]: Currently my research focused on understanding how political trolls were able to mobilize other citizens to produce collective action. And their collective action focused on, for instance, creating online harassment campaigns, creating content to support Donald Trump, creating content to attack Trump supporters.

  • 05:27

    SAIPH SAVAGE [continued]: And we started first getting interested in this research because we started seeing, well, these communities are starting to become problems for other communities. So, for instance, Reddit actually had to change its ranking algorithm because a community-- these political trolls-- were able to manipulate the algorithm and always get their content at the top of Reddit.

  • 05:49

    SAIPH SAVAGE [continued]: And so they were able to really start to persuade a lot of people to start to support them. And the thing was, Reddit didn't know how to interact with them. What was the best way to intervene? And I think that right now there's a whole notion about, well, what is the best way to interact with these political trolls that they might be sharing,

  • 06:12

    SAIPH SAVAGE [continued]: for instance, content that could be fake, that could be hurting other people. And it's not necessarily clear what are the best ways to interact with them. Or maybe even to understand them. What is their point of view? Where are they coming from? And so, given this, we really wanted to first understand a little bit more about, well, what are the traits of the people that

  • 06:34

    SAIPH SAVAGE [continued]: become the most active in these communities? What type of content are they sharing? And then, what is actually helping this community to mobilize other citizens? And so, what we first did was that we started reading more about this large-scale political troll community that exists on Reddit, which is called The_Donald.

  • 06:55

    SAIPH SAVAGE [continued]: So on Reddit, you have a bunch of what are called subreddits where people get together around a particular topic. In this case, they created a subreddit called The_Donald, in which people were getting together to talk about Donald Trump's campaign. But this community quickly became a political troll community that started organizing efforts. For instance, to harass Amy Schumer, who is a comedian.

  • 07:18

    SAIPH SAVAGE [continued]: And since she opposes Donald Trump, they were organizing campaigns against her. They were also organizing campaigns against Netflix shows that they thought were opposing Donald Trump. And they were also creating large-scale campaigns on Twitter and Facebook, the great meme war. They were really organizing that to persuade people to vote for Donald Trump. And I think they were actually very effective.

  • 07:39

    SAIPH SAVAGE [continued]: And so it was important for us to understand, well, how are these people organizing? How are they mobilizing people to get involved? And so, what we did was that we collected large-scale data, over 16 billion posts and comments from Reddit. And on one hand, we started studying,

  • 08:00

    SAIPH SAVAGE [continued]: what were the behavioral patterns of the ones that became the most active. And then we started doing data analysis around the calls to action that they started to create. So calls to action was where they told the community, hey, you know what? Let's participate in a Photoshop contest to create memes against Hillary. And so that's a call to action.

  • 08:21

    SAIPH SAVAGE [continued]: They're telling people to get involved in creating that content. Or other times, it was, hey, you know what? Let's upvote this picture of Donald Trump so that it goes into the first page of Reddit. And so we wanted to understand what type of calls to action were they making? What styles to mobilize the community did they have?

  • 08:47

    SAIPH SAVAGE [continued]: We used BigQuery. So Google has a very neat tool in which, basically, all of-- absolutely all of the Reddit content, they have put it, let's say into a type of database, that you can just give a query about, you say, OK, I want, from this subreddit, I would like to collect posts from this date to this date.

  • 09:08

    SAIPH SAVAGE [continued]: And it starts to return to you absolutely all of the posts. And then from all of the posts, you have, as well, all the comments. And so from there, that was how we were able to collect all of the data. So there we were using Google's API for that data collection. And so Google's API, Google's BigQuery API was very nice for that data collection.
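
The kind of query described here (posts from one subreddit between two dates) can be sketched as a small Python helper that only builds the SQL text. This is an illustration under stated assumptions, not the query the team ran: the table name passed in, the column names (`id`, `author`, `created_utc`, `title`, `selftext`, `score`, `subreddit`), and the idea that `created_utc` holds Unix epoch seconds are all hypothetical and would need to be checked against the actual dataset schema.

```python
import re
from datetime import date, datetime, timezone


def to_epoch(day: date) -> int:
    """Midnight UTC of `day` as Unix epoch seconds."""
    return int(datetime(day.year, day.month, day.day, tzinfo=timezone.utc).timestamp())


def build_subreddit_query(table: str, subreddit: str, start: date, end: date) -> str:
    """Build SQL selecting posts from one subreddit in [start, end).

    Column names are assumptions about the dataset schema.
    """
    # Keep the subreddit name to safe characters since it is inlined in the SQL.
    if not re.fullmatch(r"[A-Za-z0-9_]+", subreddit):
        raise ValueError(f"unexpected subreddit name: {subreddit!r}")
    if end <= start:
        raise ValueError("end date must be after start date")
    return (
        "SELECT id, author, created_utc, title, selftext, score\n"
        f"FROM `{table}`\n"
        f"WHERE subreddit = '{subreddit}'\n"
        f"  AND created_utc >= {to_epoch(start)}\n"
        f"  AND created_utc < {to_epoch(end)}"
    )


# Hypothetical table name; replace with the dataset you actually have access to.
print(build_subreddit_query("my-project.reddit.posts", "The_Donald",
                            date(2016, 1, 1), date(2016, 2, 1)))
```

Narrowing the date range and selecting only the needed columns keeps the scanned data, and therefore the bill, smaller.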

  • 09:30

    SAIPH SAVAGE [continued]: You do have to pay, however. And we paid around $1,000. Well, we paid around $1,000 for that data collection. There are other open tools. However, a lot of the times, they do require much more coding effort to collect the data.

  • 09:55

    SAIPH SAVAGE [continued]: The biggest thing with BigQuery is that I don't think it's very easy, sometimes, to calculate even the costs. So, for instance, you could be running a script, and suddenly not realize that the script-- so you can create a script to say, I want to collect data from this date to this date.

  • 10:16

    SAIPH SAVAGE [continued]: Currently, how it's set up, it doesn't tell you, OK, if you wanted to collect data from this date to this date, it's going to cost you, let's say, $2,000. So if you're a new student, and you don't realize that, you could very easily make a mistake where you try to collect data from this date to this date, and it could end up being like $10,000 that you were collecting.
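
One way to guard against the surprise bills described here is to estimate the cost from the number of bytes a query would scan before running it. The sketch below assumes on-demand pricing billed per TiB scanned; the rate is passed in as a parameter because pricing changes, and the function names are hypothetical. The BigQuery client libraries document a dry-run mode that reports the bytes a query would process without executing it, which is one possible source for the `bytes_processed` value.

```python
TIB = 2 ** 40  # bytes in one tebibyte


def estimate_query_cost_usd(bytes_processed: int, usd_per_tib: float) -> float:
    """Estimated on-demand cost for scanning `bytes_processed` bytes."""
    return bytes_processed / TIB * usd_per_tib


def check_budget(bytes_processed: int, usd_per_tib: float, budget_usd: float) -> float:
    """Return the estimated cost, or raise if it exceeds the budget."""
    cost = estimate_query_cost_usd(bytes_processed, usd_per_tib)
    if cost > budget_usd:
        raise RuntimeError(
            f"estimated cost ${cost:.2f} exceeds budget ${budget_usd:.2f}; "
            "narrow the date range or select fewer columns"
        )
    return cost


# Example with made-up numbers: a scan of half a TiB at an assumed rate.
print(check_budget(bytes_processed=2 ** 39, usd_per_tib=5.0, budget_usd=10.0))
```

Calling `check_budget` before every query in a collection script turns an unnoticed overspend into an explicit error.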

  • 10:36

    SAIPH SAVAGE [continued]: So, for instance, Twitter, on the other hand, has a much friendlier interface, where it tells you-- where you say, OK, I want to collect from this date to this date, and it tells you, it's going to cost you this amount. And you don't get access to anything until after you have paid. With Google, it's different actually, in which they start to give you data without telling you

  • 10:59

    SAIPH SAVAGE [continued]: a little bit about the cost. And so, if you're a new-- if you suddenly start to be new in this, I think it's pretty easy, sometimes, to suddenly get a large-- get maybe very high costs without realizing it. Also we had an issue where one of our computers had been hacked. And so suddenly, we were getting these really high costs

  • 11:21

    SAIPH SAVAGE [continued]: with BigQuery. And it seemed like there was somebody externally that was using our accounts and running millions of transactions into BigQuery. And so, that can also be-- that's also something to be careful about. I would argue that, if you're a new student,

  • 11:42

    SAIPH SAVAGE [continued]: it can help you to first look at the tools that are free. And then maybe start slowly to venture into these other spaces where you pay for the queries. But it definitely becomes much faster to do all the data collection using Google's BigQuery.

  • 12:06

    SAIPH SAVAGE [continued]: How we sort through the data is a-- well, BigQuery basically gives you, for instance, a post. So you have, let's say, the textual data. When it was created. Who created that post. So for our research in particular, one of the most important things was to start to understand, what do the calls to action that political trolls make look like.

  • 12:27

    SAIPH SAVAGE [continued]: So the first thing was, OK, we need to identify what are the calls to action. So a call to action is when somebody says, when somebody is telling the community to take action, do something. So one action can be, for instance, make a call to the senator to tell him to oppose Hillary. Another call to action can be let's downvote

  • 12:48

    SAIPH SAVAGE [continued]: this video of Amy Schumer. So those are calls to action, where they're telling the community to take a particular action. So we wanted to detect action. So what we did was, first, we identified all of the posts that had action verbs in them. Then once we had them labeled with action verbs, we then used crowdsourcing techniques.

  • 13:11

    SAIPH SAVAGE [continued]: We had people read through each of the posts and label whether or not it was actually a call to action. Because it could be, for instance, that somebody is using an action verb, but they're not making a call to action. So we used crowdsourcing where we're using crowd workers, people online, that read through the content and just label, yes, this is a call to action. No, this is not a call to action.
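
The first stage described above, keeping only posts that contain an action verb before handing them to human labelers, can be sketched as a simple lexicon match. The verb list here is a small hypothetical example and the tokenizer is deliberately naive; the source does not say which verb list or tooling was used.

```python
import re

# Hypothetical lexicon; a real study would use a much larger verb list.
ACTION_VERBS = {"upvote", "downvote", "call", "post", "share", "vote", "join", "create"}


def has_action_verb(text: str) -> bool:
    """True if any token of the post is in the action-verb lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(token in ACTION_VERBS for token in tokens)


def filter_candidates(posts):
    """Keep posts worth sending to crowd workers for a yes/no label."""
    return [post for post in posts if has_action_verb(post)]


posts = [
    "Let's upvote this picture",
    "I made a call to my mom yesterday",
    "What a nice day",
]
print(filter_candidates(posts))
```

The second post passes the filter because it contains "call" even though it asks nobody to do anything, which is exactly the case the human labeling step is there to catch.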

  • 13:33

    SAIPH SAVAGE [continued]: And then we checked out how two people, how two independent coders, two independent people that were categorizing the data-- how much agreement they had. Once we had good agreement, that was our labeled dataset of calls to action. Then once we had the calls to action, the next thing that we did was that we started using a topic model--

  • 13:55

    SAIPH SAVAGE [continued]: sorry, clustering techniques-- to be able to basically identify different types of calls to action. So for instance, the clustering, what it helps you to do is to be able to identify the different styles of calls to action. So for the features of the clusters-- so clustering techniques, what they do is that they go through the text, and they identify what is unique about this call

  • 14:19

    SAIPH SAVAGE [continued]: to action versus this other call to action. The features that we looked at were the number of public figures that were mentioned. The text length. The amount of slang that they were using. The number of swear words. And then, based on that, our clustering techniques started to identify different groups, different clusters.
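
The four features listed above can be turned into a numeric vector per post, which is the form a clustering algorithm needs. The sketch below is illustrative only: the three lexicons are tiny placeholders, and measuring text length in tokens rather than characters is an assumption, since the source does not specify either.

```python
import re

# Placeholder lexicons for illustration; real ones would be far larger.
PUBLIC_FIGURES = {"trump", "hillary", "clinton", "schumer"}
SLANG = {"centipede", "centipedes"}
SWEAR_WORDS = {"damn", "hell"}

FEATURE_NAMES = ["public_figures", "text_length", "slang", "swear_words"]


def extract_features(text: str) -> list:
    """Return [public figure mentions, token count, slang count, swear count]."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [
        sum(token in PUBLIC_FIGURES for token in tokens),
        len(tokens),
        sum(token in SLANG for token in tokens),
        sum(token in SWEAR_WORDS for token in tokens),
    ]


vector = extract_features("Centipedes, upvote this picture of Trump!")
print(dict(zip(FEATURE_NAMES, vector)))
```

Because text length is on a much larger scale than the counts, a real pipeline would usually rescale the columns before clustering so that one feature does not dominate the distances.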

  • 14:40

    SAIPH SAVAGE [continued]: Each cluster represents a style, a way for calling people to action. Overall-- and so there, we used the mean shift algorithm. Which basically allows you to separate your data. And what's neat about mean shift, is that you don't have to state the number of clusters that you want. It's going to discover it automatically from the data

  • 15:03

    SAIPH SAVAGE [continued]: and overall we discovered three main clusters. One cluster-- we're calling it the political troll calls to action. So that one is using a lot of swear words. And it's using a lot of troll slang. So these political trolls had slang. For instance, they were calling each other, pippa, centipedes.
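
The property of mean shift mentioned above, that the number of clusters is discovered from the data rather than set in advance, can be seen in a minimal flat-kernel implementation. This is a teaching sketch in plain Python, not the implementation used in the study (library versions exist, for example `sklearn.cluster.MeanShift` in scikit-learn); the `bandwidth` value and the toy points are made up for illustration.

```python
import math


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift. Returns (centers, labels)."""
    modes = []
    for start in points:
        x = list(start)
        for _ in range(max_iter):
            # All data points inside the window of radius `bandwidth` around x.
            window = [p for p in points if math.dist(x, p) <= bandwidth]
            if not window:
                break
            # Shift x to the mean of the points in its window.
            new_x = [sum(coord) / len(window) for coord in zip(*window)]
            moved = math.dist(x, new_x)
            x = new_x
            if moved < tol:
                break
        modes.append(x)

    # Points whose modes landed close together belong to the same cluster.
    centers, labels = [], []
    for mode in modes:
        for i, center in enumerate(centers):
            if math.dist(mode, center) <= bandwidth / 2:
                labels.append(i)
                break
        else:
            centers.append(mode)
            labels.append(len(centers) - 1)
    return centers, labels


# Toy 2D points forming two obvious groups (not real data).
toy_points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centers, labels = mean_shift(toy_points, bandwidth=1.0)
print(len(centers), labels)
```

The only tuning knob is the bandwidth: a larger window merges nearby groups, a smaller one splits them, so the "automatic" cluster count still depends on that choice.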

  • 15:25

    SAIPH SAVAGE [continued]: And then their calls to action were a lot related to participating in harassment campaigns. So it was like, hey, let's participate in a meme war against Hillary to show how ridiculous she is. And so that was one type of call to action. Another call to action that we discovered was a very direct call to action. So they would just say, hey, let's upvote

  • 15:45

    SAIPH SAVAGE [continued]: this picture of Donald Trump. He deserves to be on the top. And so it was just very direct, telling people to do a particular action. It didn't necessarily have to be related to harassment. Nothing. It was just very simple, straightforward. And then the last one that we discovered was one that we're calling historian style. This style was very interesting. Because what people were doing was that they would first

  • 16:07

    SAIPH SAVAGE [continued]: explain to you the whole ecosystem about what was going on. And then based on that whole ecosystem, they would tell you why-- they would ask you to participate in an action. So, for instance, there was a conspiracy theory about a guy that was allegedly murdered by the Democratic Party because he

  • 16:28

    SAIPH SAVAGE [continued]: had helped in leaking some information about Hillary Clinton. Who was Seth Rich. First, they explained the whole ecosystem about who was this guy, what happened to him. And then after that, they would tell people, OK, we need-- guys, we need to post this picture of Seth Rich to show support. And so first they explained everything that was going on,

  • 16:50

    SAIPH SAVAGE [continued]: so that people would understand what they were supporting. And then they would ask people to take action. What we actually found was that, from these three styles, the one that had most participation from the community, like constant-- a lot of upvotes and comments-- was the historian style. This is contextualizing to people why they should participate in something.

  • 17:12

    SAIPH SAVAGE [continued]: And so this was very interesting to see. Because friends of these political trolls-- they weren't even necessarily mobilized the most when another person was using, like these-- a lot of slang and a lot of swear words. Which is what you would have expected trolls to maybe be moved by. They were more moved by explaining to people.

  • 17:34

    SAIPH SAVAGE [continued]: OK, this is what is going on. Guys, we need to take action. And so I think that was interesting to understand. Because it means that maybe these communities-- maybe if you contextualize things for them, you could potentially also mobilize them to take action for good. Or maybe to see things the other way.

  • 18:02

    SAIPH SAVAGE [continued]: For this particular project, some of the main challenges that we had was actually really understanding this community with an open mind. So, I mean, given for instance that I'm from Mexico, I could feel potentially, let's say, even harassed by these communities. I mean, considering that Donald Trump called Mexicans rapists.

  • 18:24

    SAIPH SAVAGE [continued]: So I think it was really understanding this research from an open mind and being able to have an open mind to start to talk with the political trolls that were participating. And start to really understand their point of view. And this really changed, for instance, how we conducted the research.

  • 18:45

    SAIPH SAVAGE [continued]: And because we started to really see that, it was actually pretty interesting how they were mobilizing others to get involved. And so, I think that that was one of the most important things, like to really have an open mind about them. I think also that-- there was the media, as well, has been portraying people that are supporting

  • 19:06

    SAIPH SAVAGE [continued]: Donald Trump within a certain light. And it was also really-- we did so many interviews that my perception of them changed. And I suddenly started seeing some of their viewpoints, in which a lot of them are suffering actually harassment at work. For instance, a lot of them can't say that they're supporting Donald Trump because they

  • 19:29

    SAIPH SAVAGE [continued]: suffer discrimination at work. And so I think that was very eye opening where you suddenly see that these individuals, as well, are experiencing their own types of harassment. And it's really understanding their point of view. Another thing was also that, for instance, we

  • 19:52

    SAIPH SAVAGE [continued]: discovered that there were a lot of data scientists participating in the movement. And so it was interesting, as well, to really change our conception about who was behind the movement. There were a lot of tech folks. And it made sense, as well, that there were a lot of tech folks. Because they had been actually very effective, for instance, in being able to create digital strategies for how

  • 20:14

    SAIPH SAVAGE [continued]: they wanted to mobilize people. And just realizing that, that maybe, for instance, I even had preconceptions about who they were. It changed. It changed a lot about how I saw things. And I guess it was also interesting even to like understand even more about--

  • 20:38

    SAIPH SAVAGE [continued]: Now I feel I understand much more, for instance, when pro-Trump supporters say certain comments. Or Trump says certain comments. Now I can contextualize it, and I understand where it's coming from. And I also understand now a lot more the phrases that they have. And so this really changed a lot my own perception

  • 21:01

    SAIPH SAVAGE [continued]: of the environment. And it helped me a lot to be much more understanding and really think about that it is possible to collaborate and do, also, really interesting things with these other people. The way in which we mixed our qualitative analysis, which

  • 21:24

    SAIPH SAVAGE [continued]: was, for instance, interviews with our large-scale data analysis, was that our qualitative interviews really helped us, a, first to start to identify some of the main points in which we might want to do some data explorations. So within the interviews, we started realizing about this importance that it was for this political troll community

  • 21:45

    SAIPH SAVAGE [continued]: to explain their point of view to other people. And we suddenly realized-- it clicked that it was super important for them, because they had a lot of opposition from, for instance, the media. Even Reddit. So, for instance, Reddit CEO, at one point came into the community. And he started editing the comments that people were doing. And so we realized, like, OK, these guys, since they have--

  • 22:05

    SAIPH SAVAGE [continued]: through the interviews, really got to understand that they had a lot of opposition. And so for them to be able to mobilize people, it was super important to clearly explain what was going on. And so we quickly realized that understanding the mobilization was a very important research direction to go into.

  • 22:25

    SAIPH SAVAGE [continued]: And so the interviews helped us on one hand to really understand a good research direction to go with that data set. On the other hand, the interviews also helped us to better understand some of the patterns that we were finding. So for instance, we found that the most active were using bots.

  • 22:47

    SAIPH SAVAGE [continued]: And the interesting thing was that they were playing-- these bots were gamification bots. So they were playing games with the bots. So the interviews helped us to understand, well, why exactly is it so entertaining to be communicating with these bots? And so, the bots are basically-- they see them as a way in which they are also enforcing their community.

  • 23:09

    SAIPH SAVAGE [continued]: And so they're creating an identity for themselves through the bots. Basically the bots are-- they become a game where you get points, for instance, for using certain slang terms. You get points for, like, promoting certain points of view. The interviews helped us to understand the patterns that we were identifying. Also, for instance, we found instances

  • 23:30

    SAIPH SAVAGE [continued]: where, let's say, we had an event in which we had a bunch of comments and upvotes. And we didn't know what that event was. The interviews again helped us to understand, oh, this was when-- For instance, actually that was how we discovered about-- this was when Reddit CEO came in, and he started editing the comments.
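
Bursts of activity like the one described here are the kind of thing a simple outlier check on daily counts can surface, leaving the "what happened that day" question to interviews. The source does not say how such events were spotted; the sketch below is one common approach (flagging days more than `k` standard deviations above the mean), with made-up counts and a hypothetical function name.

```python
from statistics import mean, pstdev


def find_spikes(daily_counts: dict, k: float = 2.0) -> list:
    """Return the days whose count is more than k standard deviations above the mean."""
    values = list(daily_counts.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # perfectly flat activity, nothing stands out
    return [day for day, count in daily_counts.items() if (count - mu) / sigma > k]


# Made-up daily comment counts for illustration only.
counts = {
    "day1": 100, "day2": 110, "day3": 90, "day4": 105,
    "day5": 95, "day6": 100, "day7": 900,
}
print(find_spikes(counts))
```

A flagged day is only a pointer to where to look; explaining it still needs context from the community, which is the role the interviews played.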

  • 23:51

    SAIPH SAVAGE [continued]: And so the community had this town hall where they were discussing what to do about it. And so the interviews helped us to understand what was going on, what was the perspective of the community with those events. And also further understand, well, why were people getting involved.

  • 24:12

    SAIPH SAVAGE [continued]: The way that we got people involved in our study was a snowballing effect. And so I actually had a friend, a close friend who was part of the community. And he connected us with much more people of the community. And so that was how we started getting heavily involved with the whole community

  • 24:35

    SAIPH SAVAGE [continued]: and just understanding much more their perspectives. Actually, also another thing that I thought about is, like I even-- there were times when I did not also understand, for instance, the memes that they were sharing. And the interviews also really helped us to understand, OK, what is behind the memes? What is this game that they're playing?

  • 24:55

    SAIPH SAVAGE [continued]: And so the interviews were very helpful just in understanding in detail the community. But yeah, and I would argue that, I think it helped to get the interviews, to come to them with an open mind. So I really had to step aside from the stereotypes that I had about who were behind the community,

  • 25:18

    SAIPH SAVAGE [continued]: to start to understand, OK, you know what? You have a lot of data scientists that are participating. You also have-- so for instance, there are actually a lot of Hispanics that are participating in that community. And so that was very interesting to see that it was actually a very diverse-- it was actually a very diverse group.

  • 25:41

    SAIPH SAVAGE [continued]: I think that for students who are wanting to get into research, one thing to consider is that you might have a lot of background information that you can use for your research. And you should take advantage of them. So for instance, in this case, I had an intuition about these political trolls et cetera. And then getting these interviews,

  • 26:02

    SAIPH SAVAGE [continued]: like getting my friends involved and so that they could explain things to me, et cetera, really helped me. So what I would argue is that, for instance, since maybe you're young, and you're using a lot of social media tools, you might have really good intuition about maybe a sense of online phenomenon that is going on. Take advantage of that. Because a lot of the times, for instance, for external people,

  • 26:23

    SAIPH SAVAGE [continued]: you have an advantage over them, because you understand that niche. And then you can use data analytics tools to really uncover what is going on. So I would argue that a good thing to think about is, think about an online phenomenon that maybe you might know in particular about. So maybe it's, for instance, you know how models are building

  • 26:46

    SAIPH SAVAGE [continued]: their profile on Instagram. For instance, I personally don't know how that works. But if you know how that works, that could be a good research opportunity for you to explore, so that the world can better understand that. And then you can also think about what is missing, for instance, with current Instagram tools for models. And there you have the opportunity to now create new systems that can cover the needs of people.

  • 27:10

    SAIPH SAVAGE [continued]: So I think it's a lot-- it helps a lot to really think about, OK, from the world that I know, where am I finding holes? Maybe I'm finding holes, for instance, in how models are sharing information on Instagram because they're not profiting because x, y, and z. You right there have identified a hole in the system.

  • 27:33

    SAIPH SAVAGE [continued]: This means that you have the opportunity to now create a new system that covers those needs. So you can take advantage of that. And there's research to do, for instance, in uncovering exactly where is that problem. So you can do data analysis on that. But then you can also think about creating systems. So another part of my research focuses a lot on using the data analysis to build new systems.

  • 27:54

    SAIPH SAVAGE [continued]: And there's opportunity as well there.


Assistant Professor, Saiph Savage, PhD, shares insights into her research on large-scale data analysis and collective action, specifically how political trolls are able to mobilize members of their internet communities. Discussed are tools and programs used, benefits and challenges encountered, how data was collected and analyzed, surprising findings, and advice for performing similar research.


Researching Collective Action & Political Trolls on Twitter using Clustering Techniques
