## Summary

• 00:00

[MUSIC PLAYING] [Developing a Method to Code Text Using Crowdsourcing]

• 00:10

KEN BENOIT: I'm Ken Benoit. I'm professor at the London School of Economics in the Department of Methodology. [Ken Benoit] [Professor, Head of Department of Methodology] [London School of Economics and Political Science] My specialization is political science, text analysis, computational methods. The crowdsourcing project arose from text analysis research that I'd been doing for the last 15 years that is aimed at measuring things that come from text that we

• 00:33

KEN BENOIT [continued]: couldn't observe using other methods or other ways. In particular, measuring ideology of political parties, measuring ideology of individual legislators. We can't observe their inner states, so we observe things that are contained in the words that they use. Traditional methods of analyzing that, in order to be valid,

• 00:56

KEN BENOIT [continued]: require a huge amount of qualitative human effort, usually trained expert coders who would rate each sentence in a manifesto or a speech, make a judgment about what type of policy it was about and what orientation the statement concerned relative to that policy. With the crowdsourcing project, we decided to see whether untrained, anonymous coders, which

• 01:18

KEN BENOIT [continued]: is what the crowd is, could replace the experts, who are a very few highly expert people who would know a lot about the background. Crowd coders are people who are not highly trained, who know nothing about the background, but there are a lot more of them. And what we found was that by splitting the coding task of a text into small pieces

• 01:42

KEN BENOIT [continued]: and distributing those across a large crowd, it was possible to obtain valid estimates of almost any quantity that was phrased in a well-formed question for the crowd. And those estimates could be obtained very rapidly, usually within hours as opposed to within months for experts. And when reassembled in a way that used a statistical scale,

• 02:07

KEN BENOIT [continued]: produced a valid estimate of the positions that were contained in the text. The classical crowdsourcing platform is what's known as Mechanical Turk, run by Amazon. We started using Mechanical Turk before they imposed a restriction that, I think, only workers from the US and maybe India could be recruited.

• 02:28

KEN BENOIT [continued]: When they changed their restrictions, we looked at another company. It's called CrowdFlower. They're based in San Francisco. And they recruit people from all over the globe. They also have a system for ensuring that the judgments conform to a type of accuracy using the coder's performance with gold questions, where

• 02:50

KEN BENOIT [continued]: gold questions are questions that you ask of the coder to which you know the answer. And we ran a variety of experiments with guidance from this company, and we literally saw this company develop as we worked on this project, which took about four years. And because of their way of ensuring

• 03:12

KEN BENOIT [continued]: the accuracy of the questions, it made the characteristics of the crowd fairly irrelevant. When we put out tests in multiple languages, we didn't specify that a crowd coder needed to understand Greek or Norwegian or German or Spanish. We simply put up the job; it went through to the platform and was one of the jobs that crowd coders

• 03:33

KEN BENOIT [continued]: could see and volunteer to do, because the title of those jobs was, say, in Greek. If you didn't speak Greek, you wouldn't even understand what was being asked, and you certainly wouldn't pass the accuracy requirement. So we didn't really have to do much searching to find the crowd. We simply put it through the platform, and the crowd found the job.
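The gold-question screening described here can be sketched in a few lines. This is a minimal illustration of the general mechanism, not CrowdFlower's actual implementation; the question IDs, labels, and accuracy threshold are all assumed.

```python
# Gold questions are questions whose answers are known in advance.
# A coder's accuracy on them decides whether their work is accepted.
GOLD = {"q1": "immigration", "q2": "not_immigration", "q3": "immigration"}

def passes_screening(answers, gold=GOLD, threshold=0.7):
    """Return True if the coder's gold-question accuracy meets the bar."""
    correct = sum(answers.get(q) == label for q, label in gold.items())
    return correct / len(gold) >= threshold

# A coder who misses one of three gold questions falls below a 0.7 bar.
print(passes_screening(
    {"q1": "immigration", "q2": "not_immigration", "q3": "not_immigration"}
))  # → False
```

As the interview notes, the same check doubles as a language filter: a coder who cannot read the gold questions cannot answer them correctly.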

• 03:54

KEN BENOIT [continued]: They found us. There were technical challenges. One, at the time, was breaking up the text into small units. But the bigger technical challenge was calibrating the instructions. Crowd coders are very good at following very simple, very well-defined instructions. Figuring out what "simple and well defined" means is something that takes a process of iteration. So we ran a lot of experiments until we felt that we

• 04:16

KEN BENOIT [continued]: had calibrated it properly. One of the nice properties of using crowdsourcing is that you never really tire out the crowd. You could exhaust a graduate student until they're despondent and want to quit. You could annoy your colleagues to no end by asking them to do things that were inappropriate--inappropriate from a research standpoint, I mean.

• 04:39

KEN BENOIT [continued]: But with a crowd, you put up jobs, and you do get feedback right away. If they don't give you very good answers, they're still getting paid for those answers. You can look at a small subset. You can refine the instructions, and then you can rerun it until you're satisfied with the product. And that's what we did. But because we were very new at it, it took us almost two years.

• 05:01

KEN BENOIT [continued]: The first thing is you can't ask the crowd to do complex tasks. The tasks have to be very simple, very elemental units. We asked people, for example, to look at a sentence that was extracted from a document and classify whether the sentence concerned immigration or didn't concern immigration.
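Later in the interview, the speaker mentions that he would now use search-term filters or simple machine learning to pre-screen sentences before sending them to the crowd for this topic decision. A minimal sketch of such a search-term filter, with a purely illustrative keyword list:

```python
import re

# Illustrative keyword filter: keep only sentences that might concern
# immigration, so only candidate sentences are sent to the crowd.
KEYWORDS = re.compile(
    r"\b(immigra\w*|asylum|border\w*|migrant\w*|visa\w*)\b", re.IGNORECASE
)

def candidate_sentences(sentences):
    """Return the sentences matching at least one keyword."""
    return [s for s in sentences if KEYWORDS.search(s)]

manifesto = [
    "We will invest in the National Health Service.",
    "Immigration must be controlled at our borders.",
    "Asylum claims will be processed fairly and quickly.",
]
print(candidate_sentences(manifesto))  # keeps the last two sentences
```

A filter like this trades a little recall for a large saving in crowd time and cost, which is why the speaker frames it as a refinement of the original every-sentence design.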

• 05:22

KEN BENOIT [continued]: So the first decision was a category classification. The second decision was, was this in favor of restricting immigration or in favor of having open borders? A lot of people would ask, why did you squander the opportunity to ask about other domains of policy? All you asked people to do was tell you about immigration.
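The two-step task described above (first a topic decision, then a direction) has to be aggregated across coders somehow. A minimal sketch, assuming a simple majority rule and averaged direction codes; the field names and threshold are illustrative, and this is not the published scaling model:

```python
from statistics import mean

# Each coder's judgment on one sentence: did it concern immigration,
# and if so, was it for restriction (+1) or open borders (-1)?
judgments = [
    {"coder": "c1", "on_topic": True,  "direction": +1},
    {"coder": "c2", "on_topic": True,  "direction": +1},
    {"coder": "c3", "on_topic": False, "direction": None},
    {"coder": "c4", "on_topic": True,  "direction": -1},
]

def sentence_score(judgments, min_on_topic=0.5):
    """Majority-vote the topic decision, then average the directions."""
    if mean(j["on_topic"] for j in judgments) < min_on_topic:
        return None  # judged not to concern immigration
    return mean(j["direction"] for j in judgments if j["on_topic"])

print(sentence_score(judgments))  # mean of +1, +1, -1
```

Splitting the work into judgments this small is what lets many untrained coders stand in for a few experts: each individual judgment is noisy, but the average over the crowd is stable.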

• 05:43

KEN BENOIT [continued]: The answer to that is it's necessary to be so focused in order to keep the tasks simple. If we wanted to find out about environmental policy, we would rerun the job and ask people to judge a sentence as being environmental policy or not. With the way that we did that experiment, we took every single sentence in a party's political election

• 06:06

KEN BENOIT [continued]: platform or manifesto. Sometimes these were hundreds of sentences long. It was wasteful to ask the crowd to judge every single sentence to figure out whether it concerned immigration. We would now use filters, search-term filters or even some simple machine learning algorithm, to do a preliminary classification

• 06:26

KEN BENOIT [continued]: to save time and money. But running that job for the immigration manifestos of the entire 2010 British election, for instance, only cost us about $300 to $400. So we're not talking about a huge amount of time or a huge amount of expense, but that was the worst-case scenario. And one of the advantages of this, the big advantage

• 06:48

KEN BENOIT [continued]: of this approach, if you are an early career researcher, if you're a PhD student, is that you don't need to design the end-all, be-all questionnaire, because you only have one shot at it. You want to measure immigration policy for your research, you go after immigration policy. If later you decide, we wish we'd asked questions about environmental policy,

• 07:09

KEN BENOIT [continued]: you simply substitute environmental policy for immigration policy, and rerun the job. It's a demonstration of a way to measure things from text that are in the political domain. It was published in a political science journal. We've had a lot of interest from people outside political science who were interested in using

• 07:30

KEN BENOIT [continued]: the same techniques. And one of the things that we were asked by the referees in going through the peer review process was to demonstrate that it worked on texts that were not just political speeches and political party platforms. I took the referee reports from the first round of peer review,

• 07:54

KEN BENOIT [continued]: parsed up the sentences, gave some instructions about trying to guess whether the sentence was in favor of publication or against publication, and put that onto the crowdsourcing platform. And in the response to the referees, we showed them the scoring from the crowdsourcing platform that gave estimates of their favorability

• 08:15

KEN BENOIT [continued]: towards publication that matched the editor's judgment of the five referee reports that we'd received. And we did that in the second round of referee reports as well, and showed that their movement was much more towards the positive end of the spectrum following our first round of revisions. So that was a demonstration in a different domain, but using the actual comments from the people who'd

• 08:37

KEN BENOIT [continued]: asked for that demonstration. We used a fairly sophisticated statistical scaling method that had coder effects, sentence effects, positional effects of the policy. It had numerous parameters that we put together and estimated

• 08:57

KEN BENOIT [continued]: using a Bayesian model. We thought that this would be an important and sophisticated aspect of our contribution. It turns out that no one's really focused on that aspect of the method, and the results that we were able to estimate for our main quantity of interest were pretty much equivalent to a far simpler method

• 09:20

KEN BENOIT [continued]: that we started with. So we didn't find that we'd done anything wrong using the simpler technique. We vindicated it. But the model that we used makes it possible to estimate a variety of other effects, parts of language or parts of coder treatment effects, that could be useful for the future, but weren't

• 09:41

KEN BENOIT [continued]: really useful for our project. I think the main answer is big data and new technology are only big and new by today's standards. When we look back at the type of data that we analyzed 20 years ago, it seems small and old. But at the time, it was new and big. When we look at data now, it's measured

• 10:02

KEN BENOIT [continued]: in terabytes or petabytes. The data that we used in a previous era was measured in kilobytes, which seems trivially small now. If we can't engage with the data at the scale that it exists now, then we're not in any way prepared for the future that's coming. We're not even prepared for the present. Data is already big, and data in many forms is new.

• 10:26

KEN BENOIT [continued]: And the devices that we wear and the devices that we use, the activities that we engage in online, are generating data as a byproduct of activities that previously didn't generate data. We need the tools and the methods to use this data. What are the challenges involved? I think the biggest challenge is scaling up and learning

• 10:50

KEN BENOIT [continued]: the tools and techniques that we need to engage with the computing aspects and the data management aspects. Social scientists might know how to do programming, but the actual elements of computation are not a part of the traditional social science curriculum. I think they need to be, and that's one of the things we've been discussing here,

• 11:10

KEN BENOIT [continued]: how to get the appropriate training from the computer science colleagues who know these things and study these things and teach these things. How do we insert those into a social science curriculum in a useful, appropriate and, we could say, feasible way? In terms of the methods, I think that the social scientists

• 11:30

KEN BENOIT [continued]: have a lot that we can teach to the computer scientists about what the appropriate way is to study society, politics, and the economy in a structured and scientific fashion. [MUSIC PLAYING]

### Video Info

Publisher: SAGE Publications Ltd

Publication Year: 2019

Video Type: Video Case

### Segment Info

Segment Num.: 1


## Abstract

Ken Benoit, PhD, Professor at the London School of Economics in the Department of Methodology, discusses his research on whether crowdsourced, untrained, anonymous coders could replace highly trained experts in analyzing text. The accuracy of the approach, challenges, advantages, parameters considered, and the future of big data in analyzing text are all considered.