  • 00:00

    [THEME MUSIC PLAYING]

    Developing New Computational Methods to Study Collective Intelligence

  • 00:09

    NICCOLO PESCETELLI: Yeah. I'm Niccolo Pescetelli. I am a postdoc in the Scalable Cooperation group at the MIT Media Lab in Boston. My background is in psychology and cognitive science. And I've always been working in decision making at the interface between like one person and two people.

  • 00:32

    NICCOLO PESCETELLI [continued]: And now in my postdoc, I'm looking at more of the decision making and forecasting in large groups of people, so getting more interested in this idea of collective intelligence and wisdom of the crowds.

    How did you become interested in computational social science?

    I think I've always become more and more

  • 00:53

    NICCOLO PESCETELLI [continued]: interested in looking at larger and larger groups of people. And I realized that the work that I was mostly interested in was very computational, very quantitative. And I tried to mimic it in my work. And so that requires learning programming

  • 01:14

    NICCOLO PESCETELLI [continued]: and learning statistics. And so when you start liking a research direction, you are more likely to incorporate those methods you like into your own work.

    How have you developed new research methods for collective intelligence?

    I guess, like all the literature that existed before on wisdom of the crowds,

  • 01:36

    NICCOLO PESCETELLI [continued]: was mainly about a simple task, like an estimation task. I present to you a jar, and I ask you to estimate the number of beans that are in the jar. And those problems are usually characterized by one dimension or one variable.
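A minimal sketch of the jar-of-beans estimation task described above, assuming each person's guess is the true count plus independent Gaussian noise. The true count, crowd size, and noise level are hypothetical values chosen for illustration, not figures from the interview.

```python
import random
import statistics


def simulate_crowd(true_value, n_people, noise_sd, seed=0):
    """Return one noisy guess per person (independent errors are an assumption)."""
    rng = random.Random(seed)
    return [rng.gauss(true_value, noise_sd) for _ in range(n_people)]


TRUE_BEANS = 850  # hypothetical number of beans in the jar

estimates = simulate_crowd(TRUE_BEANS, n_people=500, noise_sd=200)

# The crowd estimate is simply the average of all individual guesses.
crowd_estimate = statistics.mean(estimates)
crowd_error = abs(crowd_estimate - TRUE_BEANS)

# For comparison: how far off a typical individual is.
mean_individual_error = statistics.mean(abs(e - TRUE_BEANS) for e in estimates)

print(f"crowd error: {crowd_error:.1f}")
print(f"mean individual error: {mean_individual_error:.1f}")
```

Under the independence assumption, individual errors partly cancel in the average, so the crowd estimate lands closer to the truth than a typical individual guess.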

  • 01:58

    NICCOLO PESCETELLI [continued]: People tend to be independent from each other. But then when you try to scale that into the real world, for example, we are interested in forecasting political events and markets, then it becomes much more difficult. Because now people tend to either rely on the same news

  • 02:18

    NICCOLO PESCETELLI [continued]: sources, or they Google what we ask, so you start introducing much more correlation in your data, which is theoretically bad for the prediction. And so part of this project was, how can we decorrelate this information and like de-bias

  • 02:41

    NICCOLO PESCETELLI [continued]: our groups and our estimates? And so we started thinking of simple scenarios. Let's say, I ask you, will it rain tomorrow? And I ask four people-- I can like do fancy regression lines and models,

  • 03:04

    NICCOLO PESCETELLI [continued]: but ultimately what I really want to do is look at how many people share the same information. And we found out that we can do that with these fancy machine learning techniques, which are called variational autoencoders, [variational autoencoder (VAE)] which basically look at your data, they try to compress it in very

  • 03:26

    NICCOLO PESCETELLI [continued]: few like variables, and then they try to reproduce the same distribution. And if you look at those variables, the hidden units, they basically contract the information in a very simple and representative way. And that's what we are interested in.

    What are the sources for your data?

  • 03:49

    NICCOLO PESCETELLI [continued]: We are part of this tournament across different universities that is funded by IARPA. And IARPA gives us resources, but also participants, volunteers, that they recruited. So that is our main source of data. But then on the side, we basically conduct small-scale experiments that are more controlled

  • 04:11

    NICCOLO PESCETELLI [continued]: and where we manipulate the variables that we are interested in. So it's a mixture of volunteers provided by the funding body and Mechanical Turkers or other people that we recruit online.

    What questions did you ask your interview subjects?

    In terms of the forecasting problems

  • 04:33

    NICCOLO PESCETELLI [continued]: that we ask our participants, it's very varied. And that's why part of the challenge is exactly this, is trying to create a system that is robust through the different questions that we ask. But they typically range-- in terms of politics, so for example,

  • 04:53

    NICCOLO PESCETELLI [continued]: we'll ask, will the prime minister of this country receive support or disruption in their government by the end of the year or by a certain date? In terms of economics and markets, there are some questions regarding specific stock

  • 05:17

    NICCOLO PESCETELLI [continued]: or stocks that will end up in a given range but, again, by a given time-- in a given time window. And finally, there are some questions regarding health and epidemics or famines, and the probability that they will end up on, let's say, The New York Times

  • 05:38

    NICCOLO PESCETELLI [continued]: or other agencies.

    How do you manage and clean up the data you receive?

    In terms of data cleaning and then pre-processing, it requires a bit of work, yes. The ideal scenario would be if we have to forecast 100 different problems,

  • 06:01

    NICCOLO PESCETELLI [continued]: you want all your participants to answer every single question. But unfortunately, it's not exactly like this. Some people are more interested in some topics rather than others. So they will spend more time on those questions rather than the questions they're not interested in. And also, some people are much more active than other users.

  • 06:25

    NICCOLO PESCETELLI [continued]: And so you have these very sparse data sets. And we typically, yes, spend a few hours creating this pipeline in which you first start normalizing for the activity. How many predictions has this person made versus this other?

  • 06:47

    NICCOLO PESCETELLI [continued]: And you're trying to rebalance. And then you start developing methods to fill the missing cells, some sort of recommendation system like Netflix uses. We are thinking along those lines of trying to predict what forecast this person would have made based on the similarity

  • 07:07

    NICCOLO PESCETELLI [continued]: with a different user.

    How do you analyze your data after data collection and cleanup?

    So after the pre-processing phase, what comes next is typically specifying a model that basically you think can be a good representation of your data and produce an outcome.
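The pre-processing steps described above (normalizing for activity, then filling missing cells from similar users) might be sketched as follows. This is an illustration under stated assumptions, not the project's actual pipeline: the users, questions, and probabilities are hypothetical, activity is normalized by averaging each user's repeated forecasts, and similarity is taken to be one minus the mean absolute difference on shared questions.

```python
from collections import defaultdict

# Hypothetical raw log of (user, question, probability) forecasts.
# "ana" is more active than the others and forecasts q1 twice.
raw_log = [
    ("ana", "q1", 1.0), ("ana", "q1", 0.8), ("ana", "q2", 0.2), ("ana", "q3", 0.7),
    ("ben", "q1", 0.8), ("ben", "q2", 0.3),
    ("cho", "q1", 0.1), ("cho", "q2", 0.9), ("cho", "q3", 0.2),
]


def collapse_activity(log):
    """Average repeated forecasts so each user counts once per question."""
    buckets = defaultdict(list)
    for user, question, prob in log:
        buckets[(user, question)].append(prob)
    table = defaultdict(dict)
    for (user, question), probs in buckets.items():
        table[user][question] = sum(probs) / len(probs)
    return dict(table)


def similarity(a, b):
    """1 minus the mean absolute gap on shared questions (0 if none are shared)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    mean_gap = sum(abs(a[q] - b[q]) for q in shared) / len(shared)
    return 1.0 - mean_gap


def impute(table, user, question):
    """Similarity-weighted average of other users' forecasts on the question."""
    weighted_sum = 0.0
    total_weight = 0.0
    for other, answers in table.items():
        if other == user or question not in answers:
            continue
        weight = similarity(table[user], answers)
        weighted_sum += weight * answers[question]
        total_weight += weight
    return weighted_sum / total_weight if total_weight > 0 else None


table = collapse_activity(raw_log)
ben_q3 = impute(table, "ben", "q3")  # ben never answered q3
print(f"imputed forecast for ben on q3: {ben_q3:.2f}")
```

Here "ben" agrees closely with "ana" on the questions they share, so the filled-in value sits nearer to ana's forecast than to cho's.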

  • 07:32

    NICCOLO PESCETELLI [continued]: And the main task now is trying different models and seeing which one is more likely given the data that you observe.

    What are some of the models and software you used to analyze this data?

    I've seen a huge transition in the last few years

  • 07:53

    NICCOLO PESCETELLI [continued]: from MATLAB, which is a proprietary software, to Python and R, which are more open source software. And like, many people now encourage this new freely available software.

    How did the analysis process for this project work?

    Intuitively, the process of analyzing the data,

  • 08:16

    NICCOLO PESCETELLI [continued]: it's mainly-- it can be divided into phases. One is more like the explorative phase, in which it's mainly tweaking some parameters, seeing if it fits better, constant updating of trials and errors. But then at that point, what you should do is actually fix your model, like say, OK, I

  • 08:37

    NICCOLO PESCETELLI [continued]: think this is the right one, and then apply it to a new data set that you haven't seen and see if it's robust. Because the risk of constantly tweaking is that you will always find the model that perfectly fits your data.

    How did you account for independence of opinion in this project?

    The independence of the judgments of people

  • 08:59

    NICCOLO PESCETELLI [continued]: is very important because sometimes people end up trusting the same source. And so they might tell you-- like me and you would have the same opinion about whether it will rain tomorrow. And so an external observer might look at our responses

  • 09:21

    NICCOLO PESCETELLI [continued]: and say, well, maybe yeah, they're right. It will rain tomorrow. But then imagine this observer discovers that we have read the same news and have seen like the same weather forecasts. Now if we make a mistake, it's probably because the weather forecast made a mistake.
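The rain example above can be sketched in a few lines. This is an illustration only: it assumes each person's information source is already known and labeled, which is a simplification (the interview describes inferring shared information from the data instead). The people, source names, and probabilities are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical answers to "will it rain tomorrow?" as (person, source, probability).
# Three people read the same weather forecast; one formed an independent view.
forecasts = [
    ("p1", "weather_app", 0.9),
    ("p2", "weather_app", 0.9),
    ("p3", "weather_app", 0.9),
    ("p4", "looked_at_sky", 0.3),
]


def naive_average(rows):
    """Treat every person as an independent opinion."""
    return mean(prob for _, _, prob in rows)


def source_aware_average(rows):
    """Average within each source first, so a shared source counts once."""
    by_source = defaultdict(list)
    for _, source, prob in rows:
        by_source[source].append(prob)
    return mean(mean(probs) for probs in by_source.values())


print(round(naive_average(forecasts), 2))
print(round(source_aware_average(forecasts), 2))
```

The naive average comes out at about 0.75 because the weather app is effectively counted three times, while the source-aware average is about 0.6 because the two underlying sources are weighted equally.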

  • 09:42

    NICCOLO PESCETELLI [continued]: So what you want to do is basically not count opinions twice. This is the gist of it.

    What else do you need to consider when designing a new research model?

    When you are trying to develop a new model, it typically requires a lot of meetings and talking to people,

  • 10:06

    NICCOLO PESCETELLI [continued]: and also a lot of parallel thinking, if you want. Imagine like you have this problem, but this model has never been applied to this problem. So what you do is typically look at other problem sets in different domains, from ecology to even group dynamics in other species

  • 10:30

    NICCOLO PESCETELLI [continued]: and instead seeing how that same problem was solved in a different domain. And so that's why this cross talk between disciplines is very important.

    What are some things you wish you had known before starting this project?

    Definitely, a lot more quantitative skills. So programming, statistics is very important.

  • 10:54

    NICCOLO PESCETELLI [continued]: And I know it's very scary for an undergraduate at the beginning, particularly if you are interested in social sciences and behavior. You don't see the connection at the beginning. But it's definitely worth it. And it can be very much fun.

    What were some unexpected challenges of this project?

  • 11:15

    NICCOLO PESCETELLI [continued]: There have been a lot of challenges, of course, in this work. Mainly I think recently what we stumbled upon was that people-- it's very hard to retain users over time. We are busy with our lives. We spend-- every company, or an entertainment agency,

  • 11:38

    NICCOLO PESCETELLI [continued]: or news wants to pull our attention. And so keeping people engaged with our platform has been very challenging.

    How did you maintain participant engagement during the project?

    It's been very hard. It seems like poking them with an email every now

  • 11:58

    NICCOLO PESCETELLI [continued]: and then is not enough. What we are thinking of doing is actually engage them so that you create almost a social network. You want people to solve the task together, like being engaged with your problem.

    What advice would you give people interested in using crowdsourcing software in research?

  • 12:19

    NICCOLO PESCETELLI [continued]: My advice for people that want to start using, for example, tools like MTurk, is get your hands dirty from the beginning. Start tinkering around, and if you make a few mistakes, it's not a problem. Actually, you will learn a lot from it. And the second advice is probably pair up with somebody who has done it already

  • 12:40

    NICCOLO PESCETELLI [continued]: or maybe has the same interests.

    What are some innovations in computational social science that you are excited to see?

    I think there is one innovation that is going to be more and more-- I've seen in computational social sciences, which is this application of machine learning to behavior, to human responses.

  • 13:02

    NICCOLO PESCETELLI [continued]: Machine learning has been mainly used in image recognition and other data sets. But more and more, companies are collecting data on humans, on behavior. And that's going to be super challenging, but also very exciting.

    [THEME MUSIC PLAYING]


Niccolo Pescetelli, PhD, postdoctoral associate in the Scalable Cooperation Group at the MIT Media Lab in Boston, discusses new computational social science research methods for studying collective intelligence, including data sources and questions; data management, cleanup, analysis, models, and software; special considerations and challenges, as well as advice and future innovations.
