[MUSIC PLAYING] [Sensing Human Behavior Using Online Data]
SUZY MOAT: So my name's Suzy Moat. I'm an Associate Professor of Behavioral Science at Warwick Business School. And I'm also a fellow of the Alan Turing Institute, the UK's national institute for data science. Now, at Warwick Business School, I have the huge honor of directing the data science lab
SUZY MOAT [continued]: with my colleague Tobias Preis.And in our lab, we're particularlyinterested in data from the internet, so dataon what people are looking for on Google, data on photospeople upload to websites such as Flickr or Instagram,
SUZY MOAT [continued]: data on games people play and the tracesthat this leaves behind.What we want to know is whether wecan use these volumes of data to better measurehuman behavior in the real world that was previously tooexpensive, too time consuming, or simply
SUZY MOAT [continued]: impossible to capture. So let me start with an example from right back at the beginning of our research program using data from Google. So something really fascinating about data from Google is its global breadth. Never before have we had an opportunity
SUZY MOAT [continued]: to measure what information people are interested in allaround the world.But there can be some challenges in tryingto compare what people are lookingfor in different countries because peopletend to search for things in different languages.Now, luckily, one day my collaborators, Tobias Preis,
SUZY MOAT [continued]: Gene Stanley, Stephen Bishop and I, we had a moment of inspiration. And we realized there is one thing which is fairly universal between languages. And that's the year in Arabic numerals, so 2018, 2019, 2017,
SUZY MOAT [continued]: for example.So back when we first carried out this study,we took data on what people in 45 countriesall around the world were looking for on Google in 2010.In each of these 45 countries, we
SUZY MOAT [continued]: knew there were at least 5 million internet users.And for each of these countries, wemeasured how often they looked for the next year, 2011,and how often they looked for the previous year, 2009.
SUZY MOAT [continued]: So on this map, countries that are colored in bluewere looking more for the next year, 2011,whereas countries that are colored in redwere looking more for the previous year, 2009.Now, if you look at that map, you
SUZY MOAT [continued]: might start to recognize a pattern.So we can pick out some countries in blue,such as Germany or Switzerland, wherewe know that on a global basis their citizens are relativelyeconomically well-off, whereas we can pick out
SUZY MOAT [continued]: some other countries, colored in red,such as India, where we know, again on a global basisgenerally, their citizens are not so economically well-off.So to investigate this relationship more closely,we created something we called the Future Orientation Index.
SUZY MOAT [continued]: So for each country, we divided the number of searches we saw for the next year, 2011, by the number of searches we saw for the previous year, 2009. And we compared this Future Orientation Index with per capita GDP.
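[Editor's sketch] The index just described is a simple ratio, and a minimal Python illustration may help make it concrete. The country names and search volumes below are invented placeholders, not figures from the study, and the function name is hypothetical.

```python
def future_orientation_index(searches_next_year, searches_previous_year):
    """Ratio of searches for the next year to searches for the previous year."""
    if searches_previous_year <= 0:
        raise ValueError("need a positive number of searches for the previous year")
    return searches_next_year / searches_previous_year


# Hypothetical volumes: (searches for "2011", searches for "2009"), measured in 2010.
search_volumes = {
    "Country A": (1200, 1000),
    "Country B": (800, 1000),
}

index_by_country = {
    country: future_orientation_index(next_year, previous_year)
    for country, (next_year, previous_year) in search_volumes.items()
}

# Countries ordered from most to least future-oriented under this measure.
ranked = sorted(index_by_country, key=index_by_country.get, reverse=True)
```

A value above 1 means users in that country searched more for the coming year than for the past one. Comparing these values with per capita GDP would then need one GDP figure per country.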
SUZY MOAT [continued]: And what you find is that indeed internet users in countrieswith a higher per capita GDP do tendto search for more information about the future.So why might this be?Now, a strong interpretation might
SUZY MOAT [continued]: be that focusing on the future somehow ledto economic well-being.Or perhaps, being in a better economic stategives you more cognitive resourcesto consider things that are happening in the futurerather than now or the past.
SUZY MOAT [continued]: An alternative explanation, however,is that what we've picked up on hereis the increasing tendency for internet users in countrieswith a higher per capita GDP to rely on informationfrom the internet to help them makedecisions about what they're going to do in the future.
SUZY MOAT [continued]: And this is an idea that we've explored in other worksthat I won't talk about today, where we've investigatedwhether traces of what people are looking for on Googlemight help us anticipate what they'regoing to do in the future.We're leaving prediction to the side for a minute.
SUZY MOAT [continued]: Let's take a step back from measuring thingsat global scale and consider quantitiesthat have been traditionally rather challenging to measureat a more local scale.And one of these is the size of a crowd.Now, I think most of us probably wouldn't
SUZY MOAT [continued]: struggle too much to identify the picturewith more people in it.But extreme, well-known cases aside,working out the size of a crowd has traditionallybeen a source of much difficulty.So I'm sure you've all experienced this.
SUZY MOAT [continued]: You know, there's been a protest,and the people who've organized the protesthave gone to the newspapers and said,oh, there were loads of people at the protest.And then shortly afterwards, some authoritieshave come along and they said, no.There was hardly anybody at that protest.And part of the reason we see this discrepancyis this real difficulty in working out
SUZY MOAT [continued]: what that number actually was.So in some approaches people havetried to take aerial photographs and counteither the people or the shadows in those photographs.There are some computer-vision approachesthat try to look for, you know, the top part of a body.But if you can't really make out that much
SUZY MOAT [continued]: about the top parts of people's bodies,you experience some difficulties with this approach.And in one famous case, the Million Man March,the researchers who were charged with working outwhether there really were a million people therewent down to a basement and drew a 1 meter
SUZY MOAT [continued]: by 1 meter square on the floor. And the leader of the group tried to see how many of his PhD students he could fit into the square so they could use that as an upper limit on how many people would realistically fit in a small area. So my colleagues Tobias Preis and Federico Botta--
SUZY MOAT [continued]: he was a PhD student at the time, now Dr. Federico Botta I'm happy to say-- were looking at problems like this. And we were wondering, well, are we not now all carrying around the answer to this problem in our pockets? We've all got mobile phones. They're all communicating with the mobile phone network.
SUZY MOAT [continued]: And so, you know, could we not use the datathat that process generates to work out how many people areactually in a small area?So it seems logical, you know, if you'vegot more mobile phone activity, there'sprobably more people there.But to work out whether you really
SUZY MOAT [continued]: can use that data to make estimates of crowd size,you need a couple of things.So the first thing is the mobile phone data.And frustratingly, this can actuallybe quite difficult to get hold of.So this is the point where a lot of experiments would, you know,end and the researchers would give up, us included.
SUZY MOAT [continued]: But luckily, on this occasion, wehad access to two months of mobile phone datafrom the city of Milan in Italy.So this is Federico, and this is a map of mobile phone internet
SUZY MOAT [continued]: activity, so how much internet usage phoneswere demonstrating on a particular afternoonin December in 2013.We also had data on calls that people made,texts that they sent.And for the same period, we have a full record
SUZY MOAT [continued]: of all of the tweets that people had been sending in Milan.So if we've got the mobile phone data,there is still one more thing we need.And that's a place where we actually already knowhow many people are there, which is kind of frustratingbecause that was the thing we were struggling to work out
SUZY MOAT [continued]: to start with. But luckily, there are some cases where we do know how many people are in a confined location. And one of these is football matches. So if you want to go to a football match, you need to buy a ticket. You need to go through some turnstiles. So we have that attendance count.
SUZY MOAT [continued]: And luckily, Italians like football.So there is, indeed, in Milan a football stadiumcalled San Siro.And so we were able to look at the data thatwas just coming from San Siro around football matches.So here is an example of one football match.
SUZY MOAT [continued]: And amusingly, if you look at the mobile phone internet activity that comes out of that, you can actually see people pocket their phones during the two halves of the football match and then get them back out during the interval. Anyway, so we have a football stadium. 10 football matches occurred in this stadium.
SUZY MOAT [continued]: And the stars really aligned for us because Federico is himself Italian. So he was able to go and find all the newspapers that were published around this time and extract the number of people who were reported as being at each of the 10 football matches that took place in San Siro during the two months for which
SUZY MOAT [continued]: we have the data.And if you plot the number of peopleat each of those football matches against,for example, the mobile phone internet activity,you see a really striking correspondencebetween the patterns in the number
SUZY MOAT [continued]: of people who were attending and the spikesin the mobile internet activity data.I've shown you only one for the sake of space on this slide,but as you'd see in the paper, we see exactly the same patternfor data on calls and texts and actually indeed data
SUZY MOAT [continued]: from Twitter.So what we wanted to know is, well, OK,if we only had the data at the top,would we be able to estimate the numbers at the bottom?So what we did is we built a very simple linear regressionmodel.And for each of these matches, we
SUZY MOAT [continued]: trained the model on the nine remaining matches.So we would train--we would pick up the relationshipbetween the internet activity and the attendees.And we'd see, OK, can we then usethat model to take the internet activity for the remainingmatch and work out the number of people
SUZY MOAT [continued]: who would have been there?And it turns out that the model performs pretty well.Now, this is particularly strikingif you consider that that's a model that has basically justbeen trained on nine data points.So this is exciting because it's not only protestswhere people need to know how many people are in an enclosed
SUZY MOAT [continued]: location.I'm sure you can all think of situations wherethat information is of much greater importance,such as crowd disasters, where it wouldbe good to get quick, cheap measurements of how many peoplewere in particular locations so that we could try to take
SUZY MOAT [continued]: action more quickly to reduce build upand avoid it going past a dangerous level.Now, having said that, it's clearthat we could do with testing this model on a widerrange of scenarios before putting it into use
SUZY MOAT [continued]: in life or death situations.Frustratingly, there we hit againthe challenges that exist in accessing datafrom mobile phones.But luckily, there are a number of other data sourcesthat give us information on where people are
SUZY MOAT [continued]: and where they're going.And one of those is data on photosthat people upload to photo-sharing sitessuch as Flickr.So if you consider this map, thisis a map that we created just by visualizing the locations of 32
SUZY MOAT [continued]: million photographs that were taken over the course of a yearand uploaded to the photo-sharing site Flickr.Now, my colleagues, Daniele Barchiesi,a post-doc we were working with at UCLat the time, Christian Alis, Stephen Bishop, Tobias, and I,
SUZY MOAT [continued]: we were looking at this map, and we thought,well, you know, that's interesting,because potentially this data doesn't justtell us where people are, but if they're taking photoswhen they, for example, go away on holiday,it might also be telling us how people move around the world.
SUZY MOAT [continued]: Now, we knew that the UK authoritiesinvest a fair amount of money into tryingto understand how many people come to the UKevery year from different countries.So those of you who've flown into UK airports often enough,
SUZY MOAT [continued]: you might at some point have been approached by somebody with a clipboard. They wanted to know, where did you spend the last 12 months? And this questioning, that's part of the International Passenger Survey. So that's a survey that the Office for National Statistics carries out to try and understand each year how many people come to the UK
SUZY MOAT [continued]: from a range of different countries.So my collaborators and I, we were looking at this map,and we were thinking, well, is there not a possibilitythat we could come up with those numbersbut without using the clipboards.Perhaps we could find all the people
SUZY MOAT [continued]: who took photographs in the UK.And instead of asking them, wheredid you spend the last 12 months,we could go back and look.Where did they take photographs over the past 12 months?Now, to approximate the answer to this question, wheresomebody had really been for the past 12 months,
SUZY MOAT [continued]: we have to make a pretty massive simplifying assumption. So specifically, if we see somebody take a photograph in one country, for example, Germany, we assume that they have stayed in that country until we see them take a photograph in another country, for example France. Obviously a simplification.
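[Editor's sketch] The simplifying assumption above amounts to carrying each user's last known country forward in time. The Python below is one hedged way to write that down. The function name, the dates, and the final step of taking the country with the most days as the likely residence are illustrative assumptions, not details given in the talk.

```python
from datetime import date


def days_per_country(photos, window_start, window_end):
    """Count the days one user is assumed to spend in each country.

    `photos` is a list of (date_taken, country) pairs for a single user.
    The user is assumed to remain in a country from one photo until the
    next photo, or until the end of the window after the last photo.
    """
    ordered = sorted(photos)
    totals = {}
    for position, (taken_on, country) in enumerate(ordered):
        if position + 1 < len(ordered):
            next_date = ordered[position + 1][0]
        else:
            next_date = window_end
        # Clip the stay to the observation window.
        start = max(taken_on, window_start)
        end = min(next_date, window_end)
        if end > start:
            totals[country] = totals.get(country, 0) + (end - start).days
    return totals


# Invented photo history for one hypothetical user.
photos = [
    (date(2013, 1, 1), "Germany"),
    (date(2013, 3, 1), "France"),
    (date(2013, 3, 11), "United Kingdom"),
]
totals = days_per_country(photos, date(2013, 1, 1), date(2013, 4, 1))

# Assumption for illustration: treat the country with most days as the residence.
likely_residence = max(totals, key=totals.get)
```

Periods before a user's first photo stay unassigned in this sketch, which is one of several places where a real analysis would need a further decision.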
SUZY MOAT [continued]: But we wanted to know, even if wemake that huge simplification, can we come upwith reasonable estimates?And the answer is yes, we can.And interestingly, if you dig into the dataeven further, what you find is that while the ONS speakto 40,000 people a year, with the Flickr data,
SUZY MOAT [continued]: you get data on 15,000 people a year for free.So what's the takeaway?Are we suggesting that we abandon all these surveys,and we just use social media data instead?Absolutely not.But it does give you some idea of the potential
SUZY MOAT [continued]: for using this sort of data to generate very quick, verycheap estimates of important statistics about society.And I'm delighted to say that we have sincebeen working with the Office for National Statisticson a number of problems in this area.But as excited as we were about this result, and as
SUZY MOAT [continued]: excited as we were that somebody wanted to use it,we couldn't help but feel maybe we'veleft out the most interesting parts of this data.And that's the photograph itself.So this is a photograph of the Lake District.Now, I grew up near the Lake District.
SUZY MOAT [continued]: And I used to love spending time there because it's beautiful.And being in this beautiful locationwould make me feel better.Now, up in the top right-hand corner there,that's Chanuki Seresinhe.Chanuki is a doctoral researcher in our data science lab.
SUZY MOAT [continued]: But before Chanuki came to work with us,she had a really interesting careerdoing a number of things, includingrunning her own design studio.So Chanuki has a real sense of the wayin which the appearance of places and thingscan impact our everyday well-being.
SUZY MOAT [continued]: So Chanuki, Tobias, and I were looking at photographs like this. And we were looking at all of this photographic data that's now available online. And we were wondering, is there not some way we can use that data to get some quantitative insight into the relationship between the way a place looks
SUZY MOAT [continued]: and how healthy we feel?Now, as much as I love the Lake District, like many people,I tended to go there on holiday.And so arguably, the most important angleto this question relates to where people live.And thanks to the Office for National Statistics,we have some excellent data on how healthy people consider
SUZY MOAT [continued]: themselves to be in the different places in which they live across the country. So here for England, I've plotted some data, which comes from a question in the census, which we all answer every 10 years. And that question asks people to report on how healthy they
SUZY MOAT [continued]: consider themselves to be.So you can choose from everythingfrom very poor to very good.Now, we need to normalize that data because,as you'll realize, unfortunately older peoplewill tend to report themselves to be less healthy.And there is actually a differencebetween men and women as well.
SUZY MOAT [continued]: Men tend to report themselves to be less healthy than women.And so we normalize for age and for gender.And then we've plotted this map.So lighter areas are areas where people reported their healthto be worse.And you can see that some of the key lighter areas
SUZY MOAT [continued]: are cities such as London or Liverpool.So that's data on health.But how on earth do we quantify the aestheticsof the environment at that sort of national scale?So we were delighted that the answer to this question
SUZY MOAT [continued]: came to us in the shape of a game called Scenic or Not.So Scenic or Not is a game that wascreated by an organization called mySociety back in 2009.I'm happy to say it now lives in our lab.And Scenic or Not shows people photographs like this one
SUZY MOAT [continued]: and asks them to rate them between 1 and 10.So if you think it's very scenic, you can give it 10.If you think it's not very scenic, you can give it 1.Or you can choose something in between.And in turn, these photographs come from another gamecalled Geograph.So Geograph is a game where people get points
SUZY MOAT [continued]: for uploading photographs in as many kilometer squares as possible across Great Britain. So on the back of Scenic or Not and Geograph, we have over 1.5 million ratings for over 200,000 locations
SUZY MOAT [continued]: across Great Britain.So we can plot that, again for England here.So darker areas are areas that peoplehad rated as more scenic.So up there in the North, that's the Lake District,so everybody agreed with me.And Cornwall also does quite well.
SUZY MOAT [continued]: And London, you can see, is rather lighter. So it hasn't reached the same sort of heady scores as we saw up in the Lake District. So what we wondered was, what will happen if we put this data together with the health data from the census?
SUZY MOAT [continued]: And the answer is that we find that people who live in areas rated as more scenic report their health to be better. Now, if you've just been looking at those two maps, you may well be thinking, well, are there not some really massive confounds here?
SUZY MOAT [continued]: You know, you've just pointed out that people in citiestend to report themselves to be less healthy.And stunningly enough, cities are notscoring as well on this metric.So maybe it's just about whether people livein the country or the city.Now, we thought this, too.And so we split England into urban, suburban,
SUZY MOAT [continued]: and rural areas.And we looked at whether the relationshipheld in each of those types of areas on their own.And the answer is it does.So even if you look, for example, just at urban areas,where the scores are not as high overall,but where you do still see differences
SUZY MOAT [continued]: in how attractive people find different places,even if you just look at urban areas,it is still the case that people who live in areasrated as more scenic report their health to be better.Now, even with that out of the way,you might think, well, OK, but there's stillsome other obvious things that wouldexplain this, not least money.
SUZY MOAT [continued]: So, if people have more money, then theycould use that money to look after their health.And they could also draw on that moneyto buy a place in a more attractive location.So then, again, you'd expect to see this relationship.So we wondered that, too.
SUZY MOAT [continued]: And luckily, there's an awful lot of datathat we can get on socioeconomic indicators such as income.And so we put measures of various socioeconomicdimensions, including income deprivation, into this model.And while you do find, unfortunately,
SUZY MOAT [continued]: that people with less income do tend to report their healthto be worse, it's still not enough to explainthis relationship.Now, a final thing you might wonder,looking at this photograph of the Lake District,is whether we haven't just found an incredibly complicated way
SUZY MOAT [continued]: of measuring whether a place is very green or not.So the Lake District in my opinion, very attractive.It's fairly indisputably incredibly green.Now, the good news is that we havebeen able to measure how green different places are
SUZY MOAT [continued]: for a very long time using aerial photography.So we can take that data from aerial photography,and we can map that, too.So again, darker green areas are areas with more green space.And what you find is that while, yes, of course,the Lake District is indeed very green.
SUZY MOAT [continued]: This measure is not the same as the measure of aesthetics.So for example, we can start to identify some areastowards the east of England that score highlyin terms of green space but don't scoreso well in terms of aesthetics.
SUZY MOAT [continued]: So what we wondered was, OK, well, which of these data sets,when combined with all of our controls,would help us best explain differencesin how healthy people report themselves to be?So we built three models.And in one model, we just use data on how scenic places were.
SUZY MOAT [continued]: In another model, we just used data on how green places were.And in a final model, we put both of these datasets in together.And what we find is that the statistics show usthat if we want to best explain differencesin how healthy people report themselves to be,
SUZY MOAT [continued]: we can't afford to ignore this data on aesthetics.So in this diagram, we've got a measureof the probability of these different models giventhe data.So areas colored in purple show us the weight of evidencefor the model that just uses the data on aesthetics.
SUZY MOAT [continued]: Areas colored in green, the weight of evidencefor the model that just uses green space.And you can see there's hardly any areas thatare just colored green.Stripey areas are areas which showthe evidence for the model that uses both data sets.Notably, by this analysis, if you look in urban areas,then you see that the weight of the evidence
SUZY MOAT [continued]: actually speaks for just looking at the dataon environmental aesthetics alone.Now, I should be clear.What we've got here at the moment is a correlation.We'd be going too far at this stageto say it is definitely the case that livingin a more scenic area makes you healthier.
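[Editor's sketch] The talk describes "a measure of the probability of these different models given the data" without naming the statistic. One common way to express that kind of relative evidence is Akaike weights, computed from each model's AIC. The snippet below assumes that approach purely for illustration. The AIC values are invented, and nothing here should be read as the authors' actual procedure.

```python
import math


def akaike_weights(aic_by_model):
    """Convert AIC values into relative weights that sum to 1.

    Lower AIC means a better trade-off between fit and complexity, so the
    model with the lowest AIC receives the largest weight.
    """
    best_aic = min(aic_by_model.values())
    relative_likelihood = {
        name: math.exp(-0.5 * (aic - best_aic))
        for name, aic in aic_by_model.items()
    }
    total = sum(relative_likelihood.values())
    return {name: value / total for name, value in relative_likelihood.items()}


# Invented AIC values for the three models in one hypothetical set of areas.
aic_by_model = {
    "scenic only": 100.0,
    "green only": 110.0,
    "scenic + green": 101.0,
}
weights = akaike_weights(aic_by_model)
best_model = max(weights, key=weights.get)
```

Under this reading, each colored region of the diagram would correspond to the share of weight assigned to one of the three models.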
SUZY MOAT [continued]: So one obvious thing that we can't yetrule out, although we're working on it,is whether healthier people decideto move to more attractive areas.But if we do assume for a moment that perhaps it'sthe case that living in a more scenic arealeads people to be healthier, we'd
SUZY MOAT [continued]: still want to ask, but why?Is it perhaps that people do more exercisein more attractive areas?Or is it maybe just that being in a more attractive arealeads people to feel happier?And when they feel happier, they also feel healthier?Now, this is an important and interesting topic, happiness.
SUZY MOAT [continued]: Because much like aesthetics, traditionally it'sbeen very challenging to measure.You've either had experimental approaches,where you'd have a low number of people, or survey approaches,where you wouldn't be able to get the frequency of measuresyou might like to understand daily impact on happiness.
SUZY MOAT [continued]: And so for this reason, we were delighted to start working with our collaborator, George MacKerron from the University of Sussex. Now, George created an app called Mappiness. And so Mappiness pings you a couple of times a day and asks you how happy are you feeling.
SUZY MOAT [continued]: It also asks you some other questions.For example, who are you with?And also, what are you doing?So on the back of Mappiness, we haveover three years of data on the changesin happiness for over 15,000 people.Now, if you've got that volume of data,
SUZY MOAT [continued]: you start to see some interesting patterns.So I'm happy to reveal that, as of tomorrow,life starts getting a lot better.So unfortunately, I didn't pick the best monthto give this talk.But thankfully, I wasn't invited to give itin the past two months.So, that could have looked worse.
SUZY MOAT [continued]: But these patterns aside, data from Mappiness combinedwith data from Scenic or Not allows us to ask questionsthat we would have really struggled to getsensible answers to before.So specifically, are people really happier when they'rein a more attractive location?
SUZY MOAT [continued]: And the answer is, yes, they are,even if you take into account whether it's a naturalor built-up area, whether it's a green area,whether it's an urban, rural, or suburban area,as well as a very long list of measurements of who people arewith, the activities that they're doing, the weather,
SUZY MOAT [continued]: many things that--many more things than I can fit on this slide.And having this sort of numeric measurementof happiness and of aesthetics startsto let us ask very specific questions as well.So, you know, how big is the effectthat you see when people move from the least scenic area
SUZY MOAT [continued]: to the most scenic area?And you see that the change, the relative difference,in happiness is about the same size as the change you wouldsee if people were listening to musicor were talking, chatting, or socializing.It's approximately the inverse of whatyou'd see if people were traveling or commuting.
SUZY MOAT [continued]: So I'm hoping nobody had to come a long way to get to this talk today, as that might not have put me in the best position. So we've got these results on happiness and health. And we showed them to planners. And we were delighted that they were interested in it. But the first question that they would always ask us
SUZY MOAT [continued]: was, well, OK, what actually makes a place more attractive? Given that we've been trying to do this very consistent work, we were in an incredibly frustrating situation. Because we had these 200,000 photographs that had all been rated. But in answering that question, we
SUZY MOAT [continued]: were reduced to finding the photographs thathad been given the highest ratings,looking at a few of them, and then saying, well,I reckon people like bridges--[LAUGHTER]--which, you know, was not really whatwe were trying to go for here.Now, the good news is that as some of youmay be aware, over the past few years,
SUZY MOAT [continued]: we've seen huge advances in the capabilities of computersto work out the content of an image.And computers are an awful lot betterat looking at lots of data in this way than humans are.So instead of us trying to look at all of these photographs,
SUZY MOAT [continued]: we started feeding them into a convolutional neural network-- so this is an approach for deep learning-- and specifically, a convolutional neural network called Places CNN. So if you put a photograph like this, that was rated 10, into Places CNN,
SUZY MOAT [continued]: you can get out some information about what's in the photograph.So this photograph gets high ratings for valleys.It works out there's a lake in there.Mountain gets a high rating, too.And you also get some measurementsof the general attributes of the scene.So it realizes, OK, there's natural light.
SUZY MOAT [continued]: It's an open area.And we've got in total over 300 featuresthat get values like this.And so with Places CNN, we were able to put all 200,000photographs in and get these measurements of over 300features coming back out again.
SUZY MOAT [continued]: And so then using a regression approach thatcan cope with the fact that many of these variableswill be correlated, so specifically an elastic net,we can start to actually ask, OK, what things in a photographwere making people rate it more highly?Now, some of the answers might not surprise you.
SUZY MOAT [continued]: So it turns out everybody likes lakes, not just me.Turns out valleys tend to improvethe score of a photograph.In contrast, industrial areas do notdo a photo's score any good.Nor do buildings that the neural net thoughtlooked like hospitals.
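[Editor's sketch] The elastic net mentioned above combines an L1 penalty, which can push some coefficients to exactly zero, with an L2 penalty, which helps when features are correlated. The pure-Python coordinate descent below is a teaching sketch under stated assumptions, not the authors' pipeline. The three features, their values, and the ratings are invented, and the penalty form follows a common convention with `alpha` for overall strength and `l1_ratio` for the L1 share.

```python
def soft_threshold(value, threshold):
    """Shrink a value toward zero by `threshold`, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Fit an elastic net by cyclic coordinate descent on centered data."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    columns = [[row[j] - x_means[j] for row in X] for j in range(p)]
    weights = [0.0] * p
    residual = [value - y_mean for value in y]  # centered y minus current fit
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # constant feature carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(c * (r + c * weights[j]) for c, r in zip(col, residual)) / n
            new_weight = soft_threshold(rho, alpha * l1_ratio) / (
                z + alpha * (1.0 - l1_ratio)
            )
            delta = new_weight - weights[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                weights[j] = new_weight
    intercept = y_mean - sum(w * m for w, m in zip(weights, x_means))
    return weights, intercept


def predict(row, weights, intercept):
    return intercept + sum(w * x for w, x in zip(weights, row))


# Invented scores for three hypothetical scene features: lake, valley, industrial.
X = [
    [0.9, 0.8, 0.0],
    [0.8, 0.9, 0.1],
    [0.1, 0.2, 0.9],
    [0.0, 0.1, 0.8],
    [0.5, 0.4, 0.3],
    [0.2, 0.1, 0.6],
]
y = [9.0, 8.0, 2.0, 2.0, 6.0, 3.0]  # invented scenic ratings

weights, intercept = elastic_net(X, y, alpha=0.1, l1_ratio=0.5)
```

In a setting like the one in the talk, each row would hold the 300-plus feature values for one photograph and `y` would hold its average rating. The fitted coefficients would then indicate which features are associated with higher or lower scores.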
SUZY MOAT [continued]: So you look at these results, and it looks quite trivial.It looks like, OK, natural areas are scoring well.Built-up areas are not scoring so well.But if you dig into the data a little further,in particular around built-up areas, where many of us live,you start to see that the story is a little more nuanced.
SUZY MOAT [continued]: So in particular, we see that buildings with character, like cottages or castles, boost the score of a scene, even though they're built elements. Similarly, it turns out that, after all, people do like bridges. So viaducts, aqueducts, they also boost the score of a scene.
SUZY MOAT [continued]: But interestingly, we find out that it's not the casethat all green space is equal.So while trees lead to higher scores for a photograph,large areas of grass, like athletic fields,actually pull the score down.So this helps us understand why that measure of a green space
SUZY MOAT [continued]: was not the same as the measure of aestheticsthat we saw before.So it was great that this neural network could tell uswhat was in the photographs.But we were aware from Scenic or Not,we only have one measurement for every kilometer squared.And again, in the cities in which many of us live,
SUZY MOAT [continued]: this is not a very dense grid.There's many places in between those different pointsthat we've captured.Now, Geograph actually has a lot more photographsthan have been rated on Scenic or Not.So we thought, you know, it would be greatif the neural network could not only
SUZY MOAT [continued]: tell us what was in the photograph,but it could actually work out how to rate them for itself.So we looked at whether we could do this.And it turns out we can.If you take the neural network that came up with these labels,and you train it further with the data from Scenic or Not,
SUZY MOAT [continued]: then we get reasonable performanceon estimating the scores that people online wouldhave given to these images.So, for example, this is an excerptof an area you might recognize here in London.And all of the dots represent photographs
SUZY MOAT [continued]: that the neural net had rated for itself.So blue dots represent photographsthat got high scores.Red dots represent photographs that got lower scores.If you look at that image, there'ssome pretty obvious patterns that jump out.Looks like the neural network likes parks.
SUZY MOAT [continued]: And we've got blue dots drawn all over those.And indeed, if we ask the neural network,OK, please show us your top 5% favorite locations in London,it picks out places like Hampstead Heath.But interestingly, in that top 5%,it also finds locations like Big Ben
SUZY MOAT [continued]: and the Tower of London, so built,well-known landmarks, buildings with character,in line with our previous results.So overall, we find that beautiful places are not justgreen places.Beautiful places were also not just natural places.
SUZY MOAT [continued]: But we have some initial evidence using datafrom an online game that beautiful places might actuallyhave consequences for something as important to usall as our health and happiness.Now, as a final example, let me tell youabout an occasion where we thought the online data could
SUZY MOAT [continued]: give us good measurements of people's behavior,in particular in relation to their health.But then it turns out that the measurements were way off.And this is a story of Google Flu Trends.Now, I'm guessing everybody here,like me, has at some point had the flu.I hope you didn't have it this winter.
SUZY MOAT [continued]: It's pretty rubbish, makes you feel awful for a few days.But for most of us, as unpleasant as that experienceis, we get over it.We get back to work.We get back to what we were doing before.But there's a section of society for which flu is a big danger.
SUZY MOAT [continued]: And it can lead to them being hospitalized, or it can even lead to death. And so this is one of the reasons for which doctors are very keen to have quick, accurate measurements of how many people have diseases like the flu right now. Now, the problem is that normally
SUZY MOAT [continued]: what doctors know is not how many people have the flu right now, but how many people had the flu one or two weeks ago. And that's because of the amount of time that it takes to collect this information. So normally what happens is somebody has got the flu. They go to the doctor, and they say, I think I've got the flu.
SUZY MOAT [continued]: The doctor tells the central authority that the person has the flu. But this all takes a while. So engineers from Google spotted an opportunity to help. Because they thought, OK, well, if somebody's got the flu, they might not just go and tell the doctor. They might tell us at Google, too. They might look up flu medicines or flu symptoms.
SUZY MOAT [continued]: And as with many online data sources, those measurements from Google are available pretty much straight away. So the engineers from Google looked through all of the different things that people searched for online. And they tried to find terms that people searched
SUZY MOAT [continued]: for more when more people have the flu and terms that people search for less when fewer people have the flu. And once they'd identified a set of terms where the search volume went up and down in line with changes in how many people have the flu, they started to generate
SUZY MOAT [continued]: their own estimates of how many people have the flu each week. So on this chart, the data shown in red is the official data. The data shown in black is the estimates which came through
SUZY MOAT [continued]: from Google Flu Trends. And so you can see a couple of things. Firstly, the black dot is always in front of the red dot because the Google Flu Trends estimates come through more quickly. And secondly, the red dot tends to follow the black dot. So it looks like actually these estimates work quite well.
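The term-selection step described above can be sketched as follows. This is a minimal illustration on made-up numbers, not Google's actual procedure: the function names, the example search terms, the weekly counts, and the 0.9 correlation threshold are all assumptions chosen for the example. It keeps only those terms whose weekly search volume rises and falls in line with the official flu counts.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no information
    return cov / sqrt(var_x * var_y)


def select_terms(search_volumes, flu_counts, threshold=0.9):
    """Return the terms whose volume tracks the flu counts closely.

    The threshold is a hypothetical cut-off chosen for illustration.
    """
    return sorted(
        term
        for term, volumes in search_volumes.items()
        if pearson(volumes, flu_counts) >= threshold
    )


# Made-up weekly data: official flu counts and search volumes per term.
flu_counts = [10, 20, 40, 30, 15]
search_volumes = {
    "flu symptoms": [12, 22, 41, 33, 14],
    "flu medicine": [8, 18, 35, 28, 12],
    "football scores": [50, 48, 52, 49, 51],
}

selected = select_terms(search_volumes, flu_counts)
print(selected)
```

In this toy data the two flu-related terms move with the flu counts and are kept, while the unrelated term is dropped.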
SUZY MOAT [continued]: Now, you might have noticed this is a figure with data from 2008. And at the time, 2008 and 2009, when they published a big paper in Nature about this, this was seen as a major success story, a way in which we can really use data on what people are doing online for public good--
SUZY MOAT [continued]: until this happened. So in the winter of 2012-2013, Google put out an estimate in which they said 10% of the US population had the flu. And then when the official data came through,
SUZY MOAT [continued]: it turns out actually only 6% of the US population had the flu. So there was huge outcry. You know, Google has got it all wrong. You know, this big-data malarkey, it's all rubbish. We knew we couldn't trust it. If we need to measure important quantities, we need to do this in the traditional way.
SUZY MOAT [continued]: And people would say this shows that it just doesn't work. Now, what happened? Well, one of the theories as to why this estimate was so far out was that in that winter, there was a lot of coverage of the flu. So instead of searching for information about the flu
SUZY MOAT [continued]: because people had the flu, people were potentially searching for information about the flu because they were worried about getting the flu. So this sent the estimate flying through the roof. So my colleague Tobias Preis and I, we were looking at this data, looking at these reports
SUZY MOAT [continued]: and wondering, well, you know, is that it? We've done so much work in the past with Google data. Is it really the case that this just doesn't work? Or could we possibly work with the data in a slightly different fashion? Could we change the way that these estimates are generated
SUZY MOAT [continued]: to address this problem? Now, if you dig into the way in which these estimates were produced, you find a couple of things. So firstly, the estimates were produced purely on the basis of what people have been searching for online.
SUZY MOAT [continued]: It didn't use the slow data on how many people had the flu one or two weeks ago at all. Now, that slow official data, it might be slow, but if you think about it, the number of people who had the flu one or two weeks ago, it's probably got something to do with the number of people
SUZY MOAT [continued]: who have the flu right now. And so it seems that we should be able to take that slow data and build that into the model as well. So we have a model that generates estimates on the basis of both online data and the slower official data.
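One simple way to write down a model of this kind is a linear regression on both inputs. This is an illustrative assumption, not necessarily the form used in the study; the symbols are introduced here for the example only.

$$\hat{y}_t = \beta_0 + \beta_1\, y_{t-\ell} + \beta_2\, g_t$$

Here $y_t$ is the share of people with the flu in week $t$, $\ell$ is the reporting delay of the official data (one or two weeks), $g_t$ is the search volume for the flu-related terms in week $t$, and $\beta_0, \beta_1, \beta_2$ are coefficients fitted to past weeks. Setting $\beta_1 = 0$ gives an estimate based purely on search data, and setting $\beta_2 = 0$ gives one based purely on the slow official data.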
SUZY MOAT [continued]: Secondly, we realized, OK, actually this analysis of which terms go up when flu counts go up, which terms go down when flu counts go down, that was done quite a while ago. And the estimates had been generated assuming that essentially behavior had stayed the same.
SUZY MOAT [continued]: So the relationship between the number of people who had the flu and the number of people searching for given terms had stayed the same. Now, this potentially seems unlikely. We know humans change their behavior. This is something you have to deal with when you're interested in studying human behavior.
SUZY MOAT [continued]: And so we realized that rather than training the model right at the beginning, we could, again, start making better use of that slow official data as it came in. So every week, we update our model on the basis of the most recent set of data.
SUZY MOAT [continued]: So we look at what the relationship has recently been between the number of people searching for these terms and the number of people who actually have the flu. And it turns out, if you make those two changes, you can actually use data from Google to generate more accurate, rapid estimates of how many people
SUZY MOAT [continued]: have the flu. So in comparison to a model that just uses the slow, old data, we see that we can reduce errors in estimates by between 16% and 53%. Why the range? Well, it's because you can make a choice about how many of the most recent weeks to use to train your model. So we tried lots and lots of different lengths.
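The weekly retraining and the comparison against a model that uses only the slow official data can be sketched as below. This is a minimal sketch under stated assumptions, not the analysis from the study: it assumes a plain linear regression refitted each week on a sliding window, a one-week reporting delay, and entirely synthetic data. All function names, the window lengths, and the numbers are hypothetical.

```python
def solve_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (small problems only)."""
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def rolling_mae(official, search, window, lag=1, use_search=True):
    """Mean absolute error of week-by-week estimates.

    For each week t the model is refitted on the `window` most recent
    weeks whose official figure is already known (up to week t - lag).
    """
    def features(s):
        row = [1.0, official[s - lag]]  # intercept + delayed official data
        if use_search:
            row.append(search[s])       # same-week search volume
        return row

    errors = []
    for t in range(2 * lag + window - 1, len(official)):
        train = range(t - lag - window + 1, t - lag + 1)
        coeffs = solve_least_squares(
            [features(s) for s in train], [official[s] for s in train]
        )
        estimate = sum(c * x for c, x in zip(coeffs, features(t)))
        errors.append(abs(estimate - official[t]))
    return sum(errors) / len(errors)


# Synthetic weekly flu counts and a search signal that tracks them closely.
official = [10, 12, 15, 21, 30, 42, 50, 47, 38, 27, 19, 14,
            11, 10, 12, 16, 23, 33, 45, 52, 46, 36, 25, 17]
search = [2 * y + 1 + 0.2 * ((i % 3) - 1) for i, y in enumerate(official)]

for window in (6, 8, 10):
    baseline = rolling_mae(official, search, window, use_search=False)
    combined = rolling_mae(official, search, window, use_search=True)
    reduction = 100 * (1 - combined / baseline)
    print(f"window={window}: error reduced by {reduction:.0f}%")
```

Because the synthetic search signal follows the flu counts almost exactly, the error reductions printed here are far larger than anything to expect from real data. The point of the sketch is only the mechanics: refit every week on recent data, and compare against the baseline for several window lengths.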
SUZY MOAT [continued]: You find that it always improves the estimates, but the amount by which it improves it changes. So our conclusion was that to completely abandon this data source as a useful indication of something as important as how many people have the flu right
SUZY MOAT [continued]: now would be to throw the baby out with the bathwater. But it does underline that there are some key challenges we have to take into account when working with online data, such as people's pesky tendencies to change their behavior. And it's another case of something
SUZY MOAT [continued]: which needs to be addressed if you really want to get value and reliability out of these new sources of data. So to summarize, I've shown you examples of how we can use data on what people search for, what people upload as photographs,
SUZY MOAT [continued]: and games people play online to help us get quicker estimates of crowd size, how people move around, how healthy people are, or indeed influences on their health. And those are just a few examples of results from our data science lab at Warwick Business School
SUZY MOAT [continued]: that suggest that online data might help us measure behavior that was previously too expensive, too time-consuming, or impossible to capture. Thank you very much. [APPLAUSE] [MUSIC PLAYING]
Publisher: SAGE Publications Ltd
Publication Year: 2019
Methods: Social media research
Keywords: aesthetic appreciation; cell phones; crowds; data analysis; environment awareness; errors; football; green area; happiness; happiness and health; health surveys; internet data collection; internet users; photography; prediction; rural areas; software; text messaging; travel; Twitter; urban; urban areas; web sites
Segment Num.: 1
Suzy Moat, PhD, Associate Professor of Behavioral Science at Warwick Business School and a fellow of the Alan Turing Institute, discusses research on how knowing what people search the internet for, what people upload as photographs, and what games people play online can help get quicker estimates of crowd size, of how people move around, of how healthy people feel, and even of influences on their health.