  • 00:00

    [MUSIC PLAYING] [Using Spatial Data Science to Improve Food Truck Sales--CARTO]

  • 00:17

    STUART LYNN: CARTO is a company that produces tools and a platform for really helping people understand and make better business decisions through geospatial data and geospatial information. And so for us, that kind of geospatial context, that context of where you are, really impacts how anything from trades to economic activity to transit

  • 00:39

    STUART LYNN [continued]: really has to be thought about and decisions to be made with that.

  • 00:41

    DONGJIE FAN: The first step is to find where those hotspots are.

  • 00:46

    WENFEI XU: Now, I think, slowly, what we're doing is we're kind of evolving from just being a platform to doing a lot more. More and more, we're building out these tools. And what our company is focusing on right now is we're focusing on different kinds of specific web

  • 01:06

    WENFEI XU [continued]: applications or solutions that we can develop for logistics optimization, site selection, and territory management.

  • 01:15

    ANDY ESCHBACHER: My official title is map scientist. But really, it's data scientist with spatial data. We get quite a range of problems that we need to solve. A lot of them are very core data science ones. Almost always, they have spatial data as part of that. And so we need to turn that spatial data into a form that can be used with more traditional data science methods.

  • 01:36

    ANDY ESCHBACHER [continued]: Visualization's a very important component to all the work that we do here. So we need to display that data on a map, oftentimes, accompanied with different charts and things like that. And besides that, we need to productionize a lot of our data science stuff into CARTO's platform or into some other forms like Jupyter Notebooks

  • 01:56

    ANDY ESCHBACHER [continued]: so that it's digestible to our clients or other people who would be using that work.

  • 02:01

    WENFEI XU: Can we zoom in a little bit?

  • 02:03

    DONGJIE FAN: Which?

  • 02:05

    WENFEI XU: Let's go to the yellow. So we tend to do a wide range of different kinds of projects. What I think is actually one of the great things about working at CARTO is you just get so much exposure to different industries, from retail to ambulances to food trucks and parks departments.

  • 02:29

    WENFEI XU [continued]: And we're increasingly finding that there's just a lot more interest in spatial data science. That actually looks like it makes a lot more sense and like you actually have some areas that look kind of meaningful, like that's Dumbo.

  • 02:41

    STUART LYNN: There's a particular project, the food truck project, that we worked upon last year. This was a project that came to us from a partner company, a company that our CEO knows pretty well. They're a company that, basically, are trying to modernize the food truck industry by working with clients and restaurants that are maybe a little bit

  • 03:03

    STUART LYNN [continued]: more upmarket than the typical hot dog vendors that you find in New York. So they were really interested in how location affects their sales and how it affects the performance of their food trucks. And this is a really interesting case because, unlike a lot of retail-type use cases, with food trucks, you basically can move your food truck around

  • 03:23

    STUART LYNN [continued]: from week to week. So you have an ability to try out different locations, get a lot of information about how they perform in different locations. We were looking, actually, for a project to demonstrate the power of location data for retail. And they were looking to try and solve this problem of where to place their food trucks. And so it was a really nice match for us to be able to show off how these kind of data

  • 03:44

    STUART LYNN [continued]: streams that we're producing and this localized geospatial data can really help impact predictions of retail for companies. And I think that was exciting to us because it was going to give us the first chance to really prove that to an external audience. And then, we assigned it to Wenfei on the team to help scope out.

  • 04:00

    WENFEI XU: So we got all this food truck data, and Javi wants us to do a case study for locations conference. And I think that the basic idea, the basic question that he wants us to answer, is a site selection question.

  • 04:16

    STUART LYNN: The first session for us, really, was sitting down with the food truck company and really talking to them about their data and seeing how their data was structured, how much information they could give to us. Obviously, because this is sensitive business information, it's not necessarily they're going to give us their entire transaction records for every food truck, so really seeing what the limitations in their data was.

  • 04:37

    STUART LYNN [continued]: And then really, evolve a standard project flow where we assess the data, we understand what other data sets we need to bring in to augment that data to basically fill the project. And in this case, what we looked at was data about what other businesses were in the area of the food trucks, data about how much people were spending in different categories

  • 04:59

    STUART LYNN [continued]: through credit card transactions within the area.

  • 05:01

    WENFEI XU: I think the food trucks are around--and this is Central Park here--I think the food trucks are around these areas--some Penn Station, some around Grand Central, a couple in Lower Manhattan, Union Square, and then Dumbo here and a couple of other different places. And so I think we have four of these particular food truck

  • 05:24

    WENFEI XU [continued]: locations.

  • 05:25

    STUART LYNN: And then also, a dataset that we'd been working on not long before this project, which is looking at human mobility, so using traces of GPS to see how people are moving through a city. And so we wanted to bring those three data sets together to get context for the food trucks and use that to really model their success or failure and how much of a success and failure we'd actually see with each one of those food trucks.

  • 05:46

    STUART LYNN [continued]: Privacy issues are a huge problem in this space. And I think we're all very much aware of that. We take privacy very seriously. Everybody on the team is very much an advocate of privacy. And we want to make sure that we're not invading anybody's privacy. So for us, there's a very strong incentive for us

  • 06:07

    STUART LYNN [continued]: to work either with data that's already been anonymized by the third-party data vendors that we work with so that we have no way whatsoever of ever tracing this back to an individual, or when we get the data, if there is some potential for us to potentially identify somebody, making sure that, A, we never do that, and B, that we keep this data really

  • 06:29

    STUART LYNN [continued]: safe and secure so that it's never going to be leaked, and then, C, making sure that what we do in the first instances is really move the data from that individual level data to aggregate data. When people work in the mobile ad industry, they work a lot with personal level data, individual level data. In the geospatial world, we don't really care about that. What we care more about is did this street have a larger

  • 06:52

    STUART LYNN [continued]: flow of people going through it than this other street over here? And so for us, one of the first steps that we do with any data is that anonymization and aggregation step. But it is hard. And it's like you have to always be wary about what you're doing and, really, at every single step, be worrying about whether or not you're overstepping the bounds of the data.
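
The anonymization and aggregation step described above can be sketched roughly as follows. This is a minimal illustration, not CARTO's actual pipeline: the ping format, the grid cell size, and the `min_people` suppression threshold are all assumptions introduced here.

```python
import math
from collections import defaultdict


def aggregate_pings(pings, cell_size_deg=0.001, min_people=5):
    """Count distinct people per grid cell and drop sparsely populated cells.

    pings: iterable of (anonymized_id, latitude, longitude) tuples.
    This tuple layout is a hypothetical format chosen for illustration.
    """
    people_per_cell = defaultdict(set)
    for person_id, lat, lon in pings:
        # Snap each point to a coarse grid cell so exact positions are discarded.
        cell = (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))
        people_per_cell[cell].add(person_id)
    # Keep only counts, and suppress cells where few people could be singled out.
    return {
        cell: len(people)
        for cell, people in people_per_cell.items()
        if len(people) >= min_people
    }
```

After this step only per-cell counts remain, which is enough to ask whether one street saw a larger flow of people than another without retaining any individual trace.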

  • 07:10

    WENFEI XU: So I think what I'm seeing here, basically, is a lot of, again, the ambient noise.

  • 07:14

    STUART LYNN: Yeah. A lot of it's at the edges, right? The data that we work with is not necessarily unstructured, but it is very noisy. So we normally have a very fixed structure for the data that we work with. So for example, when we're looking at human mobility data, we're seeing these GPS traces for people, and we're using those to try and construct an idea of traffic. It will normally have a very well-defined structure.

  • 07:37

    STUART LYNN [continued]: So we'll have a latitude, a longitude, a timestamp, some measure of accuracy, and then, potentially, an anonymized ID for the person we're looking at. So the structure is there. But what might happen is we may have data that's missing. We may have data that's very noisy or very intermittent. And so in that way, it's not structured

  • 07:58

    STUART LYNN [continued]: in terms of the period of collection or the fullness of collection. So in some areas, we have more people. In some areas, we have fewer people. Some people we see all the time. Some people will come in and out. And all of those factors lead to struggling to use that to get a really good, solid estimate of the number of people in an area. So one of the biggest challenges we had with this project

  • 08:20

    STUART LYNN [continued]: was really understanding how to use the mobility data in a way that was going to give us insight but was also very much aware and taking into account the biases and the error within that data. And we kind of hunkered down and then created a plan of attack for how we would take the data, join it to these food trucks--what was the appropriate way to do that join? how large of an area around a food truck

  • 08:41

    STUART LYNN [continued]: should we consider in this data to be accurate?--and then created a data set, which was the combination of the food truck data plus this external data. At that point, you're good to go for modeling. So we then iterated on producing a model to project the performance of each food truck using those other features as predictors.
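
One way to picture the join discussed above is a radius-based spatial join: for each food truck, sum the aggregated people counts of every grid cell whose center lies within some distance of the truck. The sketch below is an assumption-laden illustration, not the method the team used. The 250 m default radius, the input layouts, and the function names are hypothetical, and the transcript leaves the appropriate radius as an open question.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_within_radius(trucks, cells, radius_m=250):
    """Sum people counts of all cells within radius_m of each truck.

    trucks: {name: (lat, lon)}            (hypothetical layout)
    cells:  [(lat, lon, people_count)]    (hypothetical layout)
    """
    features = {}
    for name, (t_lat, t_lon) in trucks.items():
        features[name] = sum(
            count
            for c_lat, c_lon, count in cells
            if haversine_m(t_lat, t_lon, c_lat, c_lon) <= radius_m
        )
    return features
```

Rerunning the join with several values of `radius_m` is a simple way to see how sensitive the resulting feature is to the choice of area around each truck.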

  • 09:04

    STUART LYNN [continued]: So that was a process of just seeing what features were important, how we brought them in, and how that improved the model. And so as we added, then, data from the local area businesses and the spend data and the mobility data, we saw our model get better and better, which gave us confidence that what we were seeing was more and more predictive power in the model.

  • 09:24

    ANDY ESCHBACHER: Well, I think the nice part about this is it's really extensible. Because all we need to do is define new tasks in here. So it's like a new endpoint. Key skills that I use day-to-day in solving problems is programming is a very important part of it. Python is my language of choice for solving problems.

  • 09:45

    ANDY ESCHBACHER [continued]: Oftentimes, we need to rely on putting together other pieces, as well. So we typically use Docker as a way to have an isolated environment for solving those problems. Jupyter Notebooks are an important part of our workflow for showing reproducible workflows

  • 10:06

    ANDY ESCHBACHER [continued]: and communicating results. And we use SQL quite a bit and spatial SQL, which is PostGIS.

  • 10:13

    STUART LYNN: And the final part of all this was, then, constructing a lot of visualizations and plots that help communicate that both to the food truck client and to a wider audience to showcase how the process worked and how this data can really impact these kind of decision-making processes. So it's kind of a long process. It sounds very linear as I described it here,

  • 10:33

    STUART LYNN [continued]: but it's not. We go back and forth between the steps a lot. There's a lot of trial and error. There's a lot of just realizing that you accidentally chopped the end off of the day and having to go back and redo that part of the daily analysis. Once we were at the point where we were happy with the predictive power of the model, we were able to take it and then use it to predict what the sales would be in a lot of different areas in New York

  • 10:54

    STUART LYNN [continued]: and were able to identify, then, the top 10 areas that you would probably want to put a food truck to get a good return on investment. And at that point, there's a phase of the project where we really try and validate that with intuition. So we all live in New York. We all work here. We all hang out in New York. And so we've got a fairly good idea of the city

  • 11:14

    STUART LYNN [continued]: and what different areas are good and trendy and have a lot of traffic. So looking at these areas, we just did a very quick mental validation that what we were seeing aligned with what we thought was reasonable, that there wasn't any areas that were being highly predicted for revenue in the middle of nowhere, where there was no people, for example. And that kind of gave us confidence in the model.
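
The ranking and sanity check described above could look something like the sketch below. The team describes its validation as a quick mental check, so the automated flag here is only an illustrative stand-in. The area identifiers, the dictionary inputs, and both function names are hypothetical.

```python
def top_areas(predictions, n=10):
    """Return the n areas with the highest predicted sales.

    predictions: {area_id: predicted_sales}  (hypothetical layout)
    """
    return sorted(predictions, key=predictions.get, reverse=True)[:n]


def flag_implausible(areas, foot_traffic, min_people=1):
    """Return shortlisted areas whose observed foot traffic is below min_people.

    foot_traffic: {area_id: people_count}; areas missing from it count as zero.
    """
    return [area for area in areas if foot_traffic.get(area, 0) < min_people]
```

Any area returned by `flag_implausible` is a candidate for the "middle of nowhere" case: a high revenue prediction with no observed people, which would call the model into question rather than suggest a site.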

  • 11:35

    WENFEI XU: And so, basically, what we're going to do is we're going to look at where sales data is coming from and then look at isochrones of those. Basically, I think I'm going to look at isochrones where the highest sales are and then find the overlap.
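
An isochrone is the area reachable from a point within a fixed travel time. The sketch below approximates that idea on a toy travel-time graph using Dijkstra's algorithm and then intersects the reachable sets of several origins. It is an illustration under stated assumptions, not the tooling used in the project: the graph layout, node names, and function names are hypothetical, and real isochrones would be computed over a street network.

```python
import heapq


def isochrone(graph, origin, max_minutes):
    """Return the set of nodes reachable from origin within max_minutes.

    graph: {node: [(neighbor, travel_minutes), ...]}  (hypothetical layout)
    """
    best = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        elapsed, node = heapq.heappop(queue)
        if elapsed > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbor, minutes in graph.get(node, []):
            total = elapsed + minutes
            if total <= max_minutes and total < best.get(neighbor, float("inf")):
                best[neighbor] = total
                heapq.heappush(queue, (total, neighbor))
    return set(best)


def isochrone_overlap(graph, origins, max_minutes):
    """Return the nodes that lie inside every origin's isochrone."""
    zones = [isochrone(graph, origin, max_minutes) for origin in origins]
    return set.intersection(*zones) if zones else set()
```

Taking the highest-selling truck locations as `origins`, the overlap gives the places that are within the same travel time of all of them.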

  • 11:50

    ANDY ESCHBACHER: In the food truck project, CARTOframes was useful for getting data out of CARTO into a Jupyter Notebook for running an analysis, visualizing the results, sending it back up to CARTO, and then using those results to create derivative products like maps and other charts. So we're just making sure that all of the pieces play well together so that users can

  • 12:11

    ANDY ESCHBACHER [continued]: input all of the information that they need and get a result quickly. And then, we're going to start adding more nuances so that we can show multiple results, depending on what people are interested in.

  • 12:23

    WENFEI XU: I think they're all the same type of food truck. So I think that, basically--I'm actually not even sure they move. I think the fact that they don't move basically collapses our data points to, effectively, 12 or however many food trucks we have.

  • 12:39

    STUART LYNN: So we have a lot of policies around how we store data in the cloud. We use two-factor authentication across all of our devices to access that. We have security audits internally at the company to make sure that we're adhering to best practices with this data. And honestly, a lot of the time, it means just deleting data as soon as we're done with it.

  • 12:59

    STUART LYNN [continued]: So if we get a dump of data from a company, and it's closer to being personally identifiable than we would be comfortable with, we will produce aggregations off of that. We'll keep the data around for the lifetime of the project to make sure that we can go back and correct any mistakes that we've made but then, as soon as it's done, making sure that we retire that data in a way that's secure. So I think we try and adhere to the best practices

  • 13:21

    STUART LYNN [continued]: and make sure that we're very aware at every step of what we're doing, what the potential implications for privacy can be. I think we often forget in business that it's partly about money but also partly about giving people what they want. And by using this kind of data, this kind of analysis, we could identify populations of people who are going to be more interested in using these food trucks than others. That means that there's going to be people out there who

  • 13:41

    STUART LYNN [continued]: are just going through their day and happen to find good food more often than they would before, which I think is a great outcome. For me, though, the outcome was not entirely straightforward. And I think this is part of my own philosophy about data science. Really, for me, the power of data science is not in replacing people's decision-making abilities but augmenting them, giving them more information,

  • 14:02

    STUART LYNN [continued]: giving them more insight into the problem in such a way that they can use their main expertise and their years of intuition and experience to make the best decisions. So I don't expect the food truck companies that we work with here to take those predictions and just blindly go put their food trucks in those places. But they can be a really useful tool in understanding or thinking about places they might not have thought of before and beginning

  • 14:23

    STUART LYNN [continued]: to understand more about the context of those food trucks. Data science is all about finding patterns in data that are too subtle or too complicated for us to see as humans. So one of the outcomes of this project was that we were able to identify the factors that were leading to successful food trucks. And again, that's something that's important. We weren't just making a prediction. We were making a statement of value

  • 14:45

    STUART LYNN [continued]: and a statement of what you should be looking for in different areas to gain insight and get success. [MUSIC PLAYING] I think, in this case, we were able to find patterns in these huge data sets about mobility and spend

  • 15:05

    STUART LYNN [continued]: and businesses that would have been really hard for a human to come up with. But that will be combined with people's experience and intuition to drive better insights and to drive better outcomes. [MUSIC PLAYING]

Abstract

Stuart Lynn, Head of Data Science, Wenfei Xu, Data Scientist, Andy Eschbacher, Map Scientist, and Dongjie Fan, of CARTO, discuss recent spatial data science projects, specifically the food truck sales project, including data access and collection, challenges in using the data, modelling the data, programming languages used, the modelling process and verification of the model, data storage and security, outcomes of the project, and the importance of data science in improving outcomes.

Video Info

Publication Info

Publisher:
SAGE Publications Ltd
Publication Year:
2019
Product:
SAGE Research Methods Video: Data Science, Big Data Analytics, and Digital Methods
Publication Place:
London, United Kingdom
SAGE Original Production Type:
SAGE In Practice
ISBN:
9781526493231
DOI:
https://dx.doi.org/10.4135/9781526493231
Copyright Statement:
(c) SAGE Publications Ltd., 2019

People

Speaker:
Stuart Lynn
Speaker:
Dongjie Fan
Speaker:
Wenfei Xu
Speaker:
Andy Eschbacher

Methods Map

Spatial analysis

The exploration of physical and spatial features of a particular environment which are associated with certain social outcomes, such as social exclusion or crime, and are therefore seen as having a contributory or perhaps causal effect on social processes or social phenomena.