  • 00:05

    [Studying Gerrymandering Using Spatial Data Analytics & Algorithms]

  • 00:10

    JULIA KOSCHINSKY: I'm Julia Koschinsky. I'm the executive director of the Center for Spatial Data Science at the University of Chicago, which is headed by Luc Anselin, and I've been with previous versions of the center for over 15 years.

  • 00:25

    JAMIE SAXON: And I'm Jamie Saxon. I was trained as a particle physicist, and I came to the Harris School of Public Policy at the university and then found myself doing more and more spatial data analysis, and I have moved more and more toward the center.

  • 00:37

    JULIA KOSCHINSKY: So what I'll do is quickly set the stage about the context of spatial data science, and then Jamie will talk a little bit about a research application in gerrymandering that will hopefully make this come to life a little bit more. And so we were thrilled to see that Sage is interested in spatial analytic methods because there's--

  • 01:01

    JULIA KOSCHINSKY [continued]: data scientists sometimes think about spatial analysis in terms of putting dots on the map or data visualization, or in terms of GIS (geographic information system) tools. And at the center we think about spatial analysis much more broadly than that, and would encourage new students

  • 01:23

    JULIA KOSCHINSKY [continued]: to come in thinking about it more broadly. So we apply a spatial perspective to the whole research process, starting from defining spatial questions, to research design, to thinking about spatial sampling, for instance, to using explicitly spatial econometric methods, and then visualizing not just raw data,

  • 01:44

    JULIA KOSCHINSKY [continued]: but the results of spatial analytic tests, for instance, right? And so we use not just a GIS tool, but think of it in a broader framework of GI science, geographic information science. And so then as a spatial data scientist you would apply and develop scientific methods, concepts,

  • 02:09

    JULIA KOSCHINSKY [continued]: and tools to analyze geographic data in order to gain insights about location. And yeah, so that is broadly how we think about spatial analysis at the center. [How do data science and spatial analytics impact each other?] Right now those two communities are still pretty siloed,

  • 02:31

    JULIA KOSCHINSKY [continued]: given their different roots in computer science and physics, on the one hand, and geography, economics, policy, and so on, on the spatial analytic side, but there's great interest in more synergies between those two communities. And so we have been collaborating with a location intelligence startup called

  • 02:53

    JULIA KOSCHINSKY [continued]: CARTO in Brooklyn, New York, to create a variety of venues to bring these two communities together more. And interest has been exploding. We first started out with Hangouts, and that got so big that we decided to do a conference. We expected to have 70 people attend the conference and got blown away by 140 people who signed up

  • 03:15

    JULIA KOSCHINSKY [continued]: on one month's notice. So we're continuing to offer this conference. The next one will be in October in New York. It's a great way for people to get together and share ideas, projects, and collaboration opportunities across these two disciplines. CARTO also started a Slack channel on spatial data

  • 03:37

    JULIA KOSCHINSKY [continued]: science that you can sign up for, as well as a newsletter. So there are plenty of opportunities for people who are new to the field to get involved. [How do you apply computational methods to spatial analytics?] At the center, one of the goals we've been committed to for the past 20 years

  • 03:58

    JULIA KOSCHINSKY [continued]: is to make spatial analysis methods and tools accessible to researchers, for teaching purposes, and for people interested in it beyond that, such as city agencies and other analysts. And the key to doing that is two things.

  • 04:20

    JULIA KOSCHINSKY [continued]: One, having software that's available. And two, tutorials on how to use these methods and tools. And we have a commitment to free and open source software. Since we're public, the center kind of operates within the public domain and has traditionally been funded by organizations like the National Science

  • 04:40

    JULIA KOSCHINSKY [continued]: Foundation and NIH. And so we're committed to using those public investments to continue to make the software available publicly. And so to give you one example that illustrates the demand and interest in free and open source software, we created a program that's a point-and-click introduction

  • 05:01

    JULIA KOSCHINSKY [continued]: to spatial analysis. It's used for teaching in introductory spatial analysis classes across the US and globally. It's called GeoDa, and by now, 15 years later, it's used by a quarter million people. And there are about 9,000 citations to publications that use GeoDa in research and so on.

  • 05:22

    JULIA KOSCHINSKY [continued]: And so we expected that to taper off at some point, but it has just kept growing. So that's been exciting. And simultaneously we participate in the Python-based Spatial Analysis Library (PySAL) for people who are interested in entering this at the code level.

  • 05:43

    JULIA KOSCHINSKY [continued]: And that effort is spearheaded by Sergio Rey at UC Riverside, and we're contributing spatial econometric modeling components to that code. And then on our website, there are a lot of free tutorials that show you how to use all this. And we have a GeoDa Center YouTube channel

  • 06:05

    JULIA KOSCHINSKY [continued]: with about 150 lectures by Luc Anselin and 200,000 views. And so there's a lot of material to get your hands dirty, to try these methods, and to learn how to use them in a way that goes beyond pointing and clicking. [What are some overlapping concepts in spatial analytics and computational methods?]

  • 06:26

    JULIA KOSCHINSKY [continued]: One of the groups of methods that is at a natural intersection between the data science and spatial analytic communities is the detection of local clusters in multivariate space. And so this is-- maybe I should say what the detection of local clusters means. It means to identify observations

  • 06:47

    JULIA KOSCHINSKY [continued]: that are similar to each other in the same group. And they will be more similar to each other than to observations in other groups that get identified as separate clusters. And so these local cluster detection techniques and dimension reduction techniques are very common in unsupervised machine learning.

  • 07:09

    JULIA KOSCHINSKY [continued]: There are things like K-means, hierarchical clustering, principal component analysis, and so on. And in spatial analytics you add a spatial constraint to those methods. And so for instance, if you are detecting local clusters through something like K-means, you can now do that not only in attribute space,

  • 07:30

    JULIA KOSCHINSKY [continued]: but also in location space. So you can identify local spatial clusters whose members are also similar to each other in attribute space. And another group of methods within the spatially constrained cluster methods is called regionalization methods.
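
As a rough illustration of running K-means "not only in attribute space, but also in location space," one simple approach is to append each area's coordinates to its attributes before clustering. The Python sketch below does that under stated assumptions: the z-score rescaling, the hypothetical `weight` knob on the coordinates, and the toy data are all illustrative choices, and the code is a standalone toy rather than a GeoDa or PySAL routine. It applies only a soft spatial pull, not the hard contiguity constraint that regionalization methods enforce.

```python
import math
from typing import List


def standardize(rows: List[List[float]]) -> List[List[float]]:
    """Z-score each column so no variable dominates just because of its units."""
    out = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col)) or 1.0
        out.append([(x - mean) / sd for x in col])
    return [list(r) for r in zip(*out)]


def sq_dist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, iters: int = 100) -> List[int]:
    """Lloyd's algorithm, seeded deterministically with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def spatial_kmeans(attrs: List[List[float]], coords: List[List[float]],
                   k: int, weight: float) -> List[int]:
    """Cluster on standardized attributes plus `weight`-scaled coordinates.

    weight = 0 is ordinary attribute-space K-means; larger weights pull
    clusters toward spatially compact groups (a soft constraint only).
    """
    features = [a + [weight * v for v in xy]
                for a, xy in zip(standardize(attrs), standardize(coords))]
    return kmeans(features, k)


# Hypothetical data: one attribute per area, areas laid out along a line.
attrs = [[0], [10], [0], [10], [0], [10]]
coords = [[0, 0], [1, 0], [2, 0], [10, 0], [11, 0], [12, 0]]
print(spatial_kmeans(attrs, coords, k=2, weight=0.0))   # [0, 1, 0, 1, 0, 1]
print(spatial_kmeans(attrs, coords, k=2, weight=10.0))  # [0, 0, 0, 1, 1, 1]
```

With no weight on location, the clusters follow the attribute and interleave across space; with a heavy weight, they split into a western and an eastern group.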

  • 07:52

    JULIA KOSCHINSKY [continued]: And those allow you to group spatially contiguous areas. And this is an area that Jamie will talk about in more detail, with a research application, in a minute. I should also mention that we've implemented these and other local cluster methods in GeoDa, and they're also available in free and open source spatial

  • 08:12

    JULIA KOSCHINSKY [continued]: packages and programs, like PySAL. [How would you define gerrymandering?]

  • 08:22

    JAMIE SAXON: Gerrymandering is effectively abusing the redistricting process. Every 10 years, after the census, the boundaries of political districts get redrawn. And in the US, politicians can basically control that process. And so the state legislatures who control it

  • 08:44

    JAMIE SAXON [continued]: will draw lines that benefit the party that's in the majority at that time. And by doing that, they get more of their own representatives, or that party's representatives, and they send them to Congress or to their own chamber. And so effectively, it's the abuse of that districting power to change the representation.

  • 09:04

    JAMIE SAXON [continued]: Regionalization is just dividing up a space, partitioning it into a number of areas. And in the case of a districting problem, what we're doing is dividing up space, and we also have these additional constraints of equal population and contiguous spaces. And so it is just the regionalization problem

  • 09:25

    JAMIE SAXON [continued]: with these additional constraints of equal population and contiguity, which just means that all of the spaces within a district are connected to each other. By creating a regionalization algorithm to do this process, you're effectively doing a districting process without putting in partisan information. And so by creating a population of maps with the outcomes

  • 09:49

    JAMIE SAXON [continued]: that you would have in this sort of counterfactual world where politicians weren't controlling the process, you can create the baseline of what representation would look like without this political distortion. [Why was it important to use multiple clustering methods for this project?] There are lots of different ways of formulating a regionalization

  • 10:10

    JAMIE SAXON [continued]: problem, or responding to it. And so if you write one regionalization method or another, you'll potentially get different shapes of clusters. So for example, one method might try to get all of the people as close together as possible, and another one might try to get the perimeter of the districts

  • 10:32

    JAMIE SAXON [continued]: as low as possible with respect to the areas of those districts. And if you're trying to say that politicians have distorted the process and changed the representation with respect to some baseline, you need to know that that baseline won't change when you

  • 10:53

    JAMIE SAXON [continued]: change the objective function. So in other words, does trying to get people close to each other lead to political results consistent with trying to keep the perimeter as low as possible? One of the interesting points from my work is that these different methods actually lead to fairly consistent outcomes.
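
Checking whether two objective functions give consistent political results comes down to scoring each simulated plan, which Jamie describes as re-aggregating actual precinct-level presidential votes within the simulated districts and seeing who wins each one. A minimal Python sketch of that scoring step follows; the precinct names, vote counts, and both plans are hypothetical, a two-party (Democratic, Republican) split is an assumption, and counting an exact tie for neither party is an illustrative choice.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def seats_won(votes: Dict[str, Tuple[int, int]],
              plan: Dict[str, int]) -> Tuple[int, int]:
    """Re-aggregate precinct (dem, rep) votes into a plan's districts.

    Returns (dem_seats, rep_seats); an exact tie counts for neither party.
    """
    totals: Dict[int, List[int]] = defaultdict(lambda: [0, 0])
    for precinct, (dem, rep) in votes.items():
        district = plan[precinct]  # simulated district containing this precinct
        totals[district][0] += dem
        totals[district][1] += rep
    dem_seats = sum(1 for d, r in totals.values() if d > r)
    rep_seats = sum(1 for d, r in totals.values() if r > d)
    return dem_seats, rep_seats


# Hypothetical precinct returns (dem, rep) and two hypothetical plans.
votes = {"p1": (60, 40), "p2": (55, 45), "p3": (30, 70), "p4": (35, 65)}
plan_a = {"p1": 0, "p2": 0, "p3": 1, "p4": 1}
plan_b = {"p1": 0, "p3": 0, "p2": 1, "p4": 1}
print(seats_won(votes, plan_a))  # (1, 1)
print(seats_won(votes, plan_b))  # (0, 2)
```

The two toy plans deliberately disagree to show that the same votes can yield different seat counts; the comparison described in the interview would instead tally seats over many plans produced by each objective and check whether the resulting distributions line up.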

  • 11:13

    JAMIE SAXON [continued]: And that's something of a surprise because in the US, Democratic voters, liberal voters, tend to be clustered in cities, so in denser areas. And these different objectives, people closer together or lower perimeters, will actually divide up dense populations in different ways. And so you might think that by dividing Democrats,

  • 11:34

    JAMIE SAXON [continued]: more or less, you would affect how many seats they would get, and that turns out not to be true. [How do you obtain the data for your clustering tests?] When I actually calculate who is going to win, what I'm really doing is taking the actual votes cast in presidential elections at the precinct level, so

  • 11:55

    JAMIE SAXON [continued]: a really small scale, and then I run this regionalization method. I have the simulated districts across the state, and I take all of those actual votes and re-aggregate them within the boundaries of the simulated districts. And then it's just who wins that district. [Why is there such a focus on compactness in redistricting?]

  • 12:17

    JAMIE SAXON [continued]: All of the regionalization algorithms that I've worked on have tried to make these districts as compact and close together as possible. And the reason that I've done that is because there's a long legal history for this concept of compactness. The Supreme Court has required it for both racial and partisan gerrymandering.

  • 12:38

    JAMIE SAXON [continued]: More than half of the states in the US require it for congressional districts. And in fact, even Congress, when it used to control this process, required it. So they required equal population, contiguity, and compactness. And so what I've done is basically gone back through this legal literature and tried to pull out this concept that people had referred to over and over again,

  • 13:00

    JAMIE SAXON [continued]: but that had never really been compared computationally, to see if one version of compactness is actually consistent with another. The Supreme Court effectively took over responsibility for the districting process in the early 1960s, and then for the partisan gerrymandering problem in particular in the '80s.

  • 13:21

    JAMIE SAXON [continued]: And they're sort of not well-suited to managing this problem. The Constitution actually gives Congress power over the districting process, and for about 70 years, from 1842 until 1929, Congress used that authority pretty effectively.

  • 13:42

    JAMIE SAXON [continued]: And so what I would love to see is Congress start doing that again and step back into this space. As unlikely as it might seem, and as little faith as we might have in Congress, they have a better track record of doing this, and they have more power and more leeway to come up with an appropriate solution. [What is the data collection and analysis process for this project?]

  • 14:05

    JAMIE SAXON [continued]: The way this works is we have a collection of census tracts, which are just small areas with populations of about 3,000 people. And you take the whole state and you start trading census tracts back and forth, trying to get as close to your objective as you can: get all of the people as close together as possible, or get the perimeter with respect to the area as low as possible.
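
The perimeter-based objective is often formalized in the redistricting literature as the Polsby-Popper score, 4πA/P², which equals 1 for a circle and falls toward 0 as a district's boundary grows long relative to its area. Whether this exact measure is the one used in this project is an assumption. The Python sketch below computes it for a district outline given as ordered (x, y) vertices in a projected (planar) coordinate system; the two shapes are hypothetical.

```python
import math
from typing import List, Tuple


def polsby_popper(polygon: List[Tuple[float, float]]) -> float:
    """Isoperimetric compactness 4*pi*A / P**2 for a simple polygon.

    Vertices are listed in order, without repeating the first vertex at the end.
    """
    n = len(polygon)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1  # shoelace formula term
        perimeter += math.hypot(x2 - x1, y2 - y1)
    area = abs(twice_area) / 2.0
    return 4.0 * math.pi * area / perimeter ** 2


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
sliver = [(0, 0), (4, 0), (4, 1), (0, 1)]
print(round(polsby_popper(square), 3))  # 0.785 (= pi/4)
print(round(polsby_popper(sliver), 3))  # 0.503
```

For a fixed area, lowering the perimeter and raising this score are the same goal, so an optimizer could work with either form.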

  • 14:27

    JAMIE SAXON [continued]: And you're just trading these back and forth. Well, there are a couple of problems. For one, if you're evaluating how close the people are or how low the perimeter is, you need to be able to do that incredibly quickly, because you're considering every possible change to the current configuration. And so there's a lot of work that goes into writing that code extremely efficiently.
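
The tract trading described here can be sketched as a greedy local search. This Python version is a minimal illustration under stated assumptions, not the code used in this research. The tract adjacency, populations, and centroid coordinates are hypothetical toy inputs. The "people close together" objective is formalized, as an assumption, as the population-weighted sum of squared distances to each district's population centroid. The ±10% population tolerance is chosen only to keep the toy small. Only the two districts touched by a move are re-scored, which is the simplest form of the fast evaluation mentioned above; production code would update running totals instead of recomputing them.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def is_contiguous(members: Set[int], adj: Dict[int, List[int]]) -> bool:
    """True if the tracts in `members` form one connected piece (BFS)."""
    if not members:
        return False
    start = next(iter(members))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in members and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(members)


def weighted_ss(members: Set[int], pop: Dict[int, float],
                xy: Dict[int, Tuple[float, float]]) -> float:
    """Population-weighted squared distances to the district's population centroid."""
    total = sum(pop[t] for t in members)
    cx = sum(pop[t] * xy[t][0] for t in members) / total
    cy = sum(pop[t] * xy[t][1] for t in members) / total
    return sum(pop[t] * ((xy[t][0] - cx) ** 2 + (xy[t][1] - cy) ** 2) for t in members)


def local_search(assign: Dict[int, int], adj: Dict[int, List[int]],
                 pop: Dict[int, float], xy: Dict[int, Tuple[float, float]],
                 tol: float = 0.1) -> Dict[int, int]:
    """Greedily move boundary tracts between neighboring districts (mutates `assign`).

    A move is accepted only if both districts stay within `tol` of the ideal
    population, the losing district stays contiguous, and the objective
    strictly drops. Stops at the first local minimum.
    """
    districts: Dict[int, Set[int]] = {}
    for t, d in assign.items():
        districts.setdefault(d, set()).add(t)
    ideal = sum(pop.values()) / len(districts)
    improved = True
    while improved:
        improved = False
        for t in sorted(assign):
            src = assign[t]
            for dst in sorted({assign[v] for v in adj[t]} - {src}):
                new_src, new_dst = districts[src] - {t}, districts[dst] | {t}
                if any(abs(sum(pop[u] for u in g) - ideal) > tol * ideal
                       for g in (new_src, new_dst)):
                    continue  # would break the equal-population constraint
                if not is_contiguous(new_src, adj):
                    continue  # would split the losing district
                before = weighted_ss(districts[src], pop, xy) + weighted_ss(districts[dst], pop, xy)
                after = weighted_ss(new_src, pop, xy) + weighted_ss(new_dst, pop, xy)
                if after < before - 1e-12:
                    districts[src], districts[dst] = new_src, new_dst
                    assign[t] = dst
                    improved = True
                    break
    return assign


# Hypothetical example: six tracts in a row, one person each, two districts.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
pop = {t: 1.0 for t in adj}
xy = {t: (float(t), 0.0) for t in adj}
start = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1}  # unbalanced 4/2 split
print(local_search(dict(start), adj, pop, xy))
# {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
```

Because it accepts only improving moves, this version halts at the first local minimum it reaches. Escaping such minima and scaling up to a whole state are beyond this sketch.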

  • 14:50

    JAMIE SAXON [continued]: The other problem is that this is an enormous problem. It's technically called an NP-hard problem, which in practice means that we could run all the computers in the world from the beginning of the universe until now, and we'd basically never consider every possible configuration. And so you need ways of trying to just heuristically search

  • 15:12

    JAMIE SAXON [continued]: towards better solutions. But you can also end up in what's called a local minimum, which means it looks better than everything that's around it, but it's not actually that good a solution, and you need to get out of these situations. So these heuristic methods take a lot of development. Of course, what I ultimately end up doing is the best I can there, but then I throw a ton of computational resources

  • 15:34

    JAMIE SAXON [continued]: at this problem. So this is all ultimately done on the grid. Yeah, this is part of the philosophy of the center. So as soon as this paper is finished, then yeah, the software will be incorporated into the existing frameworks. [Where did the data come from?] So this is one of the points that I think is really exciting about now--

  • 15:56

    JAMIE SAXON [continued]: the quality of the spatial data that is just publicly, freely available is really fantastic. For this specific project, I'm pulling data from the US Census Bureau. And the quality of just the maps that you can create with that data is really phenomenal. One of the other publicly available sources that any student can check out, which I'm not

  • 16:17

    JAMIE SAXON [continued]: using in this project in particular, is OpenStreetMap. The other part of the data for this project is the precinct-level returns from those presidential elections. And when I started out on this project, I would have thought, well, it's an election. It has to be a free and open election, so I must be able to find this data easily. And it turns out that states don't take that too seriously.

  • 16:41

    JAMIE SAXON [continued]: So it can be extremely difficult to piece back together who voted in each precinct and how they cast their votes. So I actually made a resource that's an interactive map, and you can explore all the different types of maps that you can get out of each of the individual regionalization

  • 17:03

    JAMIE SAXON [continued]: methods. So here, I went through about 18 different methods, and you can compare them to the existing maps. And you can look at them both in terms of the partisan composition of the simulated and real districts and the racial and ethnic composition of those districts. And so you can see all of those resources, and then, of course,

  • 17:26

    JAMIE SAXON [continued]: also a description of the methods in a little bit more detail. [What advice would you give to someone new to computational social science?] Learn C++. I would start by-- I mean, I think one of the things that's interesting about coming to this conference, or at least that I have found interesting, is that there's sort of, let's do somewhat more complex

  • 17:47

    JAMIE SAXON [continued]: statistical techniques. There's natural language processing. There's network theory. But one of the things I feel like I haven't seen as much of at this conference is just, I'm going to do a big computational task at really large scale on the grid. And I think the possibility of trying

  • 18:08

    JAMIE SAXON [continued]: to not just study, but come up with solutions to these big social science problems really demands that you be able to write large-scale code that you can deploy on the grid. And I think that remains rare. And I think that's one of the points where this field really

  • 18:30

    JAMIE SAXON [continued]: will develop over the next decade.

  • 18:33

    JULIA KOSCHINSKY: My tip is supplementary to this, and it would be to learn how to scope a good and relevant question. And especially in a field that is excited and overwhelmed by the ubiquity of new data and all these cool new methods that are coming out,

  • 18:54

    JULIA KOSCHINSKY [continued]: there is a risk that the questions are driven by the data and the methods. And we're missing opportunities to ask broader questions that we could bring all this horsepower to. And so if you-- and it's a call for maybe collaboration between physics

  • 19:16

    JULIA KOSCHINSKY [continued]: and substantive fields, for instance, because the substantive fields often have the questions, but not the computational power and methodological expertise, and vice versa. The technologists and methodologists sometimes would like to connect with other people who have real-world questions.



University of Chicago's Julia Koschinsky, PhD, Executive Director of the Center for Spatial Data Science, and Jamie Saxon, PhD, postdoctoral fellow, discuss their research on gerrymandering using spatial data analytics and algorithms, including research applications of spatial data science, the relationship between spatial analytics and data science, applying computational methods to spatial analysis, gerrymandering, clustering methodology, data sources, data collection and analysis, and advice for a novice in computational social science.
