  • 00:00

    [MUSIC PLAYING] [Chris Wiggins Discusses Teaching Data Science to Both Technologists & Humanists]

  • 00:10

    CHRIS WIGGINS: I'm Chris Wiggins. [Chris Wiggins] [Professor of Applied Mathematics Columbia University] I split my time between Columbia University and The New York Times. So at Columbia University, I'm a member of the Data Science Institute and a professor of applied mathematics. [Chief Data Scientist] [The New York Times] And at The New York Times, I'm Chief Data Scientist. Here at Social Sci FOO, I'm going to be leading a discussion of a new class that I co-developed with a history professor, Matt Jones

  • 00:31

    CHRIS WIGGINS [continued]: from Columbia. And the class is called Data Past, Present, and Future. And we think about it as the things that everyone needs to know about data and data-empowered algorithms. Partly what we're thinking is that there's really a set of topics that are outside statistics and outside sort of the soft science context of technology

  • 00:58

    CHRIS WIGGINS [continued]: that we think are going to be interesting to people, regardless of their futures. Future statisticians, future CEOs, future senators. There are some things that we think everyone should know about data. We think of the class as material that should be of interest to people from a wide variety of backgrounds and a wide variety of possible futures. And we also think of the material as material that's not at present being taught

  • 01:20

    CHRIS WIGGINS [continued]: either to future humanists or future technologists. And as far as the background of the class, I think in part because I've been working in machine learning for a while and I went to go see a talk by Matt Jones, my co-instructor, several years ago about the history of machine learning, which is a subject that I haven't

  • 01:42

    CHRIS WIGGINS [continued]: seen a lot of academic work on machine learning. And it was clear that he and I had complementing literatures that we were drawing on, and a lot of overlap in terms of what topics we thought were really important. Also, Columbia developed a new program called the Collaboratory, which is

  • 02:02

    CHRIS WIGGINS [continued]: a co-initiative between Columbia's entrepreneurship office and Columbia's Data Science Institute, specifically to get professors from different schools to teach together. And so when we saw the call, we right away thought, well, this is a great mechanism for us to have an excuse to develop a new curriculum in data and data

  • 02:24

    CHRIS WIGGINS [continued]: ethics, and sort of the context around data-empowered algorithms. Our sense is that for the foreseeable future, data-empowered algorithms are really shaping everybody's professional and personal and political realities, and there's a lot for everyone to understand about that. In terms of the intellectual arc of the class,

  • 02:45

    CHRIS WIGGINS [continued]: we start out with some very modern work, trying to sort of set the stakes for the extent to which data-empowered algorithms are very pervasive in our lives. And then once we've illustrated that, we go way back to really the 17th century and even the first time people started using the word statistics, and started thinking about data as a tool for understanding how to run a state, as well

  • 03:07

    CHRIS WIGGINS [continued]: as early financial forays into making sense of the world quantitatively. And then rapidly, you see the development of things that look very modern, including statistical modeling and the development of mathematical statistics. There's sort of an intellectual point at World War II with the birth of computation.

  • 03:29

    CHRIS WIGGINS [continued]: And at that point, there's a real amazing intellectual divergence between statistics as it's understood as a form of mathematics and statistics as it's understood as a form of science. And then you really develop data as an empowering force for technology. And then it sort of all comes crashing together in the last 20 years.

  • 03:49

    CHRIS WIGGINS [continued]: And throughout the class, we try to make sure that people understand not only what are the technological advances, but what are the interests that are driving that, including who's funding this work, and also, how do these new capabilities change power? That is, there's always a political dynamic to technology.

  • 04:09

    CHRIS WIGGINS [continued]: Not political in the sense of voting, but political in the sense of changing power and the dynamics of power. So that's one of the things that we try to illustrate throughout the class. I think even five years ago, this would have been very difficult. But the way we structure the class leverages highly modern, very powerful open source tools for statistical software,

  • 04:32

    CHRIS WIGGINS [continued]: and also a framework called the Jupyter Notebook, which is an open source technology, which is used in industry and allows people without really a coding background to immediately engage with data and code. I said at the beginning of the class, it's like having code without the coding. So by giving students a pre-written notebook,

  • 04:53

    CHRIS WIGGINS [continued]: essentially you can just press Return, and sort of magic happens in front of you. But you see the code in front of you, which means you can go through and modify the code. You can break the code. And it's very instructive for students who are not coming from a coding background immediately to engage with computation, and specifically, making sense of data through computation. Data's really changing the way almost every field understands

  • 05:14

    CHRIS WIGGINS [continued]: itself. Not only academic fields, but certainly industrial fields. But I wouldn't frame it so much as people from the humanities need to understand data, but rather, all of us need to understand the context around data. So it's not that we've written a class that we want to introduce humanists into technology or that we want to introduce technologists

  • 05:35

    CHRIS WIGGINS [continued]: to humanistic thinking. It's that we really believe there's very important material that's not being taught to either group. And we think that all of these groups--future senators, future statisticians, future CEOs--need to understand the way we got here as a species. How did we get to the point where data and data-empowered algorithms

  • 05:58

    CHRIS WIGGINS [continued]: are shaping all of our realities? And that's really what we're trying to do. So I wouldn't frame it so much as getting the humanists to code or getting the technologists to understand a little bit about ethics. We really feel like there's a set of things that are important to know, and are currently not being taught to any of these students. One thing to know about the class is the way we tried to structure the cadence of every week,

  • 06:19

    CHRIS WIGGINS [continued]: that on Monday, we'll read some original text or maybe some secondary text that analyzes some statistical innovation. And then on Wednesday, we'll execute that in Jupyter. So we're doing this using Python. We're using the same tools that people are using in industry or in their research. But because open source tools have developed so strongly in the last decade, it's become possible

  • 06:41

    CHRIS WIGGINS [continued]: for students with no computational background to really dig into code immediately. So for example, to answer your earlier question, what data sets do we look at, in the first week of the class, we invited students to look at a standard machine learning repository of data sets from University of California Irvine-- it's a well-studied corpus of data sets-- and to find data sets that they found interesting.

  • 07:02

    CHRIS WIGGINS [continued]: For example, data sets where they saw some subjectivity in what the researchers who collected the data chose to collect or to throw away, or how they chose to define outliers. And then the next week, we took those data sets and started digging in, particularly data sets about people. How do you ingest the data? How do you render the data? How do you visually explore the data?

  • 07:25

    CHRIS WIGGINS [continued]: And now as the class develops, we'll parallel the development of machine learning and statistics by introducing them every week. And they will actually engage with those techniques as they read about them. One of the repeated themes of the class is that when we say, oh, we want everybody to understand data, that that can be parsed in a slightly more nuanced way. It's not necessarily the case that we want everybody

  • 07:47

    CHRIS WIGGINS [continued]: to know how to code or to become a machine learning developer. But we do think that everybody should have some balance between functional capability and the ability to analyze code-- analyze data with code, critical literacy, and critical capability, the ability to interrogate when somebody says, oh, the algorithm said this is true. We'd like everyone to have critical facilities

  • 08:08

    CHRIS WIGGINS [continued]: to interrogate that claim. And also some rhetorical capability, meaning the ability to interpret and to argue for some position armed with data. So even in the beginning weeks of the class as we just start exploratory data analysis and start rendering the data visually, part of what the students are doing is arguing for some view of what they see in the data.

  • 08:31

    CHRIS WIGGINS [continued]: So those sort of multiple capabilities-- the functional, the critical, the rhetorical-- is one of the repeated themes of the class. In the arc of the class, we try to always pair some intellectual innovation with some sort of political observation. That is, who benefits or who was funding the innovation? Or how does that innovation change the dynamics of power?

  • 08:53

    CHRIS WIGGINS [continued]: More broadly, consistently through the class, we try to introduce students to how to think about the ethics of this change in the way we make sense of data. One of the things we try to emphasize is ethics is a bit of a fluid and poetic term, but nonetheless, we can apply some real analysis to what people mean by ethics in terms of the rules

  • 09:16

    CHRIS WIGGINS [continued]: that individual groups demand, the standards from which those rules are derived, and the high-level principles of which those standards are special cases. So we try also to have the students understand throughout the class, how did we as researchers develop ethics in the way we now understand it?

  • 09:38

    CHRIS WIGGINS [continued]: And the extent to which that does or does not carry over into private companies, which are also shaping the way people understand their world through data. We haven't done the class in a sort of capstone-like or project-like way. Again, every week, we do open up Jupyter and engage with data directly. But there's not a capstone or research component. We want the class to be accessible to students

  • 09:59

    CHRIS WIGGINS [continued]: with zero prerequisites. So some of the students are history graduate students. Some of the students are sophomores in technological or humanistic fields. Some are undeclared. So we try not to make the class something where there's sort of a scholarly bar. Like, you must be ready to do independent research, or something like that.

  • 10:20

    CHRIS WIGGINS [continued]: We really do believe that this material should be accessible to everybody, and everyone should take it. [For "Data Past, Present, and Future" resources, syllabus, and lecture notes, visit github.com/data ppf/datappf.github.io.wiki] [MUSIC PLAYING]

Abstract

Chris Wiggins, PhD, Professor of Applied Mathematics at Columbia University, discusses the rationale for considering data science an interdisciplinary field with relevance to everyone. Two university courses based on this approach are outlined; they focus on understanding, analysis, ethical considerations, and making sense of big data for those from a non-coding background.
