  • 00:09

    CLAUDIA VON VACANO: I'm Claudia von Vacano. And I'm the Executive Director of the Social Sciences D-Lab and Digital Humanities at UC Berkeley. One of the things that I bring to that role is a really diverse background. I came to the United States as a political refugee when I was eight years old. So I had to learn English from scratch

  • 00:30

    CLAUDIA VON VACANO [continued]: from television, in fact. I also am queer identified. And so all of these different identities really helped me create a very inclusive space at the D-Lab. Yeah, so I just want to say a little bit about the Social Sciences D-Lab, which is the largest onboarding data science organization on campus at UC Berkeley.

  • 00:51

    CLAUDIA VON VACANO [continued]: We serve about 6,000 scholars per year. We provide 1,100 consultations a year. And we have about 300 workshops, working groups, and research teams that are active at all times. So we have really a vast amount of resources, highly interdisciplinary.

  • 01:12

    CLAUDIA VON VACANO [continued]: Although we have social sciences in our name, interestingly, our largest scholar participant-base is from public health. And our fastest growing group is from engineering. So we have just as many people coming in who are working with social data and want to understand what social scientists do, how, you know, we deploy different methods

  • 01:32

    CLAUDIA VON VACANO [continued]: for understanding social data as we have people who come in to learn R, Python, and other methodologies that can be deployed in different ways. There's a wide array of different applications. I'm also really excited to announce that the D-Lab at UC Berkeley are among the first institutions that are launching

  • 01:54

    CLAUDIA VON VACANO [continued]: an applied data science for social scientists online course through SAGE Campus. And so we deploy a wide array of different methods and tools. So first of all, we look at data formats and we then talk about collection of data. So that includes survey design, crowdsourcing,

  • 02:19

    CLAUDIA VON VACANO [continued]: collecting data from the web. In addition to that, there's a lot of analytic applications, of course. So we look at data visualization. We explore those topics, as well as computational text analysis, machine learning.

  • 02:40

    CLAUDIA VON VACANO [continued]: And all of these different methods really enable us to explore new forms of data in different ways at scales that previously were impossible to achieve. I also wanted to add a little bit about my role in the digital humanities. I encompass both social sciences and the humanities.

  • 03:01

    CLAUDIA VON VACANO [continued]: And the last four years for digital humanities at Berkeley have been really productive. We've supported 32 collaborative research projects spanning from the Louisiana slave conspiracy to Egyptology and looking at 3D models of coffins. And so a wide array of different projects that have evolved over the last four years that are really

  • 03:22

    CLAUDIA VON VACANO [continued]: now coming to fruition. In addition to that, we've supported 20 different new courses in digital humanities. And we're extremely excited that we're launching a summer only certificate program in minor that's going to be 12 weeks long this summer. So there's a lot of work that really spans from the humanities to the social sciences.

  • 03:45

    CLAUDIA VON VACANO [continued]: But really it's these data science methods that are bringing together all the different disciplines on campus. And in a way, we're thinking more about problems than we are thinking about the disciplines, per se, or the methods in isolation. One of the really exciting things that we're doing at the D-Lab currently

  • 04:05

    CLAUDIA VON VACANO [continued]: is we're exploring the topic of hate speech. So what we've done is the first cycle we took 80,000 subreddit comments. At the time, Reddit was not monitoring hate speech on their platform. And so that's the reason we selected Reddit as the platform we wanted to explore.

  • 04:26

    CLAUDIA VON VACANO [continued]: So we took 80,000 comments and we started to label that data through qualitative methods. And we trained a group of 10 undergraduate students, a very diverse group of students. We selected them for the purpose of their different viewpoints and the ways that they were going to interpret the information, bringing their own positionality and their own perspectives

  • 04:48

    CLAUDIA VON VACANO [continued]: to the table. And they would look and read through these comments. And many times interesting conversations would arise where people would have differences of opinion. Hate speech is almost a dialect in itself. There's so much coded language within hate speech. And so this kind of discussion really helped all of us understand what we were reading in the content

  • 05:09

    CLAUDIA VON VACANO [continued]: that we were reading. We then took that labeled data and our machine learning team went through a series of different machine learning algorithms. And ultimately that resulted with a model that was 85% accurate in predicting whether a comment was hate speech or not. And so that was a proof of concept--

  • 05:30

    CLAUDIA VON VACANO [continued]: the first cycle was a proof of concept. We're now at the second cycle of that research project. And what we're currently doing, we've developed an instrument that asks what is the target population. In the first cycle, we actually broadly opened that question of who are the people that are being spoken about. In the second cycle, we actually have a list of protected groups

  • 05:54

    CLAUDIA VON VACANO [continued]: that are the target groups of the hate speech. And so we ask that the person who is completing this survey read the comment and then identify what target group is being targeted in this comment. And then to see if there is negative sentiment,

  • 06:15

    CLAUDIA VON VACANO [continued]: and the degree to which the negative sentiment is there. And then we're A/B testing a couple of items. One, having to do with whether there's a sense of superiority or inferiority embedded in the comment. And another one is if there's a sense that the target group is

  • 06:37

    CLAUDIA VON VACANO [continued]: vulnerable to violence or aggression in some concrete way. So we're going through this process of piloting the survey instrument. And we're going to be crowdsourcing in the second cycle. And this time we're not going to have just one data source, but in fact we're going to combine a wide array of different data sources that include Twitter, Facebook, Reddit so

  • 07:01

    CLAUDIA VON VACANO [continued]: that the product of this will be applicable to a wider array of different platforms. So this is the kind of thing that's highly interdisciplinary. Our team is composed of people from political science, sociology, biostatistics, and even humanities. So it's really exciting.
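
[Editorial illustration] The first-cycle workflow described above (hand-label comments, train a classifier on the labels, then measure how often it predicts correctly) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the transcript does not say which algorithms the D-Lab team tried, so a small multinomial Naive Bayes model is used here as a stand-in, and the comments, labels, and function names are all invented for illustration. It does not reproduce the 85% figure mentioned in the interview.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would do far more.
    return text.lower().split()

def train_naive_bayes(comments, labels):
    """Count word frequencies per class (0 = not flagged, 1 = flagged)."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for text, label in zip(comments, labels):
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab

def predict(model, text):
    """Return the class with the higher log-probability (add-one smoothing)."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(class_counts[label] / total)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

def accuracy(model, comments, labels):
    """Fraction of comments whose predicted label matches the human label."""
    correct = sum(predict(model, c) == y for c, y in zip(comments, labels))
    return correct / len(labels)

# Hypothetical, hand-labeled toy data ("groupx" is a placeholder target group).
train_comments = [
    "groupx people are worthless",
    "i despise groupx people",
    "thanks for the helpful answer",
    "i love this helpful thread",
]
train_labels = [1, 1, 0, 0]

# Held-out comments the model has not seen, with their human labels.
test_comments = ["groupx are worthless", "thanks for this thread"]
test_labels = [1, 0]

model = train_naive_bayes(train_comments, train_labels)
print(accuracy(model, test_comments, test_labels))
```

Accuracy here is simply the share of held-out comments where the model agrees with the human label, which is the kind of figure quoted for the proof of concept. With only a handful of toy comments the number is not meaningful; the point is the shape of the label, train, and evaluate loop.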

  • 07:22

    CLAUDIA VON VACANO [continued]: People get really motivated by that kind of environment. And so it's really a project-based way of learning data science methods, but it also is a way to contribute. As a public university, we feel that it is our responsibility and our role to be monitoring issues such as hate speech. And we feel that we could do this in conjunction and in close partnership

  • 07:43

    CLAUDIA VON VACANO [continued]: with places like Facebook and Reddit. But really that it shouldn't be only Facebook behind that sort of corporate wall that are monitoring these issues, but really it should be a transparent process that is highly accessible to the public. So that's why we're really excited to be

  • 08:03

    CLAUDIA VON VACANO [continued]: partnering with the Anti-Defamation League on this work. They're funding this work and supporting us. They have a long, you know, 100 year history of working on issues such as hate speech. So that's one example of the type of work that we do at the D-Lab. We're very excited that we're partnering with the SAGE Campus as one of the first institutions, UC Berkeley and the D-Lab, one of the first institutions that's actually

  • 08:27

    CLAUDIA VON VACANO [continued]: creating content for this online vehicle to deliver content and learning. And what we're doing is providing a course titled Introduction to Applied Data Science for Social Scientists. And it's going to be comprised of bite size modules. First, we make available a Python and R mini course.

  • 08:51

    CLAUDIA VON VACANO [continued]: And you can take one or both. The entire course is designed so that you could complete the course with either a little bit of Python or a little bit of R under your belt. And we start from scratch. The first thing importantly that we discuss is ethics. We really feel that there's many ethical dimensions that

  • 09:12

    CLAUDIA VON VACANO [continued]: previously were not of concern that de-identification and re-identification of data makes certain populations particularly vulnerable. For example, previously incarcerated people, people who are undocumented, and maybe people who are vulnerable in terms of being targeted in sort of hate

  • 09:36

    CLAUDIA VON VACANO [continued]: crimes, et cetera are at a stage that we really need to be thoughtful about the data that we're making available because there's so many different data sources out there. Also, there's a lot of de facto social experiments that are happening online. And so we really want to inform social scientists and the public about these ethical issues

  • 09:57

    CLAUDIA VON VACANO [continued]: and put that in the forefront of this course. Besides that, we talk about sort of the meat and potatoes of data formats, the structures of data. We talk about different data sources that are nontraditional. So scraping data from the web, doing

  • 10:18

    CLAUDIA VON VACANO [continued]: crowdsourcing of information, and then creating survey instruments online. There's so many different populations that could be respondents for studies currently. And so we really optimize those opportunities. In addition to that, we focus on more analytic approaches.

  • 10:40

    CLAUDIA VON VACANO [continued]: So starting with data visualization so that you can really look at the shape of your data and see if there's any anomalies in the data, but also to present information in ways that are compact and highly informative. But besides that, we also talk about computational text analysis, machine learning, geospatial analysis.

  • 11:03

    CLAUDIA VON VACANO [continued]: And so many different methods that are exciting to explore and new to more traditional social scientists. One of the reasons why we're so successful at UC Berkeley is because scholars feel that we don't have a bias in favor of data science, in favor of quantitative methods.

  • 11:24

    CLAUDIA VON VACANO [continued]: We are very open methodologically. We believe that if you employ ethnography, if you use qualitative methods, mixed methods, quantitative methods, it really is about the research question and the research agenda. We happen to focus and make available census data, federal data, all kinds of data sources.

  • 11:46

    CLAUDIA VON VACANO [continued]: And we're experts in that in secure data issues. And so that is another resource that we make available to our community and within Berkeley and beyond. And so we're happy to inform people about very conscious and wise ways of using data. But we would not say that anyone should actually

  • 12:08

    CLAUDIA VON VACANO [continued]: use one method or another. And I think that really lowers the barrier in terms of making it highly accessible. At the D-Lab we really believe in having a zero barrier for entry. And it's a really different environment than young scholars would have with their advisors or their dissertation committee. They can come to the D-Lab and ask any question.

  • 12:29

    CLAUDIA VON VACANO [continued]: It's OK not to know is our motto. And so there's a zero barrier for entry. And really that's part of the culture is not having a disciplinary bias, not having a bias towards one methodology or another, and to really make it highly accessible and a welcoming environment. Another thing I would add is that we really believe in optimizing the knowledge and expertise that everybody is bringing to the table.

  • 12:52

    CLAUDIA VON VACANO [continued]: And so we have a learning community model where you can first be sort of participating on a peripheral kind of basis, but little by little the more you get involved in consultation, and teaching, and our working groups, then the more immersed you are and the more central you become to the community.

  • 13:12

    CLAUDIA VON VACANO [continued]: So it's really a very soft and gentle entry into data science. And there's other places that are doing fantastic work on campus as well that are solely focused on data science.

Abstract

Claudia von Vacano, PhD, Executive Director of the Social Sciences D-Lab and Digital Humanities at UC Berkeley, discusses some of the current research at D-Lab, including methodology and the use of machine learning to study hate speech, and the course, Introduction to Applied Data Science for Social Scientists, offered in conjunction with SAGE Campus.


Claudia von Vacano Discusses Social Data Science & Digital Humanities
