  • 00:10

    My name is Kokil Jaidka. I'm a postdoctoral researcher at the University of Pennsylvania. And over there, I'm using social media and language modeling to understand how people disclose information about themselves, about how social media signals can be used to understand mental health and well-being

  • 00:32

    as well as intentional behaviors, such as political participation. My PhD is from Nanyang Technological University in Singapore, and it's in information studies.

  • 00:52

    I was looking at academic texts for my PhD. As I was winding up my project, something big happened in India. There was a protest on Twitter, which was protesting against government inaction after the rape of a young girl in New Delhi, India.

  • 01:15

    It fermented into an actual offline movement. I found that very interesting. A mixture of outrage and interest in this new social movement that was building in India got me interested in looking at this data, collecting this for my own, and this became my first and largest project

  • 01:37

    in understanding social movements on Twitter and in India, in the Indian context. That's how I got involved in working with social media. Since my original project about social movements and political participation, I've

  • 01:59

    started to look at other applications of using social media language to profile people, to understand them better, and even possibly to intervene, to help them manage their own well-being. Actually contributing to someone's well-being

  • 02:21

    is the overarching goal of the World Well-Being Project at the University of Pennsylvania, where I'm working. Well-being is an overarching term which is used to measure both a person's self-evaluation of how

  • 02:43

    well they're doing as well as their emotional experiences. It's used more and more, even by economists, to understand the real state of a country. It's considered to be closely associated with the economic health of a nation, for example.

  • 03:06

    It's considered to be closely associated with health and also with mental health. So other synonyms of well-being are life satisfaction. Increasingly, there are more and more nations, governments, corporates who are spending time and money into surveying

  • 03:27

    populations to measure well-being, and using that to dictate policies and to plan corporate strategies. Right now, I am trying to understand better about how social media platforms, their affordances,

  • 03:52

    can constrain or enable people to share more or less about themselves. For example, what I'm talking about at ICWSM is a comparison of the affordances of Facebook vs. Twitter, how either platform

  • 04:12

    enables people to present themselves differently and allows themselves to disclose more or less about their personal traits, such as how much stress they're feeling and how much empathy they feel towards others and more basic demographic traits, such as their age and gender.

  • 04:41

    The great thing about working on an interdisciplinary research project is that we can use methods both from social science and from computer science to get a really good understanding of what people are like. Or in other words, to profile them. So we are using a survey-based method to recruit people via Qualtrics, and we also

  • 05:05

    obtained their informed consent to collect their Facebook and Twitter posts. So this study is approved by the Institutional Review Board of the University of Pennsylvania. Any participant who is recruited through Qualtrics

  • 05:26

    will answer a survey questionnaire very similar to a typical social science study, which will have items from different psychological scales. But at the same time, they will be also asked to share their informed consent for us to collect their Facebook and social media posts.

  • 05:47

    And it's only the people who consent that we study in the end, and it's also only the people who have enough language on Facebook and Twitter who were a part of the study. Social media has a lot of signals

  • 06:07

    that can be used to understand what people are like and to profile them. My work focuses more on their language. That is, the language or the words that they post. In a sense, these are unstructured. They are strings of words, but we want to convert them into numbers. So we use frequency-based methods

  • 06:31

    to convert the entire vocabulary into the relative frequency of usage. That becomes the way we represent each user, as a list of words and relative frequencies. That gives us a way to understand. When we compare a person's vocabulary

  • 06:55

    or their relative frequencies of usage with other traits that they report on, such as their age and gender or their stress, we can identify patterns. We can identify strong signals of different traits from their language.
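
    [Editor's illustration] The representation described above, each user as a list of words with relative frequencies that is then compared against a self-reported trait, can be sketched roughly as follows. This is a minimal sketch and not the project's actual pipeline: the tokenizer, the function names, and the choice of a plain Pearson correlation are all assumptions made for illustration.

```python
from collections import Counter
import math
import re


def relative_frequencies(posts):
    """Map each word to its share of all words in one user's posts."""
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    tokens = [t for post in posts for t in re.findall(r"[a-z']+", post.lower())]
    total = len(tokens)
    if total == 0:
        return {}
    return {word: n / total for word, n in Counter(tokens).items()}


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def word_trait_correlations(user_posts, user_trait):
    """Correlate each word's relative frequency with a numeric trait.

    user_posts: dict of user id -> list of post strings (hypothetical layout)
    user_trait: dict of user id -> numeric survey score, e.g. stress
    """
    users = sorted(user_posts)
    freqs = {u: relative_frequencies(user_posts[u]) for u in users}
    vocab = set().union(*(f.keys() for f in freqs.values()))
    trait = [user_trait[u] for u in users]
    return {
        word: pearson([freqs[u].get(word, 0.0) for u in users], trait)
        for word in vocab
    }
```

    Words whose correlation with the trait is far from zero are candidate "signals" in the sense used above. A real study would need many users and a correction for testing thousands of words at once.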

  • 07:22

    My expertise is in using language modeling to understand people's behavior. The computational methods that we use can be based on dictionaries, which are theory-based lexica that describe certain categories of concepts, such as positive emotion

  • 07:44

    and negative emotion. But if we have enough data and we have enough language, then we can be a little more adventurous and we can use an open vocabulary approach. That is, instead of using a small number of concepts, we use all the vocabulary of an individual in what is known

  • 08:07

    as an open vocabulary approach. So we're using all the vocabulary of these social media users. We can identify new topics that are emerging in social media discussions, which maybe theory-based approaches cannot

  • 08:28

    capture. For example, one concept which is closely related to higher well-being is a higher mention of sports. People who like to discuss sports or who are active, who spend a lot of time outdoors hiking and camping are usually more likely to have higher well-being, higher life

  • 08:50

    satisfaction. While this is an insight that we got as a result of our use of data-driven approaches, it's not necessarily that theory-based lexicon would have revealed this to us, because theory-based lexicon may not have considered that a mention of sports specifically or a mention of football could yield higher well-being.
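
    [Editor's illustration] The contrast between a dictionary-based (closed vocabulary) approach and an open vocabulary approach can be sketched as below. The tiny lexicon, its category names, and all function names are hypothetical and chosen only for illustration; validated psychological lexica are far larger, and the project's own tooling is not described in this interview.

```python
from collections import Counter
import re

# Hypothetical toy lexicon: a few words per theory-based category.
TOY_LEXICON = {
    "positive_emotion": {"happy", "love", "great"},
    "negative_emotion": {"sad", "angry", "hate"},
}


def tokenize(posts):
    """Assumed tokenizer: lowercase runs of letters and apostrophes."""
    return [t for post in posts for t in re.findall(r"[a-z']+", post.lower())]


def lexicon_scores(posts, lexicon=TOY_LEXICON):
    """Closed vocabulary: share of a user's words that fall in each category."""
    tokens = tokenize(posts)
    total = len(tokens)
    counts = Counter(tokens)
    return {
        category: (sum(counts[w] for w in words) / total if total else 0.0)
        for category, words in lexicon.items()
    }


def open_vocabulary_features(posts):
    """Open vocabulary: every word the user wrote becomes its own feature."""
    tokens = tokenize(posts)
    total = len(tokens)
    return {w: n / total for w, n in Counter(tokens).items()} if total else {}


def words_missed_by_lexicon(posts, lexicon=TOY_LEXICON):
    """Words present in the user's language but absent from every category."""
    covered = set().union(*lexicon.values())
    return sorted(w for w in open_vocabulary_features(posts) if w not in covered)
```

    With this toy lexicon, a post about a football game contributes to a category score only through a word like "great", while the open vocabulary features keep "football" itself available for later analysis, which is the kind of signal the speaker says a theory-based lexicon may not have anticipated.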

  • 09:23

    A limitation of using social science methods alone is that they're expensive to run. Surveys take up people's time. They cost money. Every individual typically needs to be paid. On the other hand, an advantage of computational methods is that they can be automatically applied

  • 09:44

    to large data sets such as the ones that are available on social media, where individuals are sharing their opinion and their thoughts in an unsolicited manner. This is why these two approaches work so well together. While social science approaches can offer the theoretical grounding to understand

  • 10:08

    people's psychological traits better, computational methods can bring in what they actually do. So that's an observational vs. psychological cycle. So that's an observation vs. a psychological insight into the same person. They work together to inform each other

  • 10:30

    and to inform inferences about the person. One thing to keep in mind when we are doing these studies with social media data is that social media vocabulary changes a lot over time in the sense that our results from 2013

  • 10:54

    may not be valid anymore in 2018. But in the results from the World Well-Being Project in 2013, we found that higher mentions of anime, computer games, television shows in general were indicative of introversion in a sample of 70,000 Facebook

  • 11:19

    users who answered a survey and shared their Facebook data. However, a caveat is that this data was collected in 2013. The same may not be valid anymore. These are things that we need to keep in mind as we use social media data and we use it to infer information

  • 11:42

    about people. My hope is that I'll be able to take what I've done so far in understanding people and use it to help them to improve their well-being. My next project is going to be at Nanyang Technological

  • 12:03

    University Singapore, where I'll be joining as a presidential postdoc. I'm going to have my own grant and I'm going to be hopefully using that grant to improve the well-being in teens and adolescents through their use of social media.

  • 12:26

    A lot of our research has built upon work done by others. That's one thing about the research community. We do need tools to enable our own work and to help each other out, which is why the initial work done by people in the computational linguistics community has been so important to build upon

  • 12:46

    and to create this whole new area of computational social science. As time has gone on, our needs have only increased. There is an increasing need to create methods that are specific to social media language. There is an increasing need to understand what images mean and what emojis mean, what slang means

  • 13:06

    and how language is evolving over time. I think the main challenges associated with using social media data are to always stay on top of the trends in usage, which can be affected by as little as something as the way a button is colored. So UX designers and companies have a big role to play,

  • 13:31

    whether intentional or accidental, in enabling people's self-disclosure. And these small changes might affect a lot of studies to go on it. It's a domino effect, which needs to be kept in mind. So what I'm trying to say is that there are platform effects associated with user data that

  • 13:53

    need to be kept in mind. There are platform biases. Any study of users should be considered in the context in which it was collected, which is a big challenge. We cannot extrapolate results from the United States to other countries in the world. We cannot extrapolate results from one community

  • 14:17

    to understand other communities. That's a challenge which is gaining more and more attention these days. It still needs a suite of methods, a proper research framework to address it. But I think in starting to talk about it, we're getting there.

  • 14:42

    I think one of the key things to keep in mind for young scholars who want to do computational social science or do computer science for social media data is to be aware of the other research out there and to inform themselves during their research

  • 15:02

    design about the limitations and the affordances of the platform they are considering. I think one of the great things about the interdisciplinary research community of computational social science is that there's a lot to learn from the social science side as well, especially in how careful they are in designing different research conditions.

  • 15:23

    It's definitely important for a young scholar coming in to be very familiar with the pros and cons and challenges of designing these research studies and in knowing what questions can actually be answered from this data.


Kokil Jaidka, PhD, postdoctoral researcher at the University of Pennsylvania, discusses her research on natural language processing and social media analytics to study emotional well-being, including the questions the research was designed to answer, data collection, recruitment process, computational methods used, the open-vocabulary approach, interesting results, the future of this research, and advice for someone new to social media analytics.


Methods Map

Natural language processing

Natural language processing is a sub-category of artificial intelligence in which algorithms facilitate processing of human language data, such as text and speech, so it can be understood computationally.
Studying Emotional Well-being Using Natural Language Processing & Social Media Analysis
