  • 00:00


  • 00:09

    JUSTIN ESAREY: My name is Justin Esarey. I'm an Associate Professor of Politics and International Affairs at Wake Forest University. There are a lot of new techniques being developed in data science. Data science is a very hot career right now. A lot of students are interested in it. Eventually, you're going to go on

  • 00:30

    JUSTIN ESAREY [continued]: to learn about things like neural networks, and random forests, and CART models, and all kinds of stuff. But the very first thing, the foundation of any statistical and social scientific education, is linear regression. It's usually one of the first things you learn. It's a very versatile method. It can be applied in a lot of situations. And when a fancy method tells me something

  • 00:50

    JUSTIN ESAREY [continued]: that regression doesn't, that raises a question of what-- that doesn't necessarily mean the fancy method is wrong, but it does mean that I want to know why. Like, what is different about this other method that allows me to come to that conclusion? So it's a good baseline. It's something you always-- I would say most social scientists, including myself, start with regression.

  • 01:16

    JUSTIN ESAREY [continued]: Well, no student of social science-- it doesn't matter whether you're interested in linear regression or not, you're going to learn it because it's so at the core of everything we do. I mean, it's a simple, relatively simple idea, right? If you imagine a scatterplot of data in your head with x- and y-axes and scatter points, essentially just asking what

  • 01:37

    JUSTIN ESAREY [continued]: line fits that data the best-- of course with a very specific meaning of best fit. And it's a relatively simple idea, but most of what we do in social science follows from that idea. It's just, you know, maybe not using a line, but using a more complicated function, or sorting the data in some different way,

  • 01:58

    JUSTIN ESAREY [continued]: or analyzing the uncertainty in that estimate in some way, in a different way. But if you can really understand how to essentially describe or summarize data with lines and linear functions, you can go a long way. And, in fact, I would say a lot of the most cutting edge methods in social science, things

  • 02:19

    JUSTIN ESAREY [continued]: like a lot of the causal inference methods, you know, sometimes they use matching or some other procedure, but a lot of them are still fundamentally rooted in interpretations of linear regression in well-designed studies. And so your entire career may well, depending on exactly what you do and how you go, might be rooted in linear regression, which

  • 02:40

    JUSTIN ESAREY [continued]: is a simple method, but an incredibly powerful one. So what I would say is, no matter what project you're going to be working on, there's a good chance, an excellent chance, that linear regression is going to be able to help you understand what's going on in your data.
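
Editor's illustration (not part of the interview): the "line that fits the data best" idea described above is usually made precise as ordinary least squares, which picks the intercept and slope that minimize the sum of squared vertical distances between the points and the line. The sketch below assumes NumPy is available; the data are simulated and the "true" intercept and slope (2.0 and 0.5) are made-up values chosen only for the demonstration.

```python
import numpy as np

# Hypothetical toy data: y is roughly 2.0 + 0.5 * x plus random noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=200)

# Design matrix: a column of ones (intercept) next to x (slope).
X = np.column_stack([np.ones_like(x), x])

# Least squares: choose the coefficients that minimize the sum of squared residuals.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta

# "Best fit" here means the smallest sum of squared errors (SSE).
residuals = y - X @ beta
sse = float(residuals @ residuals)

print(f"intercept={intercept:.3f}, slope={slope:.3f}, SSE={sse:.1f}")
```

Any other line drawn through the same scatterplot would give a larger SSE, which is the specific sense of "best" used here.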

  • 03:01

    JUSTIN ESAREY [continued]: So sometimes, it's not-- for example, if you have a continuous dependent variable and a continuous independent variable, that's the sort of classic case of when you would use linear regression. But linear regression can also be surprisingly helpful when your dependent variable is binary. So for example, if you're studying vote choice, right? You don't cast 8% of a vote.

  • 03:22

    JUSTIN ESAREY [continued]: You cast a vote or not. You know, you cast a vote for candidate A or candidate B in a two-party election, and so that's a binary dependent variable. But, even then, regression can be a great way to summarize your data to figure out why people are choosing that candidate. And even if you go on to do something else, for example, you might go on in vote choice model

  • 03:44

    JUSTIN ESAREY [continued]: to use some sort of logistic regression, maybe use some sort of CART model to classify people in a more complicated way, if regression didn't tell me the same thing that those models did, I might believe the more complicated models, but the first thing I'd ask is why, right? What is going on in these more complicated models that's not being pulled out by a simple linear relationship?
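
Editor's illustration (not part of the interview): applying linear regression to a 0/1 outcome, as described above for vote choice, is often called a linear probability model. The sketch below assumes NumPy and uses simulated voters; the predictor `x` (say, a 0 to 10 ideology score) and the probabilities used to generate votes are hypothetical. The slope can be read as the change in the probability of voting for candidate A per one-unit change in `x`.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Hypothetical predictor, e.g. a 0-10 ideology score for each voter.
x = rng.uniform(0, 10, size=n)

# Assumed data-generating process: P(vote for A) rises linearly from 0.2 to 0.8.
p_vote_a = 0.2 + 0.06 * x
vote_a = (rng.random(n) < p_vote_a).astype(float)  # 1.0 = voted A, 0.0 = voted B

# Ordinary least squares on the binary outcome (linear probability model).
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, vote_a, rcond=None)
intercept, slope = beta

# Fitted values are estimated probabilities; in general they are not
# guaranteed to stay inside [0, 1], which is one reason to also try logit.
fitted = X @ beta
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
print(f"fitted range: {fitted.min():.3f} to {fitted.max():.3f}")
```

If a logistic regression or a tree-based classifier told a very different story on the same data, this simple fit is the baseline against which to ask why.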

  • 04:05

    JUSTIN ESAREY [continued]: I mean, I liken it to something even more basic, which is a scatterplot, right? Not exactly a high tech method. Any method you can do with a paper and pencil is not that high tech, but very powerful and important. You never want to be caught in a situation where you're making all these claims, and then someone shows you a scatterplot of your data,

  • 04:26

    JUSTIN ESAREY [continued]: and it doesn't look good, right? You start from a basis. And so if you can understand data visualization through scatterplots and other means, and regression very well, then you can go on to do other things and use them appropriately and to their best effect.

  • 04:50

    JUSTIN ESAREY [continued]: There's no project you do where regression can't be used, and I'll give you an example, something I did very, very recently. So there was recently a paper published by Michelle Dion, and Sara Mitchell, and Jane Lawrence Sumner that was about citation of women versus men in social science journals. And they were concerned that women were

  • 05:11

    JUSTIN ESAREY [continued]: being cited less frequently. And so what we did-- me and my co-author Kristin Bryant-- we used linear regression, we collected a ton of information about a bunch of articles and how often they were cited by other scholars, and then we just used linear regression to estimate the proportion--

  • 05:32

    JUSTIN ESAREY [continued]: I'm sorry, not the proportion-- the number of citations collected by women authored articles versus the number of citations collected by male authored articles. And what we found-- and that allows us to basically ask, are those numbers, that number of citations, the same, right? Or is it substantially different? And not exactly.

  • 05:53

    JUSTIN ESAREY [continued]: It's a count dependent variable, right? You can't have 2/3 of a citation, you have to have one, two, or three, or four. You also can't have negative 5 citations. No matter how bad an article is, it cannot have negative citations. So not exactly 100% the classic case, but good enough. And what we found using linear regression was that once you control for the journal

  • 06:15

    JUSTIN ESAREY [continued]: the article was published in and also the number of authors of the article, a male and female authored article are cited exactly-- pretty much exactly the same. There's a very slight difference, but it's not statistically meaningful. Now, that doesn't mean that if you just put all the articles in a pile over here of all the male authored articles and all the women authored articles over here

  • 06:35

    JUSTIN ESAREY [continued]: there wouldn't be a difference. There actually is a difference. In fact, if you just look at all the articles on their own, female authored articles are cited less, which is something to note. That's interesting. What regression allows you to do is compare like cases to like cases. So let's compare a female authored article in APSR to a female authored article in APSR.

  • 06:57

    JUSTIN ESAREY [continued]: Or I'm sorry, male authored article in APSR, and that's what we really want to know. We want to know if, all else equal, are these two articles, male and female authored, going to be cited the same? And the answer is yeah. So the sort of moral of the story we pulled out is it doesn't appear that people citing articles are actually treating the articles differently

  • 07:18

    JUSTIN ESAREY [continued]: because, again, controlling for the publication journal and the number of articles-- number of authors-- they are the same. But that doesn't mean there's not something going on, right? Maybe it has to do with who publishes in what journal. Maybe it has to do with why people submit to certain journals. Maybe women submit to certain journals less. Maybe it has to do with the fields women work in, right?
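
Editor's illustration (not part of the interview, and not the authors' data or results): the contrast described above, a raw gap in citations that shrinks once you "compare like cases to like cases", can be reproduced on simulated data. In the sketch below every number is invented: there are two hypothetical journal tiers, articles in the top tier get more citations, women-authored articles are assumed to appear in the top tier less often, and within a tier gender has no effect on citations by construction. NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 4000

# Hypothetical article-level data (all values invented for illustration).
female = rng.random(n) < 0.4                      # True = woman-authored
p_top = np.where(female, 0.3, 0.6)                # assumed: unequal journal placement
top_journal = rng.random(n) < p_top               # True = published in top-tier journal

# Citation counts depend on the journal only, not on author gender.
citations = rng.poisson(10 + 20 * top_journal)

# Raw comparison: two piles of articles, no controls.
raw_gap = citations[female].mean() - citations[~female].mean()

# Regression with a journal control: compare articles within the same tier.
X = np.column_stack([
    np.ones(n),
    female.astype(float),
    top_journal.astype(float),
])
beta, *_ = np.linalg.lstsq(X, citations.astype(float), rcond=None)
adjusted_gap = beta[1]

print(f"raw gap (women - men): {raw_gap:.2f} citations")
print(f"gap controlling for journal: {adjusted_gap:.2f} citations")
```

In this toy setup the raw gap comes entirely from where articles are published, which mirrors the kind of follow-up question raised in the talk: the regression result does not say nothing is going on, only that the difference is not arising among articles in the same journal.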

  • 07:38

    JUSTIN ESAREY [continued]: So it was not the final word on that question, but it did tell us something very important, in my opinion, which is actually, you know, it's not that the people citing the articles are treating them differently. That's not true, which is a good thing to know.

  • 08:00

    JUSTIN ESAREY [continued]: So we have students that enter PhD programs every day with no more preparation than college algebra or high school algebra and perhaps a semester of calculus, and you can make it on that. People do make it all the time. I would recommend, though, if you decide that you would like to be a quantitative social

  • 08:21

    JUSTIN ESAREY [continued]: scientist, I would say do yourself a favor and take linear algebra in college, or maybe in the summer between college and grad school, and maybe an additional semester or two of calculus. It'd be nice to have two, maybe even three semesters of calculus under your belt. And you'll thank me later because, at the time, it will seem hard-- and it is--

  • 08:43

    JUSTIN ESAREY [continued]: but it's really good to get that stuff out of the way before you get to graduate school. So then you can really focus on the stuff you're supposed to be studying in grad school-- substantive politics and the statistical methodologies. So I really recommend students have, you know, just a few classes. And, of course, if you're a math major, that's fantastic, but you don't need that.

  • 09:04

    JUSTIN ESAREY [continued]: It would also be great if you had taken, at least, some sort of background in econometrics, or political methodology, or statistics, something in undergrad that had introduced you to the basic concepts of covariance, and variance, and median, mean, sort of basic stuff that you want to have in your toolkit when you get to graduate school.

  • 09:25

    JUSTIN ESAREY [continued]: If you can come to graduate school, or master's program, or something with those tools already out of the way, you're going to be in a great position to learn what you need to learn in the program. I think a lot of people have the impression that you have to be some sort of nerd or math geek

  • 09:48

    JUSTIN ESAREY [continued]: in order to be good at data science, and that is not true. Social science is a very interesting thing in the sense that it's an intersection of many disciplines, right? The best social scientists have a good humanistic understanding of interpretive data and ethnography, but they also have a strong quantitative sense of how to analyze data. They have a sense of theory.

  • 10:09

    JUSTIN ESAREY [continued]: They have a sense of how causality works. It's really interesting because, unlike, you know, in a math department-- you pretty much just have to be good at math, which kind of makes sense. In a political science department or sociology department, the best people are good at a lot of things. There's actually even a famous book called Expert Political Judgment by Phil Tetlock, where he shows that the people who

  • 10:29

    JUSTIN ESAREY [continued]: are best at forecasting, super forecasters, they're not experts in one small thing. We are answering some of the same questions that philosophers and humanists are asking, and what we're doing is applying the methods of science to those questions to see if we can make additional progress, which does not mean that we ignore

  • 10:52

    JUSTIN ESAREY [continued]: all the other things that have been done and all the humanistic modes of inquiry. And, in fact, those are often the starting points for what we do. And that's, again, the-- being a bit of a jack of all trades in social sciences is a good thing. You should not spend all your time-- I mean, I'm not great. I did spend all my time in the computer lab. That's why I am who I am, but, you know,

  • 11:13

    JUSTIN ESAREY [continued]: it's best to not spend all your time in the computer lab. It's best to spend some time in the field or spend some time interviewing. Heck, work for a congressman, that's fine. PhDs in political science are not generally looking to go into politics, but interacting with politicians is not a bad thing. And that kind of qualitative understanding can really help you when you're doing quantitative science.

  • 11:37


Video Info

Publisher: SAGE Publications Ltd

Publication Year: 2019

Video Type: Tutorial

Methods: Linear regression, Computational modelling

Keywords: calculus; citation analysis; computer science; data visualisation; econometrics; linear models; multidisciplinary training; regression; regression analysis; Scatterplot; Social science research; Statistics and research methods

Segment Info

Segment Num.: 1

Persons Discussed:

Events Discussed:



Justin Esarey, PhD, Associate Professor of Politics and International Affairs at Wake Forest University, discusses the use of linear regression models in computational social science, including why linear regression is an important concept, the kinds of research questions these models can answer, the kinds of data that can be analyzed, examples of research using these models, advice to someone new to computational social science, and why computational social science is important.


An Introduction to using Linear Regression Models

