    [MUSIC PLAYING] Gary King Discusses Replication in the Social Sciences

  • 00:11

    GARY KING, PHD: Hi. I'm Gary King. I'm the Albert J. Weatherhead III University Professor at Harvard University. I'm also Director of the Institute for Quantitative Social Science, also at Harvard. What does it mean to be scientific and what is replication? What all this is about is what science is about.

  • 00:32

    GARY KING, PHD [continued]: So what is science about? Science is about inference. What is inference about? Inference is about using facts you have to learn about facts you don't have. That's pretty much all science is about. So how do you use facts you have to learn about facts you don't have? Turns out there's lots of ways of being scientific. And if you follow those scientific rules, you might get closer.

  • 00:53

    GARY KING, PHD [continued]: But that's not where the progress came from. Pretty much all the progress that humanity has made in the last 400 years comes from science. But science is not about individual scientists acting scientific. Science is about the community of scholars working in cooperation and competition in pursuit of the same goals.

  • 01:14

    GARY KING, PHD [continued]: When I do a study, it's difficult for me to check my own work because it's really easy for any one of us to fool ourselves into thinking that we're doing a great job. It's very difficult, or it's more difficult, for any one of us to fool somebody else into thinking that we do a great job. So that's the essential insight-- the brilliance that is science.

  • 01:34

    GARY KING, PHD [continued]: That's the way that human beings have learned to learn. It's the way that we've figured out how to make progress. It's the main driver of progress in all of society in the last 400 years. So anything we can do to empower science-- that is, the community, the community of scholars pushing things forward and checking on each other's work--

  • 01:55

    GARY KING, PHD [continued]: the more progress we're going to make. If the essence of all of science is community, then how do you build the community? Well, scientists have to know each other. They have to talk to each other. They have to be able to read each other's work. Several hundred years ago, if you wrote an article or a manuscript of some sort, someone else wouldn't even be able to see it unless you gave them permission.

  • 02:16

    GARY KING, PHD [continued]: And you wouldn't necessarily give them permission unless you had some kind of assurance that they wouldn't criticize you. Nowadays, if somebody writes something, they don't even get credit for it unless it is available publicly. So that's gone. What isn't completely gone, or what has taken a while and is taking a while to completely vanquish,

  • 02:37

    GARY KING, PHD [continued]: is the idea that you can read my article, but you can't read the data and the code and the analysis that was used to produce the article. It turns out that articles and books are just advertisements for the research rather than the research itself. They're quick summaries of the research. The amount of work that goes into any one of these articles

  • 02:58

    GARY KING, PHD [continued]: is enormous. You couldn't put a transcript of all that information into the written article. No one would want to read that. But all that other information is incredibly important. So if you want to follow up on what someone else has done, if you want to check on what they've done, if you want to build on what they've done, you have to do more than just read the article. You have to have access to the data and the code and all

  • 03:19

    GARY KING, PHD [continued]: the procedures that they followed. So replication means several different things. At a simple level, the way I've written about it the most and the way it's most useful in observational work is when you start with someone else's data and you just reproduce the tables and figures. So you go through all the procedures that they describe in the article.

  • 03:41

    GARY KING, PHD [continued]: They got the data from some particular source. So you have the data. They follow a whole series of statistical procedures and recodes and analyses and graphical procedures and statistical innovations until they produce their tables and figures and numerical conclusions. It seems completely obvious that it

  • 04:02

    GARY KING, PHD [continued]: ought to be the case that when someone claims to have done that set of things, that if someone else gets the data, they ought to be able to do those things, too. It turns out it's very difficult to do that if you only read the article. So you often need to do more. So replication at this first level involves getting the data that the author got or the author claimed that the author got, following all the procedures

  • 04:24

    GARY KING, PHD [continued]: that the author followed, and getting exactly the same tables and figures and numerical results. That's what replication is at the simple level. At a more complicated level, in some fields, it's possible to reproduce the analysis. So if you're doing an experiment and you're drawing in a sample of students from

  • 04:45

    GARY KING, PHD [continued]: an undergraduate population, you can do that again. So we don't only need to get the data from the original article. We could just get the procedures and apply the procedures again and draw the data again. We'd much rather use this second version of replication than the first. But if you're doing an analysis of the Cuban Missile Crisis or the French Revolution, it's a little hard

  • 05:08

    GARY KING, PHD [continued]: to run those a second time. So it's absolutely essential. I don't think anybody really can disagree with the idea that at some point, the data and code that's used in support of a published article must be available to the public and to other scientists to be able to validate what they're claiming.

  • 05:28

    GARY KING, PHD [continued]: What are the barriers to replication and how should we encourage scientists to share their data? So the barriers to replication are getting access to the information and the data and the code that the original author in the original article had. And sometimes, it's too difficult to get that. For almost 30 years, I've assigned to my students

  • 05:48

    GARY KING, PHD [continued]: the task of replicating articles. So their task is to go out, find an article that meets certain criteria. They get the article. They try to then go get the data that the author used in the article, ideally without talking to the author, at least at first, and then follow all the procedures

  • 06:09

    GARY KING, PHD [continued]: and the analyses and basically everything that the author said that the author did, and to reproduce the tables and figures. The barriers to replication are getting the raw materials. That's the first and most important issue. Once you have it, you're over the biggest hurdle. But you're not there yet because even after you have the original data

  • 06:31

    GARY KING, PHD [continued]: and you have some code from the author or some description from the author, you don't necessarily have an easy path to be able to produce the same results as the author. It's often the case, very often the case, that even after my students are able to get the original data and they have the article and they have some description of the data, often from the author,

  • 06:52

    GARY KING, PHD [continued]: they're not able to produce the same results as the author. And so from that, we can learn something. Sometimes, we just correct something the author did-- sometimes in spectacular ways. Sometimes, we just learn something new. We learn a new way of analyzing the data. So those are the kinds of barriers. What are the reasons for these barriers?

  • 07:12

    GARY KING, PHD [continued]: Well, sometimes authors think, I put so much effort into collecting these data, I want to keep mining the data for a long time and publishing articles from it. I don't want to be scooped. Well, the real problem in academia is not being scooped. The real problem is being ignored. And if you make your data available,

  • 07:32

    GARY KING, PHD [continued]: you're much more likely to be cited and to be paid attention to. So it's much more in your interest to make your data available than it is to keep your data proprietary and hide it from everyone else. Moreover, it doesn't really make sense that you could publish an article, you could use hours and hours of time on the part of your readers to read the article,

  • 07:54

    GARY KING, PHD [continued]: and not give them enough information to be able to verify that what you say is actually true. It just doesn't make any sense. So there's plenty of barriers. We're all human beings, fortunately and unfortunately. And so therefore, it's difficult sometimes to convince people who may have crawled

  • 08:14

    GARY KING, PHD [continued]: through bug-infested fields for five years collecting data or the equivalent to just make it available to somebody else to write an article by themselves. As it happens, it's still in their interest to do so. Even if you collected all the data, you paid all the money, you put all the time in, you got your one article out of it,

  • 08:35

    GARY KING, PHD [continued]: it is still in your interest for someone else to write another article from your data that doesn't have you as a co-author. Why would that be? Well, because they're going to follow up your work. It's terrific to write an article and have italics on your CV so that it's actually a journal article or a book or something like that. That's a great thing. That's one level.

  • 08:56

    GARY KING, PHD [continued]: A level above that which is really important is writing something that someone else reads. It's possible-- in fact, even sort of likely-- that if you write an article and it is so good that it is accepted in a serious journal, that basically nobody ever reads it or it's read by a few people

  • 09:16

    GARY KING, PHD [continued]: and it's never cited and basically forgotten. You don't want that to happen. That could happen to other people. Why should it happen to you? For you, you should write something that people are going to follow up on. That's not only important for you because all the perks in academia sort of increase monotonically with roughly number of citations

  • 09:36

    GARY KING, PHD [continued]: and influence and things like that, but it's not only that. It's like your contribution to the scientific community is the point of what it is you're doing. So if you're making a contribution that is being ignored, that's not really a contribution. Maybe someday, they'll pay attention, you'll be discovered. But you might as well write it in a way that people can check on, can build on,

  • 09:58

    GARY KING, PHD [continued]: can write new articles from, they'll cite you. You'll have plenty of credit, and you'll make a difference with what you're doing in that way. Why might a study not replicate? So some of the reasons a study might not replicate is because the author thought the author did something,

  • 10:20

    GARY KING, PHD [continued]: but the author didn't do that thing. Another reason is the author is trying to pull one over on us. That's very rare. There are some examples of this. There's some in the media that you all probably know about. There's some my students have uncovered. Every few years, there's a pretty spectacular discovery like that where there was basically fraud.

  • 10:41

    GARY KING, PHD [continued]: In every group of people, it doesn't matter how impressive these people are. You put them together, and one out of I'm not sure what the number is, is going to do things like commit fraud. It's silly. It's unfortunate. It happens. And it's great that we are able to discover those things. But that's a trivially small part of replication.

  • 11:02

    GARY KING, PHD [continued]: That's not what replication's about. And the reason why is because it's really easy to surface those things. We find those things very fast. Since we can find them so fast, they happen much less frequently. So it really isn't an issue. So why can't you replicate? That's the obvious one, but it's a real small percentage.

  • 11:22

    GARY KING, PHD [continued]: Much more frequently, the author in the original article and perhaps in the replication materials just wasn't sufficiently clear about what the author actually did. And that clarity is incredibly important because sometimes, the answers depend upon small changes in assumptions and small changes in statistical procedures

  • 11:43

    GARY KING, PHD [continued]: and coding procedures and data sources. And that can produce different substantive results. If your results are robust to all these different decisions, that's terrific. It's a better result. But we need to know whether it is or whether it isn't. So sometimes, these small changes produce very big results. Anybody who's tried to replicate somebody else's

  • 12:03

    GARY KING, PHD [continued]: results without professional replication data sets, which we could all talk about-- which we should also talk about-- often has a situation in which they run the same regression, and there's a different number of observations. That's sort of a clue that something's going on. But there's plenty of other things like that along the way.
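
The "different number of observations" clue can be made concrete with a small sketch. Everything in it is hypothetical: the variable names, the toy rows, and the idea that the gap comes from an undocumented missing-value code (here `-9`) are assumptions chosen for illustration, not something described in the interview. It only shows how one unstated coding decision changes how many rows enter the same model.

```python
def count_observations(rows, variables, missing_codes=(None,)):
    """Count rows that survive listwise deletion on the model variables.

    A row is dropped if any model variable holds a value that the
    analyst treats as missing.
    """
    return sum(
        all(row.get(v) not in missing_codes for v in variables)
        for row in rows
    )


# Hypothetical survey extract; -9 stands for an undocumented "refused" code.
rows = [
    {"income": 52, "educ": 16, "vote": 1},
    {"income": -9, "educ": 12, "vote": 0},
    {"income": 31, "educ": None, "vote": 1},
    {"income": 47, "educ": 14, "vote": 0},
]
model_vars = ["income", "educ", "vote"]

# The original author (by assumption) recoded -9 to missing; the replicator did not know to.
n_author = count_observations(rows, model_vars, missing_codes=(None, -9))
n_replicator = count_observations(rows, model_vars)

if n_author != n_replicator:
    print(f"N differs: author={n_author}, replicator={n_replicator}")
```

Comparing the count against the N printed under the published table is a cheap first check before comparing any coefficients.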

  • 12:24

    GARY KING, PHD [continued]: So those are the kinds of reasons why studies don't actually replicate. What sort of data set should you build to make replication easier? So what should you do? What you should do when you write an article is while you're writing the article, the data's yours. You should keep the data. If someone asks you for it, it'd be very nice if you provide it.

  • 12:47

    GARY KING, PHD [continued]: But you're under an obligation, I think, when you have your article accepted. When you have your article accepted, go to Dataverse. Create a data set, a replication data set for what you did. Your replication data should include your original data, all the code necessary to take the original data and produce your tables and figures, and the set of instructions about how

  • 13:08

    GARY KING, PHD [continued]: to use the code-- maybe even what operating system you used, what software you used, et cetera. But it has to be enough so that if someone else is trying to reproduce your results, they don't have to talk to you. They don't have to find you because as important as you are, you're not relevant at this point. What's relevant is what you put into the public community, what

  • 13:29

    GARY KING, PHD [continued]: you put on the public record. You, the fact of you, is not relevant. It's your contribution that's relevant. One way to say this is no one cares what you think. We only care what it is you can demonstrate. And so it's the procedures that you make public that's really important. Best practices-- you create a data set, a replication data set, in Dataverse.

  • 13:52

    GARY KING, PHD [continued]: We want to get you academic credit and web visibility. The way to do that is to put it, a replication data set, in Dataverse. You go to the Dataverse website. You can start at maybe a Dataverse at your university, or perhaps on your website. You can put it on your website. You should put it on your website.

  • 14:13

    GARY KING, PHD [continued]: So on my website, you can go to it. You'll see my website. You'll see a list of articles and you'll see the classes I teach and all the things that are usually on faculty websites. But there's an additional tab on my website, and it's Data or Dataverse. And you click on that, and the URL

  • 14:35

    GARY KING, PHD [continued]: And it's branded as my website. It has the colors and the look and feel of my website. It's not actually my website, however. It's actually served up by the Dataverse network, by the whole Dataverse project. And so that preserves the archiving. But it gets me the web visibility. You should have the web visibility. It's just as easy for you to do this.

  • 14:56

    GARY KING, PHD [continued]: In addition, if you want to get my data, you come to my website, you click on the data set you want because there'll be a list of data sets, you can download the whole data set or a subset of the data. You can download it in whatever format you want. But in addition, you will get an academic citation. That academic citation will be a formal data citation--

  • 15:16

    GARY KING, PHD [continued]: that is a citation to the data-- that has special features for data. So it'll have your name and it'll have the title of the data set and the year, just like a journal article. It'll have a permanent URL that will connect the data set to this string of letters and numbers permanently-- like, forever, which is a long time.

  • 15:40

    GARY KING, PHD [continued]: And in addition, there'll be another number there if you upload the data the right way. There'll be a Universal Numeric Fingerprint, a UNF. So what's a UNF? A UNF is an ultimate summary statistic of your data. So you take all the data, you run it through this special algorithm, and it produces a string of letters and numbers that becomes part of the citation as your UNF.

  • 16:01

    GARY KING, PHD [continued]: So what is that thing? That thing is an absolute verification and validation that the data is the data that you uploaded there or I uploaded or whoever uploaded it there, and it's the same data. So what you do when you download it is you run the same algorithm because it's publicly available. And that algorithm will produce that string

  • 16:21

    GARY KING, PHD [continued]: of letters or numbers-- exactly that same string if the data are the same. The format of the data may have changed over time. It may have been in SAS format and it migrated to Stata format and it migrated to R format. But that string of letters and numbers will remain exactly the same. So it's the absolute validation that 20 years ago when someone wrote an article, it is exactly

  • 16:46

    GARY KING, PHD [continued]: the same data that was used in the original article. So best practice for you is to create a data set like that with a permanent identifier and a UNF. You get a citation so that people will cite you when they use your data. They don't just thank you. They give you something that's valuable to you, which is a citation.
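
The fingerprint idea described above can be sketched in a few lines. To be clear about the assumptions: this is not the actual UNF algorithm used by Dataverse. The function names (`normalize`, `fingerprint`), the choice of seven significant digits, and the truncation to 16 bytes are all illustrative choices. The sketch only demonstrates the two properties mentioned in the talk: the string depends on the data values rather than on the storage format, and it is computed with a one-way hash, so the data cannot be recovered from it.

```python
import base64
import hashlib


def normalize(value, digits=7):
    """Render one value in a canonical text form that ignores storage format."""
    if value is None:
        return "NA"
    if isinstance(value, (int, float)):
        # 34 and 34.0 both become '3.400000e+01'.
        return format(float(value), f".{digits - 1}e")
    return str(value)


def fingerprint(columns):
    """Hash a data set given as {variable name: list of values}.

    Variables are visited in sorted name order, so column order in the
    file does not matter; row order within each variable does.
    """
    digest = hashlib.sha256()
    for name in sorted(columns):
        digest.update(name.encode("utf-8") + b"\0")
        for value in columns[name]:
            digest.update(normalize(value).encode("utf-8") + b"\0")
    # Shorten the one-way hash to a compact, citable string.
    return base64.b64encode(digest.digest()[:16]).decode("ascii")


# The same hypothetical data as read from two different file formats.
read_from_stata = {"age": [34, 51, 27], "vote": ["yes", "no", "yes"]}
read_from_r = {"vote": ["yes", "no", "yes"], "age": [34.0, 51.0, 27.0]}
altered_copy = {"age": [34, 51, 28], "vote": ["yes", "no", "yes"]}

same = fingerprint(read_from_stata) == fingerprint(read_from_r)
changed = fingerprint(read_from_stata) != fingerprint(altered_copy)
print(same, changed)
```

Anyone who downloads the data later can rerun the same function and compare the result with the string in the citation; a mismatch means the values differ from what was originally deposited.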

  • 17:07

    GARY KING, PHD [continued]: What is the crisis of replication? So there's a few crises of replication in lots of fields. So one crisis of replication always comes up when we find somebody that basically committed fraud. And that happens. It happens. It happened in psychology. It happened in political science.

  • 17:27

    GARY KING, PHD [continued]: You can find it in basically every field where there are actually humans who are scientists because some humans always do this thing. And so that produces something that we call a crisis. There's another version of a crisis in which scholars are not fraudulent, but they're pushing the edge of the envelope a little too far. They're finding the one analysis that they're

  • 17:52

    GARY KING, PHD [continued]: able to tweak until they get a result that they like. So they've basically demonstrated that it is possible to find results consistent with their hypothesis rather than finding results that actually confirm their hypothesis. So in either case, if the data's made available, it's very easy to find whether this is the case or not.

  • 18:13

    GARY KING, PHD [continued]: We would like to know that a large fraction of our scholarship and the knowledge that is produced by the scientific community is actually knowledge and not just made-up stuff. And so the best way to ensure that is to replicate it. The article that I published with three of my colleagues, Dan Gilbert and Stephen Pettigrew

  • 18:34

    GARY KING, PHD [continued]: and Tim Wilson-- what we did is we replicated an article that replicated 100 original articles. So the original article replicated 100 original articles. We replicated their article. What the original article did was they were evaluating the replicability

  • 18:55

    GARY KING, PHD [continued]: of psychological science. That is how replicable the entire field of psychology, of the work in psychology, was. And they replicated 100 articles. It was an extraordinary effort with 270 authors on this paper. They were very transparent about how they replicated the articles. They talked to some of the original authors.

  • 19:16

    GARY KING, PHD [continued]: They produced all the code so it was possible to replicate their results. They then drew some conclusions, or at least others drew conclusions from their article. And the conclusions that they drew was that there was a big crisis in the following sense, which is a very small fraction of articles were actually replicable.

  • 19:37

    GARY KING, PHD [continued]: So that sounded like a big problem. So we reexamined the article. We reexamined the article as an article as a contribution to science, not as a third-party judge because a replicator doesn't have higher standing than the original author. They don't have lower standing. They're just another contributor.

  • 19:57

    GARY KING, PHD [continued]: They probably have a little bit more than the original author because they came second. Whoever comes second is probably going to do better because they get to build on whoever came first. Well, in this case, this was not a group that was replicating one article, nor were they trying to make an inference that the original article was making. They were interested in a different inference.

  • 20:18

    GARY KING, PHD [continued]: They were interested in a specific number, which is-- they were interested in a lot of numbers, but their basic question was-- what percentage of articles in psychological science are replicable? And we found that the way they made that inference, which we were able to replicate, was flawed. They made statistical flaws that caused their inference

  • 20:40

    GARY KING, PHD [continued]: from the facts they had-- they replicated 100 articles, and they had facts about the replicability of those 100 articles-- to the facts they didn't have, which is the replicability of all psychological science. And we found that the way that they took the facts they had to get to the facts that they didn't have was very seriously flawed.

  • 21:03

    GARY KING, PHD [continued]: They tried to calculate-- if you replicate 100 articles, some of them are not going to replicate just by chance alone, even if they're all perfect articles. Well, how many? So they calculated that about 8% would not replicate by chance alone. And somewhat less than half they actually observed to replicate.

  • 21:23

    GARY KING, PHD [continued]: And they said, so therefore there's a much bigger problem than how many would replicate by chance alone, so there's a crisis-- at least, that's how people interpreted their article. So we came along and said, well, how did they actually calculate the number that would replicate by chance alone? Turns out that they made a mistake. They assumed that the only reason that an article wouldn't

  • 21:45

    GARY KING, PHD [continued]: replicate is that they got a bad draw or a different draw of research subjects. And we know there's variability from time to time if you draw different research subjects, just like sampling variability in a survey. However, in this case, the calculation only applies if they use exactly the same procedures

  • 22:06

    GARY KING, PHD [continued]: as the original authors. If they change the procedure, then you would expect even more variability. So what we did is we figured out a way to estimate how much additional variability they added by changing the procedures. And it turns out they drastically changed the procedures of many of their articles. So when we figured out how to estimate the extra variability

  • 22:29

    GARY KING, PHD [continued]: and how many would not replicate by chance alone where chance was calculated correctly, we found out that it was much higher than 8%. In fact, it was not far from the actual number that did replicate in their studies. So the implication is not that there's no problem in psychological science.
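
The logic of "chance calculated correctly" can be illustrated with a toy calculation. This is a sketch under stated assumptions, not the method used in either article: it assumes normally distributed estimates, it defines a "failure" as the replication estimate landing outside the original study's 95% interval, and it represents changed procedures as an extra variance component `tau`. The standard errors and `tau` values are made up, so the percentages it prints are not the figures quoted in the talk.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def chance_failure_rate(se_original, se_replication, tau=0.0, z=1.96):
    """Share of replications expected to 'fail' even though the effect is real.

    The difference between the two estimates has variance
    se_original^2 + se_replication^2 + tau^2, where tau is the (assumed)
    extra standard deviation introduced by changing the procedures.
    """
    sd_difference = math.sqrt(se_original**2 + se_replication**2 + tau**2)
    half_width = z * se_original  # half-width of the original 95% interval
    return 2.0 * (1.0 - normal_cdf(half_width / sd_difference))


if __name__ == "__main__":
    for tau in (0.0, 0.1, 0.2, 0.4):
        rate = chance_failure_rate(0.1, 0.1, tau)
        print(f"tau={tau:.1f}: expected chance failures = {rate:.1%}")
```

The point of the sketch is the direction of the effect: holding sampling error fixed, any nonzero `tau` raises the share of failures expected by chance, so a benchmark computed with `tau = 0` understates it whenever procedures were changed.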

  • 22:50

    GARY KING, PHD [continued]: The implication is that their study did not appropriately or correctly estimate the replicability of psychological science. If you read their study and you conclude there's a problem in psychological science, then you're probably wrong. You're drawing the wrong conclusions. That doesn't mean there's no problem. It just means that you can't learn it from their article. So the article that I wrote 21 years ago

  • 23:15

    GARY KING, PHD [continued]: called "Replication, Replication" had 19 responses in the same journal. Scientists are always a little worried about other scientists telling them how to do research. SAGE publishes work on methodology, which is basically all your authors-- what do they do? They tell other scientists how to do their research.

  • 23:36

    GARY KING, PHD [continued]: And so it creates lots of at least attention, and sometimes controversy. If you tell people, you have to give up your data, that's something you have to pay very close attention to. Over the last 21 years, there's been enormous progress. When I said that everybody should make their data available back then,

  • 23:56

    GARY KING, PHD [continued]: it was a firestorm of discussion and activity and objection and you name it. And then over time, basically, the war has been won. Now it's at least embarrassing if you don't make your data available. It's potentially actionable in many areas. If you got grants from the federal government

  • 24:18

    GARY KING, PHD [continued]: and you don't make your data available, that's a very serious violation of the rules. In many journals, you must submit your data with it. And so this is not uniform. It shouldn't be uniform. There should be diversity of outlets where you should publish-- I think, anyway. But that war's been mostly won.

  • 24:42

    GARY KING, PHD [continued]: So that was 1995. In 2005, I wrote an article called "Publication, Publication" in which I wrote up the assignments that I give my students for how to write a publishable paper, beginning with the replication.

  • 25:03

    GARY KING, PHD [continued]: And that also revisited the controversy from 10 years before. And even 10 years after the original article, there was enormous progress. And now it's 10 years after that, and there's been even more progress. And so that's really terrific, to see this kind of progress.

  • 25:24

    GARY KING, PHD [continued]: Let me mention one other thing. So people often think that when you replicate articles, the point of it is to just check on other people. And that is part of it, but that's not the point. That's not why you should be replicating other people's work. The reason you should be replicating other people's work

  • 25:45

    GARY KING, PHD [continued]: is that it is the fastest way for you to get to the cutting edge. That's what we show in this article, "Publication, Publication"-- that if you are at the cutting edge and you understand what the best article in your field did to study a particular subject, then you learned not only what was in the article, not only what the data source was,

  • 26:06

    GARY KING, PHD [continued]: but all the little decisions that were really crucial along the way to produce those results. Those little decisions you'd have to make up on your own. Now you understand what the best person in the literature writing the best article did. So now you're right up at the cutting edge. This is what we do in this paper in PS in 2005. You replicate a good article with very specific rules

  • 26:29

    GARY KING, PHD [continued]: about how to choose it. Once you're up to the cutting edge and you figured out how to get there, at that point, you have to figure out how to get just a little bit beyond the cutting edge. You can change the dependent variable. You can collect a little bit more data. You can decide that the actual thing that we're studying is wrong and we should actually aim in a different direction. You can make a correction if you find a problem.

  • 26:53

    GARY KING, PHD [continued]: You can use a different methodological solution. You can change the quantity of interest. There's many things you can do. But you don't have to defend everything you've ever done and everything that is in the article because you can keep everything the same as in this published article and make only one change, ideally, or a small number of changes that you can defend. And if at that point you've learned something new,

  • 27:15

    GARY KING, PHD [continued]: then you get a publication. So that's incredibly valuable. So for you, that's valuable because it's a quick way to get a publication. For the scientific community, it's valuable because you might have found something that the original author didn't notice, either because maybe the author made a mistake, but more likely, the author was focused on the author's interests. And maybe you'll find or have found

  • 27:35

    GARY KING, PHD [continued]: something new and interesting that the author wasn't really focused on. And it's much easier to defend your flank instead of having to defend all of your different flanks all at the same time. So the main purpose of encouraging students to do replications is not to check up on other people. The main point of doing replications

  • 27:56

    GARY KING, PHD [continued]: is to advance the state of the art and to start where we are. If you only start in a place where no one's ever been before, that's OK. That's a great thing. You can learn new things if you go into a different area. But to have the scientific community working with you and for you is incredibly valuable, so you might as well build on it.

  • 28:17

    GARY KING, PHD [continued]: What role do publishers play? So what do publishers do? Publishers make available a static image of text and some pictures, usually just in black and white, for-- that doesn't change.

  • 28:39

    GARY KING, PHD [continued]: And it doesn't include the data. And it doesn't include the code. And it doesn't include the thousands of decisions that led to the quick summary of the article that is being made available to everybody. So what can publishers do? They can start to make available much more information than they're making now. They don't need to become data publishers, but they could link to data publishers.

  • 29:01

    GARY KING, PHD [continued]: So in Dataverse, we have ways of integrating with the journal systems that exist now. And so if you have a system that accepts articles and makes it easy for editors and authors and reviewers, that can actually be integrated with Dataverse. And so when an author finally has their article accepted and uploads their article for final submission,

  • 29:26

    GARY KING, PHD [continued]: they can also upload their data set. And their data set then would be listed with the publisher on the publisher's website, which would be a great thing for the publisher. That same data set can also be listed on the author's website. So the author can get credit. There'll be more citations for the author, more citations for the journal. The impact factor will go up.

  • 29:47

    GARY KING, PHD [continued]: The scientific community will get access to the data, and we'll all learn more. So publishers, even though they've always been focused on text, they should also be focused on data and replication code now, too. There was an article not too long ago that studied those who make data available. Scholars who make data available are cited about twice as often

  • 30:10

    GARY KING, PHD [continued]: as scholars that don't. Journals that make data available are cited about three times as much as journals that don't. So I think it would be beneficial to publishers to make data available. There are plenty of reasons why authors sometimes can't make their data available or need to make it available under certain restrictions. That's fine.

  • 30:30

    GARY KING, PHD [continued]: But there's always some data that can be made available, always. The fact of the data can be made available. The metadata describing what the data is can be made available. The UNF, the Universal Numeric Fingerprint, which is calculated from the data, is a one-way calculation. You can't go from the Universal Numeric Fingerprint

  • 30:51

    GARY KING, PHD [continued]: back to the data. That means if you have the most sensitive data you could possibly imagine-- national security secrets of the stars or whatever it is-- those data, you can calculate a UNF, provide the citation so other people know it exists. You can put the data in cold storage. If someone has received permission

  • 31:12

    GARY KING, PHD [continued]: to get access to it under very specific conditions, they can get access to it. And then they can run that algorithm and they can figure out whether they have exactly the same data that was used in your article seven years ago. How does replication work for qualitative researchers? The really interesting development in the social sciences is that the qualitative world

  • 31:36

    GARY KING, PHD [continued]: and the quantitative world are moving more and more together. So what we used to view as qualitative information is actually becoming quantitative information to some degree. So video and audio and field notes-- these are all actually actionable data now. And so the qualitative people are, of course,

  • 31:58

    GARY KING, PHD [continued]: overwhelmed with incredibly rich information. They can use some help. They don't want a fully automated solution because it would do something dumb, like adding up all the number of characters they have. Who cares, right? But a fully human solution is impossible because even if you collect 15,000 pages of field notes, what are you supposed to do with that?

  • 32:19

    GARY KING, PHD [continued]: You flip through them, but you can't even remember them, much less analyze them. So having some assistance would be great. As it happens, the quantitative world has developed some procedures to make those kinds of qualitative data actionable. In fact, it's sort of quantitative data. And the quantitative people need the qualitative people

  • 32:41

    GARY KING, PHD [continued]: because the qualitative people have the information. So what about replication in qualitative research? It's harder. It's more difficult because the data is much more personally identifiable. If you interview people and if you do ethnographies and you do videotapes, you certainly are going to know who those people are. That's actually OK.

  • 33:01

    GARY KING, PHD [continued]: You don't take the data and you make them publicly available. You make it available. You make it archived. You put it in preservation format and you keep it, such as in Dataverse. You put it behind a security layer, and under only certain conditions can others get it. As it happens, we know how to deal with data like this very,

  • 33:24

    GARY KING, PHD [continued]: very well. We have institutional review boards that review us and make us apply for the ability to do the research in the first place and to distribute the data in the second place. And so it's not that difficult in qualitative research to now take data and not just make it available to everybody,

  • 33:45

    GARY KING, PHD [continued]: but to archive it at the time of publication, even if it's not going to be made available to everybody all at once. [MUSIC PLAYING]
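
The fingerprint check described between 30:30 and 31:12 (compute a one-way value from the data, publish it with the citation, and later confirm that a restricted copy matches) can be illustrated with a minimal sketch. This is not the actual Universal Numeric Fingerprint algorithm used by Dataverse. The function name `fingerprint`, the rounding to seven significant digits, and the 128-bit truncation are assumptions chosen only to show the idea with Python's standard `hashlib` and `base64` modules.

```python
import base64
import hashlib


def fingerprint(values, digits=7):
    """Return a short one-way fingerprint of a sequence of numbers.

    Hypothetical illustration only; not the real UNF specification.
    """
    # Normalize every value to a fixed number of significant digits so that
    # storage differences (1.0 versus 1.00) do not change the fingerprint.
    normalized = [format(float(v), f".{digits - 1}e") for v in values]
    payload = "\n".join(normalized).encode("utf-8")
    # SHA-256 is a one-way hash: the data cannot be recovered from the digest.
    digest = hashlib.sha256(payload).digest()
    # Keep the first 128 bits and encode them as printable text.
    return base64.b64encode(digest[:16]).decode("ascii")


# Fingerprint published alongside the article's data citation.
published = fingerprint([1.0, 2.5, 3.14159265])

# Years later, someone granted access recomputes it from the archived copy.
same_data = fingerprint([1.00, 2.50, 3.141592650])
altered = fingerprint([1.0, 2.5, 3.15])

print(published == same_data)  # the archived copy matches
print(published == altered)    # a changed value is detected
```

Because only the fingerprint is public, the data itself can stay behind whatever access restrictions apply, while anyone who later obtains permission can still verify that they hold the same data.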

Video Info

Publisher: SAGE Publications Ltd

Publication Year: 2017

Video Type: Interview

Methods: Reliability, Validity, Data quality and data management, Communicating and disseminating research, Publishing your research

Keywords: challenges, issues, and controversies; crisis; fraud; practices, strategies, and tools; Science communication; Software

Segment Info

Segment Num.: 1


Gary King discusses the importance of replication in the sciences, including examining the barriers to replication and reasons studies may not replicate, tips for how to make data more shareable, and the role of publishers.

