  • 00:01

    [MUSIC PLAYING] My name's Louise Corti. I'm one of the associate directors here at the Data Archive, and I'm collections development and producer relations director, as well, for the UK Data Service.

  • 00:27

    The Data Archive is a resource that holds many, many thousands of digital resources for social scientists, and that's mostly data sets that relate to surveys that are carried out by the government or by researchers, qualitative data that include interviews collected in the field, or other historical resources. We have about six and a half thousand of these.

  • 00:49

    We've been set up for over 50 years. So the Research Council, the ESRC, has funded us since 1967 to set up an archive to look after these very precious data. In the early days this was mostly polling data, opinion polling data, but now we have a much wider portfolio. So I want to explain today, really, what's involved in getting these data in, for example,

  • 01:09

    from the government departments, and what steps we go through to make these data available at the other end so that users and policymakers can access them. So I'm going to start off talking about what's involved in collections development, and that's the process by which we negotiate with people who own data to think about how they want to make data available to users, and then what conditions they need to sign up to.

  • 01:30

    So we're going to go and talk about that now. Hello, Gundi. Hi, Louise. How are you? I'm very well, thank you. Fine. So we're just going to talk about depositing data, we're here to discuss. So I just thought I'd just outline the kind of things we're going to cover in the meeting so you're clear. So first of all, we want to hear a bit about the data and what they look like. Second, really, about the ownership

  • 01:50

    of the data, because one thing we'll need to do is establish the rights and get the license forms signed, that means we can then make it available to users. The next bit will be really about any restrictions on access, who the users might be, also the kind of time frame that you'd like the data made available. And then finally, really, about what kind of documentation you'll be able to give us.

  • 02:11

    So first of all, can you just tell us a little bit about the study and the data? Actually, I'm working for the Institute for Social and Economic Research and we've been commissioned by the Economic and Social Research Council over the past more than 20 years, really, to collect household panel data for the United Kingdom. OK, so in the sense of ownership and rights,

  • 02:32

    we have to have a license signed so we can make data available to our users. Who's the owner of these data? Rights are very important when we're talking about data because if I'm a principal investigator on a project, I pretty much own the copyright in what I do. If I'm interviewing people in the course of my project, they also have a right in their words. So sometimes rights can be quite complex,

  • 02:52

    and you can have multiple rights owners for bits of data. So what's important is that we establish, when we negotiate with depositors, exactly who holds the rights. And once we've established that, they will give us a non-exclusive license to redistribute data, and that means we can make data available, but we don't own the data. They keep the ownership, but we have a license to pass it on.

  • 03:14

    But the rights side is very important, to clarify who owns what and what they can do with it. But the data owner is the data custodian, as we refer to him, and this would be the principal investigator of the study, which in our case is Professor Nick Bock. What kind of users might use this? Are there particular restrictions on access, or can students use it? Undergraduates?

  • 03:34

    Professors? Researchers? Who would be using this? The only restriction on access is really that the data can only be used for non-commercial purposes. The main uses of the data that we hold at the Archive are really in two strands. One is the research, so people will download data sets

  • 03:56

    or access them, and they'll use them for either predicting things or answering questions that they're posing, or they may be answering kind of policy questions, they may be studying data over time to predict what might happen. So what's gone on, what might happen. And the other side, really, is in training. A lot of students use data in their dissertations just to kind of practice using data,

  • 04:18

    or to do a very small scale piece of analysis to kind of test out their secondary analysis skills. So it really falls between teaching and research. So the various people that use the Archive tend to be researchers on the whole. They can be junior researchers, they can be PhDs, or undergraduates who are working on dissertations. They can be experienced researchers

  • 04:40

    who want to use secondary data, for example census data, or they could be, really, anyone that's interested in finding out information, social economic information. So we have quite a lot of policy makers, local government researchers, people kind of assembling facts for their jobs, as well. So, finally, really, what kind of documentation will you provide to help users understand what the data look like?

  • 05:02

    Now, in our case the documentation of the study is rather complicated because it's a longitudinal study so we interview the same household again, and again, and again over time. One of the biggest problems is transparency, and that means how much context can you provide for some research that you've done. So thinking about that, some people argue that you can never have a full context of a data

  • 05:24

    piece, a data collection. An example might be if you've interviewed someone in a room, you can get an audio recording, and you can transcribe that, and you could have a description of the setting in which the interview took place, but you can never actually be there. So it's very hard to recreate that context about the interaction and the nuances that might go on. So you can try to explain as much as possible,

  • 05:45

    but I guess you can never be fully transparent. For each of the data files that we submit to the Data Archive, we will then have a description of that specific data file so that users know which of the studies to-- or data files to put into their basket and use for their research. OK, that sounds really good.

  • 06:07

    So hopefully you'll be sending your data to us quite soon, and we'll get the license signed by the principal investigator in the next week or so. Thank you very much. Thank you. A sharable data set would look like-- well, it's digital, so it doesn't look like very much, but on a screen it would look like a list of data files organized very well into folders. So you can-- and the file naming's very clear.

  • 06:29

    So if you've done a set of 10 interviews, you'd have interview one, interview two, so it's kind of transparent and you could see what data files are there. When you open a data file in Word, you can see that the typeface works, and you indicate who's speaking. So there's some very obvious things about what a good piece of data looks like. On the other hand, that piece of data needs to be contextualized.

  • 06:49

    So it needs good documentation to say who was interviewed, at what date they were interviewed, and why they were interviewed, and to maybe provide a topic guide or some information about why the research took place, and kind of a little bit about the interview setting, as well. The first point of contact that the data has with the Archive

  • 07:12

    is actually with my team, the Collections Development team, and we work with depositors who own data to discuss the terms and conditions under which data can be made available, and sometimes that can be quite a long process that needs to be negotiated because they may be uncertain about how the data could be used. So just to give a bit of an example, we work very closely with government departments who collect some of the best data there are, actually.

  • 07:34

    So we're very lucky in the UK to have some really important government surveys, like the British Crime Survey, the Labour Force Survey, and of course, we run a full census. They're very, very high quality data sets, and we have an agreement with them that we've had since the 1970s to actually bring the data here. So a constant stream of data with government departments, to bring the data here. So we don't really need to persuade them to give us data.

  • 07:56

    We have an agreement with them, it's just the actual terms and conditions for each data set. On the other hand, we also work with many researchers who have research grants, and everyone who has an ESRC grant actually has to offer data at the end of their award. So if they'd had an award for three years, after three months of finishing that they have to make the data available. And that was really only introduced in the 1990s,

  • 08:19

    and it was difficult for them because many researchers are not used to sharing their data, but now kind of 20 years later it's a little bit more accepted. So, as well as working with them-- and, actually, we have what's called a self-deposit archive, where they can upload data themselves and documentation-- we spend a lot of time training them, and providing guidance for them on how to collect really high quality data, what

  • 08:40

    are the barriers to sharing data, how do you document data, and how do you think about things for the longer term. Some of the ways we support researchers and students in trying to help them think about how to look after data and share data is by all sorts of training materials here, and what we've done is made some very pragmatic exercises

  • 09:04

    for people to think about. So the first thing will be, if you're a PhD student and you're carrying out a piece of research, you need to think beyond just collecting the data, analyzing it. You may be panicking a little bit, but just think long-term about how you're going to collect the data and where they're going to be shared or not. And there's many things you need to think about along the way. So we ask students and researchers to think

  • 09:24

    about this planning a life cycle and the points at which they're doing things like writing a consent form, collecting data, preparing data, the sorts of things they might think about to make sure it's done consistently. So just an example here is to think about the kinds of wording you might have in your consent form. So quite often people say, when you're interviewing someone,

  • 09:45

    it'll only be me that'll see the data. Don't worry. But actually, that precludes data sharing. So you need to think about a good way of explaining how the data will be kept safe, but other researchers can use it. So the consent form is very, very important. A second important thing is sort of consistency in data collection. So if I'm interviewing 50 people, I want to make sure the transcriptions are

  • 10:05

    pretty much the same across all of the people I'm interviewing. So we have what's called a transcription template, so that if you've got multiple people transcribing an audio interview, they're going to use the same template, and it just makes life a lot easier, and you've got the same rules for transcription. The other thing that's really important is that people think about documenting their data. So all the methods you use, the way that you sampled,

  • 10:29

    the questions you asked, rather than just thinking about that right at the end, to actually make notes as you go along about how you did the process of research. And actually PhDs in their final PhD will need to write that stuff up in detail. But as you get through your career, more experienced researchers will not spend as much time writing the methods up. Yet, this is really important if you're

  • 10:49

    going to reuse data to find out what people did. And quite often in a publication, you will just see one short paragraph on methods, yet we'd really like people who deposit data to actually write more about what they did. So some of these small issues are very, very important. And finally, I think the last thing is about worrying about our data, be caring about our data

  • 11:10

    and where they go. So for example, we do see people emailing an interview transcript to a colleague. That might have very personal data in it, and so one thinks about the internet is not a safe place to share things, one should use encryption to try and make sure the data are safe. So it's just thinking a little bit more carefully about storing data safely that we've collected. The key challenges for a depositor

  • 11:31

    really are that it's very hard to explain absolutely everything you've done along the research process. So for example, if you think about government social surveys, they provide a very detailed technical report with everything they've done on sampling and data collection, the way they've coded data. But actually that's quite routine for people who work in universities.

  • 11:51

    They tend not to have very much time to document absolutely everything. So the key challenge really is how much time you've got to prepare data and describe it in detail, so someone coming to the data afresh knows what's in the collection. [MUSIC PLAYING]


Louise Corti explains her work as an associate director of the Data Archive. The data archive acquires data collected by social science researchers and makes it available to students and other researchers. It also trains new researchers on good research and data management practices.


Managing Data: The UK Data Archive
