  • 00:00


  • 00:10

    GEORGE PLOUBIDIS: My name is George Ploubidis. I am Professor of Population, Health, and Statistics at the Department of Social Science at University College, London--UCL. And I'm also Director of Research and Chief Statistician at the Center for Longitudinal Studies at UCL. My work is mostly looking at social determinants of health over the life course of certain demographic determinants

  • 00:30

    GEORGE PLOUBIDIS [continued]: and the mechanisms that link those with various health outcomes over the life course. [What is a longitudinal survey?] I think, to me, it's the best thing since sliced bread, the best thing ever. In the UK, we're very lucky because we have a lot of those.

  • 00:51

    GEORGE PLOUBIDIS [continued]: It's a unique UK thing, having a lot of longitudinal surveys. Simply, a longitudinal survey means following people over time. It's a very simple idea but a very powerful idea because it allows us to investigate change, stability, in people's lives over long periods of time.

  • 01:13

    GEORGE PLOUBIDIS [continued]: Let's say a particular kind-- type-- of longitudinal survey that I have very close to my heart are birth cohorts-- birth cohort studies. As the name implies, we follow people from birth and over the whole life course. The UK is unique in that respect. It's the only country in the world that has four of those, four large, population-based birth cohorts that follow people from birth until later on.

  • 01:34

    GEORGE PLOUBIDIS [continued]: Of course, there are huge opportunities for a lot of research from many disciplines by using these data sets. The UK birth cohort studies are all population based. So they're large samples. Let's say the oldest one, those born 1946, there are about 5,000. The three later ones-- the 1958, 1970, and Millennium Cohort Study-- with about 18,000 participants.

  • 01:58

    GEORGE PLOUBIDIS [continued]: There isn't the selection every year, only in 1946 and 1958. The selection-- it wasn't a selection, really. It was very, very simple. You can think of it as a miracle of social engineering. Because let's say in a single week in March in 1958, 90% of all babies that were born that single week were recruited for the 1958 birth cohort, for example.

  • 02:20

    GEORGE PLOUBIDIS [continued]: Since then, science has progressed a little bit, or a lot-- a lot, actually. We know that there may be some seasonal effects. So for example, the Millennium Cohort Study is a complex survey. It recruits people around the whole year, in a sense, to take into account the month of birth effect.

  • 02:44

    GEORGE PLOUBIDIS [continued]: Again the selection is done in a way that allows us to have a representative sample of the population. One of the great things about the studies, aside from the data and the fact that they follow people over their life course, is that they are representative of the target population. That's a very important feature of all these studies. [What kind of measures or data points are captured

  • 03:04

    GEORGE PLOUBIDIS [continued]: in longitudinal studies?] The short answer is everything. The long answer is think of those studies as surveys, right, first and foremost. So there are a lot of questions about everything, really-- education, health, income, occupation, relationships,

  • 03:27

    GEORGE PLOUBIDIS [continued]: everything you would think a social survey would ask-- but everything, really. I'm sure I'm forgetting something now. But all the studies nowadays-- not now but let's say since the late '90s-- they're also complemented with biological data, as well. So we have blood samples. We have a lot of biomarkers. So these studies are fully genotyped, as well.

  • 03:48

    GEORGE PLOUBIDIS [continued]: We have epigenetic information. Nowadays, we're doing a lot of data linkages with routine data-- let's say, hospitalizations, data from schools, educational achievement, and so on. And there will be more. So it's really-- I'm not sure if this is the right expression-- but a candy shop of data, really, a lot of data

  • 04:11

    GEORGE PLOUBIDIS [continued]: that allows us to answer all sorts of questions. [Can the data in birth cohort studies be considered 'Big Data'?] They're the best form of big data. In a sense, when I mentioned linkages, right, the idea is that we link the birth cohort data with the big data, right?

  • 04:34

    GEORGE PLOUBIDIS [continued]: In a sense, it's the best of both worlds. Because with big data that are collected for other purposes-- they're not collected for scientific purposes, right-- so by bringing together the very refined information over the life course in the birth cohort-- and other longitudinal surveys, of course-- with information with big data, really,

  • 04:55

    GEORGE PLOUBIDIS [continued]: is the best of both worlds. This is the direction of travel in the future. And that will allow us to get an even better picture, more refined picture, of people's life course. So birth cohorts-- they're the best form of big data but, really, the best of both worlds, where the birth cohorts plus all the big administrative data we're linking in to answer even more questions. I'll give an example from health, but there are many others.

  • 05:18

    GEORGE PLOUBIDIS [continued]: For example, in the birth cohorts, we have data on health-- on self-reported health, right-- but also biomarkers and so on. But we also ask people whether they have gone to see their GP or whether they have been hospitalized. But that self-report, by bringing in information, let's say, on hospitalizations-- duration or number or the reason that people were hospitalized--

  • 05:39

    GEORGE PLOUBIDIS [continued]: that will give us more opportunities to actually investigate the determinants of those hospitalizations or outcomes of those hospitalizations. So in a way, it brings more opportunities to answer more questions in a more refined way that also will make these answers even more

  • 06:00

    GEORGE PLOUBIDIS [continued]: policy-relevant, as well. [What challenges are there when working with large scale longitudinal studies?] Longitudinal surveys are amazing, of course, but they have challenges, as any data set. To me, there are three major challenges. Missing data is one of those challenges. Because we're doing our best to follow people

  • 06:21

    GEORGE PLOUBIDIS [continued]: over time, but people drop out of the studies. Causal inference. These are observational data. There are some assumptions but reasonable assumptions. I will argue about that later. And measurement error, of course. There's also this. We have to think about measurement error, as well. I will first talk about missing data a little bit

  • 06:43

    GEORGE PLOUBIDIS [continued]: because at the Center for Longitudinal Studies where I work, we have an applied statistical methods program that deals with all these issues. What is great about those studies is they have very rich information. And we have shown that by capitalizing on the richness of those studies, we're able to compensate for non-response. So simply put, early in life, almost everyone

  • 07:06

    GEORGE PLOUBIDIS [continued]: is in those birth cohorts. There isn't much dropout, or not at all, actually. So we have all this rich information early on. By using this information and information we have in adulthood for, let's say, 60% or 70% of the participants, we can make very accurate predictions for the people that are missing and can make very accurate inferences despite that missingness.

  • 07:27

    GEORGE PLOUBIDIS [continued]: Under some assumptions, of course, with modern methods-- let's say like multiple imputation, inverse probability weighting, full information maximum likelihood, and so on-- the key there is the richness of the birth cohorts. The assumption is that the observed information that we have is good enough for us to make accurate predictions about missingness. And we have shown that, yes, that's the case.
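As an illustration of one of the methods named above, inverse probability weighting, here is a minimal Python sketch on simulated data. Everything in it is an assumption made for the example: the function names, the single binary early-life covariate, the response rates, and the outcome values are hypothetical and are not taken from the cohorts. The idea is that an early-life variable observed for everyone predicts both dropout and the adult outcome, so reweighting the responders by the inverse of their estimated response rate can recover the full-sample mean.

```python
import random


def simulate_cohort(n=20000, seed=1958):
    """Simulate (early_life_disadvantage, adult_outcome, responded) rows.

    All numbers are hypothetical: disadvantaged members have a higher mean
    outcome and are less likely to still be responding in adulthood.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        disadvantaged = rng.random() < 0.4
        outcome = rng.gauss(3.0 if disadvantaged else 1.0, 1.0)
        responded = rng.random() < (0.5 if disadvantaged else 0.9)
        rows.append((disadvantaged, outcome, responded))
    return rows


def complete_case_mean(rows):
    """Mean outcome among responders only, ignoring who dropped out."""
    observed = [y for _, y, responded in rows if responded]
    return sum(observed) / len(observed)


def ipw_mean(rows):
    """Weighted mean among responders, weight = 1 / estimated response rate.

    The response rate is estimated within each early-life group, which is
    observed for everyone because it was collected before any dropout.
    """
    weight = {}
    for group in (True, False):
        flags = [responded for g, _, responded in rows if g == group]
        weight[group] = len(flags) / sum(flags)
    numerator = sum(weight[g] * y for g, y, responded in rows if responded)
    denominator = sum(weight[g] for g, _, responded in rows if responded)
    return numerator / denominator


rows = simulate_cohort()
full_mean = sum(y for _, y, _ in rows) / len(rows)  # unknowable in real data
print(round(full_mean, 2), round(complete_case_mean(rows), 2), round(ipw_mean(rows), 2))
```

In this simulation the complete-case mean is biased toward the group that stays in the study, while the weighted mean sits close to the full-sample value. The correction only works here because dropout depends on a variable that was observed, which mirrors the assumption described in the interview.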

  • 07:49

    GEORGE PLOUBIDIS [continued]: But of course, we have to think about it and do appropriate analysis for missing data. The other challenge, this causal inference, I think that's a big one, right? Because this is observational data. We cannot randomize. Even if we could randomize some things, let's say, it would be unethical to randomize. And also, if you want to investigate people over their life courses in their natural habitat--

  • 08:10

    GEORGE PLOUBIDIS [continued]: let's say the societies we live in-- you wouldn't want to randomize. I'm not against randomization. I mean, it's great, absolutely. But it can answer some questions, not all the questions that we're interested in by looking at these types of longitudinal surveys, the birth cohorts and so on. Causal inference and observational data are very interesting. It's a methodological area with a lot

  • 08:33

    GEORGE PLOUBIDIS [continued]: of work from various disciplines-- of course, statistics, epidemiology, economics, computer science, as well. And nowadays, we have methods available to us to either directly obtain causal estimates, methods like, for example, instrumental variable models or, when the instrument is a polygenic risk

  • 08:56

    GEORGE PLOUBIDIS [continued]: score or a gene, Mendelian randomization, for example, regression discontinuity, and other methods, as well, that have been described mostly in the econometric or epidemiological literature. What we can also do-- again, capitalizing on the richness of the birth cohorts-- aside, of course, from very carefully controlling our models

  • 09:17

    GEORGE PLOUBIDIS [continued]: for all those observables, nowadays, there are very flexible methods to do sensitivity analysis. For example, to simulate the effect of something unmeasured that you don't have in your data, which, of course, would be a problem for causal inference: what would this-- let's say, an unmeasured confounder or omitted variable-- how strong would its effect have to be to explain away the results we have found from our studies?
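One widely used answer to the "how strong would it have to be" question is the E-value of VanderWeele and Ding, defined for an estimate on the risk ratio scale as RR + sqrt(RR × (RR − 1)), with estimates below 1 inverted first. The Python sketch below is only an illustration under that assumption; the function name `e_value` is hypothetical and the input is a made-up risk ratio, not a result from the cohorts.

```python
import math


def e_value(risk_ratio):
    """E-value for a point estimate on the risk ratio scale.

    It is the minimum strength of association, on the risk ratio scale, that
    an unmeasured confounder would need with both the exposure and the
    outcome to fully explain away the observed estimate.
    """
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    # Protective estimates (RR < 1) are inverted so the formula applies.
    rr = risk_ratio if risk_ratio >= 1 else 1.0 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1.0))


# Hypothetical observed risk ratio of 2.0.
print(round(e_value(2.0), 2))  # 3.41
```

A larger E-value means a stronger unmeasured confounder would be needed to overturn the finding. A null estimate (risk ratio of 1) gives an E-value of 1, meaning no confounding is needed to explain it.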

  • 09:38

    GEORGE PLOUBIDIS [continued]: There are very flexible methods for sensitivity analysis to do this. There are also ideas around other methods of sensitivity analysis, like negative controls, for example, and so on. The approach, the idea, is that in these observational data, the idea is to use different methods for the same question at hand. And you can think of it as a form of triangulation.

  • 10:00

    GEORGE PLOUBIDIS [continued]: This kind of term has become fashionable again. Epidemiology work has been around for more years in other disciplines, too. So what we try to do in observational data in the cohorts and in other longitudinal surveys is, first of all, yes, we're taking into account, we're using the rich information, and we control our models with all these rich controls, and so on. But always there is either some form

  • 10:22

    GEORGE PLOUBIDIS [continued]: of sensitivity analysis for unmeasured confounding or a formal causal inference method. And then combine all those estimates and identify. If they all agree, great. If they don't, we can think more about why there is a discrepancy and go back and think more rigorously about causal inference. I do believe there are a lot of methods that you can use

  • 10:43

    GEORGE PLOUBIDIS [continued]: to deal with measurement error. For example, you can think of latent variable models, item response theory, other factor analysis, but broadly, latent variable models that can be used. And these are used frequently in, let's say, longitudinal surveys or the cohorts to control as much as possible for measurement error. There are also ways to do sensitivity analysis to simulate the effects of measurement error.
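A very simple version of such a sensitivity analysis is the classical correction for attenuation from psychometrics: an observed correlation is divided by the square root of the product of the two measures' reliabilities. The Python sketch below assumes classical (random, uncorrelated) measurement error. The function name and all the numbers are hypothetical and chosen only to show how the corrected estimate moves as the assumed reliability changes; it is not the latent variable modelling described in the interview.

```python
import math


def disattenuated_correlation(observed_r, reliability_x, reliability_y):
    """Correct an observed correlation for classical measurement error.

    Reliabilities are proportions of true-score variance, in (0, 1].
    """
    for rel in (reliability_x, reliability_y):
        if not 0.0 < rel <= 1.0:
            raise ValueError("reliability must be in (0, 1]")
    return observed_r / math.sqrt(reliability_x * reliability_y)


# Sensitivity analysis: the same hypothetical observed correlation under a
# range of assumed reliabilities for one of the two measures.
observed = 0.30
for assumed_reliability in (1.0, 0.9, 0.8, 0.7, 0.64):
    corrected = disattenuated_correlation(observed, assumed_reliability, 1.0)
    print(assumed_reliability, round(corrected, 3))
```

The lower the assumed reliability, the more the observed correlation understates the underlying one, which is the kind of question a measurement error sensitivity analysis is meant to answer.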

  • 11:04

    GEORGE PLOUBIDIS [continued]: And this is also possible in available software, as well. I would say measurement error, compared to missing data and causal inference, is the one that, with some assumptions with known methods-- actually, as the others, too-- but with known methods, we can actually compensate for various forms of measurement error. Of course, there's more work that needs to be done pedagogically to combine methods to have

  • 11:27

    GEORGE PLOUBIDIS [continued]: control for all these three challenges in a single analysis and so on. But we need more content. [Can you give an example of your own work on birth cohort studies?] There are various projects that I'm involved with. But one that is very close to my heart, and one that we've been working on currently, is on mental health over the life course,

  • 11:48

    GEORGE PLOUBIDIS [continued]: especially how early life mental health and trajectory-- so early life mental health-- can influence outcomes later in life. For example, using the 1958 birth cohort that has very well characterized mental health, common mental disorders-- like symptoms of anxiety and depression over the life course, but also affective symptoms and conduct problems

  • 12:09

    GEORGE PLOUBIDIS [continued]: in childhood at age seven, 11, and 16-- we have the life trajectories of mental health in childhood. And we have investigated how these are related to biomarkers, cardiometabolic risk factors in midlife, and, of course, mortality in men and women. And we have some interesting findings. For example, early life mental health is strongly associated with biomarkers--

  • 12:31

    GEORGE PLOUBIDIS [continued]: mostly in women but less so in men, which we find very interesting. And we would like to investigate the mechanism why this is so. Interestingly, early life mental health is associated with mortality in both genders-- as strongly in both genders. And this makes us think that early life mental health

  • 12:54

    GEORGE PLOUBIDIS [continued]: is one of the factors that might also explain the gender paradox in mortality, that women are less healthy but tend to live longer, on average. So the difference in early life mental health might be one of the factors that actually might explain this well-known gender paradox in life expectancy and so on. Related to this project, what we have also done within this

  • 13:16

    GEORGE PLOUBIDIS [continued]: project, but in a second paper, is we used the 1946 birth cohort, the 1958 birth cohort, and the 1970 birth cohort-- let's say the three adult birth cohorts-- that have mental health information from adolescence up to later on to investigate the evolution, the development, of, let's say, symptoms of depression, anxiety, psychological distress, if you like, over the life course

  • 13:38

    GEORGE PLOUBIDIS [continued]: and whether it peaks early on or in midlife or later on and so on. And we have found-- and this is, actually, the first study that has looked at the empirical distribution of psychological distress over the life course-- that it peaks in midlife in all those three cohorts. More symptoms, for both men and women, are in midlife, and that this is mostly--

  • 14:01

    GEORGE PLOUBIDIS [continued]: what we believe, this is mostly due to something that is called a cohort effect, that these cohorts experience similar, let's say, similar determinants, similar experiences, let's say, anxiety about their careers, family, divorce, failures, and so on during their midlife.

  • 14:23

    GEORGE PLOUBIDIS [continued]: And that leads to that peak in psychological distress. Interestingly, of course, because of the rise in life expectancy and the postponement-- the well-known second demographic transition-- the postponement of first birth and the age of women when they have their first child, midlife is a bit postponed, right, in terms of what we term midlife, like is it the early 40s or mid-40s or early 50s

  • 14:46

    GEORGE PLOUBIDIS [continued]: and so on. But in all these three generations, around midlife with respect to their life expectancy is where psychological distress peaks. [How do you analyze these data?] One thing I really, really like about birth cohorts and about data, as well, is that there isn't a single method.

  • 15:08

    GEORGE PLOUBIDIS [continued]: It's a combination of methods, and it's very multidisciplinary, as well. Of course, when you talk about analysis-- statistics, right? Obviously. But there are methods-- you can call them statistical methods or methods developed in other disciplines-- economics, epidemiology, sociology, psychology, as well. For example, most of the, let's say, most of the tools we have on measurement error come from psychology, right,

  • 15:30

    GEORGE PLOUBIDIS [continued]: from psychometrics, obviously. So it's not a single method. For example, let's say in the paper where we investigated the association between early life mental health and biomarkers and mortality. We combined trajectory type analysis, finite mixture models, with regressions-- linear and log-binomial

  • 15:51

    GEORGE PLOUBIDIS [continued]: regressions-- plus Cox survival models, that is, time-to-event models. So you have a combination of different methods. And on top of that, there is missing data analysis with multiple imputation for all these methods, plus sensitivity analysis for causal inference with an E-value approach, a method from biostatistics

  • 16:12

    GEORGE PLOUBIDIS [continued]: and epidemiology. So it's not a single method. There are various methods. For the second paper I mentioned on trajectories over the life course and the peak in midlife, well, this was done with piecewise multilevel models to capture all the curves, the peaks, and the slowdowns, let's say, and the dips in mental health and so on. Again, yes, piecewise multilevel models but then

  • 16:37

    GEORGE PLOUBIDIS [continued]: multiple imputation and direct likelihood for missing data. So again, a combination of methods. So there isn't a single way. What's fascinating is that it's always a combination. It always will be a combination of methods. And it always should be multidisciplinary. Being very narrow I don't think helps to actually capitalize on the richness of these data sets.
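To make the "piecewise" part of the piecewise multilevel models mentioned above concrete, here is a minimal Python sketch of a single piecewise linear trajectory with one knot. It shows only the fixed-effect shape and not the multilevel (random effects) structure or the missing data handling. The function name, the knot at age 45, and the coefficients are hypothetical values chosen for illustration and are not estimates from the study.

```python
def piecewise_distress(age, knot, intercept, slope_before, slope_after):
    """Predicted distress score from a two-segment linear trajectory.

    The slope is `slope_before` up to the knot and `slope_after` beyond it,
    and the two segments join at the knot.
    """
    years_before = min(age, knot)          # time accrued on the first segment
    years_after = max(age - knot, 0.0)     # time accrued on the second segment
    return intercept + slope_before * years_before + slope_after * years_after


# Hypothetical coefficients: distress rises until the knot, then declines.
ages = range(20, 71, 5)
trajectory = {
    age: piecewise_distress(age, knot=45, intercept=1.0,
                            slope_before=0.05, slope_after=-0.03)
    for age in ages
}
peak_age = max(trajectory, key=trajectory.get)
print(peak_age)
```

With a positive slope before the knot and a negative slope after it, the predicted peak falls at the knot. In a fitted model the knot positions and slopes would be estimated or compared across specifications instead of being fixed in advance as they are here.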

  • 16:57

    GEORGE PLOUBIDIS [continued]: [Do you use any particular software or tool to help you manage the data?] At UCL, we teach R. It's a great software. It's freely available. And actually, all the things I mentioned before, most of them-- all of them, actually-- can be done in R. Having said that, I think it's a great skill

  • 17:18

    GEORGE PLOUBIDIS [continued]: to be able to be flexible with software. In my work, yes, I use Stata a lot, as well. I use Mplus. A lot of people also use SPSS but, of course, as I mentioned, R. SAS is also another well-known software. There are others, depending on specifics. For example, there is software, if you are interested, let's say, in genome-wide association studies,

  • 17:39

    GEORGE PLOUBIDIS [continued]: there's specific software for that and so on. I guess, let's say, for most of the work I do, it is Stata, Mplus and R 80% of the time. But I think it's always a good skill to be able to work with a couple of different softwares, if that makes sense. Because yes, they're great, the software that can actually do most of the analysis you plan to do.

  • 18:01

    GEORGE PLOUBIDIS [continued]: But there will always be something else. So I guess my advice would be to learn one software really well. If that is R, that's amazing. If it's something else, great-- Stata, I don't know, something else-- great, as well. And then, once you know how to work with a software out of those examples I gave, I think it's easy for most--

  • 18:23

    GEORGE PLOUBIDIS [continued]: it would be very easy to actually adapt to another software. So it's always, I think, I guess, in my work, I guess it's always a combination. If I could give a piece of advice, I would definitely, at some point, get into R. Because if you know R, then everything else kind of follows easily. But that's just me. [What other types of research questions can you answer using

  • 18:46

    GEORGE PLOUBIDIS [continued]: birth cohort studies?] Many questions. But I guess the birth cohort studies, especially, are very, very unique in the fact that they have information from birth, from early on, and by following people over the life course. For example, the first evidence ever that circumstances in early life-- for example, that social class,

  • 19:07

    GEORGE PLOUBIDIS [continued]: parental social class, parental occupation, parental education-- have a long-lasting impact on health and social outcomes over the life course-- 40, 50, 60 years later on-- comes from birth cohorts. And this is kind of a very powerful, powerful finding, I guess, right? And thinking about it, I guess nowadays, we think, yes, that makes sense. But before the birth cohorts, or before actually

  • 19:27

    GEORGE PLOUBIDIS [continued]: having the data to actually analyze this and investigate this and have empirical evidence on that, that couldn't be possible. It's only possible because we have this great data. So generally, one area of work is the effect of early life characteristics-- socioeconomic and others-- on outcomes later in life-- in early adulthood, midlife, old age, and so on.

  • 19:52

    GEORGE PLOUBIDIS [continued]: Another very important area that could be investigated mostly with birth cohorts, either within a birth cohort or comparing different birth cohorts, is social mobility, whether it's declining or not and so on. And of course, there's intergenerational mobility. Then, the intergenerational transmission of social status and health is one area that a lot of work

  • 20:15

    GEORGE PLOUBIDIS [continued]: has been done with the cohorts on these. But also, there are many, many more things that should be done in order to understand the mechanisms that underlie social mobility and so on. Another very important area that, again, the cohorts can actually contribute to is the area of intergenerational inequalities. For example, there was a recent report,

  • 20:36

    GEORGE PLOUBIDIS [continued]: I think from the Resolution Foundation, that shows that, for example, those born in the '80s, or the millennials, are projected to have less wealth, less income, and so on in terms of pensions and everything. And that is a very important and unwanted form of intergenerational inequality. By using the birth cohorts, we will be able to actually understand the mechanisms of that.

  • 20:57

    GEORGE PLOUBIDIS [continued]: So we have described the phenomenon. But the question is, OK, why? And what can we do about it? The cohorts can actually do that. Another area which, again, can only be done either with longitudinal studies or, especially, the birth cohorts is work around resilience. Because sometimes not everybody follows the trend, right?

  • 21:19

    GEORGE PLOUBIDIS [continued]: Sometimes people buck the trend, in a sense. There may be some adverse circumstances early in life. But there are people-- and we have observed such participants in the cohorts-- that actually do as well as participants that haven't experienced this form of adversity. Again, this is a fascinating area of work

  • 21:40

    GEORGE PLOUBIDIS [continued]: where it's going to actually investigate what makes these people. What are the circumstances? What happened to them during their lives that actually made them resilient to some particular adversity? So there is another very important area of work that has been done in the cohorts, but there are still a lot of questions we haven't answered yet.

  • 22:01

    GEORGE PLOUBIDIS [continued]: [What advice do you have for someone looking to work with longitudinal and birth cohort data for the first time?] The first thing is that most of the studies I mentioned, they're freely available at the UK Data Service. So you can simply log in there and download the data and see what is available, with all the documentation.

  • 22:21

    GEORGE PLOUBIDIS [continued]: So I would say that's a first step to say, hey, have a look at what you have available. Of course, the website of the Center for Longitudinal Studies, where I work, where we host three birth cohorts and another cohort study called Next Steps, all the documentation is there, as well. But either the UK Data Service or the CLS website or other--

  • 22:45

    GEORGE PLOUBIDIS [continued]: for example, for other longitudinal surveys, their own websites, for example, direct you to the English Longitudinal Study of Ageing website and so on. What is great about the studies-- and Understanding Society, which is another very important longitudinal survey-- what is great about the UK is that these data are freely available. And you can actually go to UK Data Service

  • 23:06

    GEORGE PLOUBIDIS [continued]: or through those websites and find all the documentation and the data, as well. So I would say that's kind of the first step. And you know, it's fascinating. I think the first time I got in touch many years ago with those surveys, I couldn't really believe it, the wealth of information and what kind of questions could be answered and so on.

  • 23:26

    GEORGE PLOUBIDIS [continued]: So I would say go to the UK Data Service or to the CLS website and have a look at the data and the documentation.


George Ploubidis, PhD, Professor of Population, Health, and Statistics at the Department of Social Science, and Director of Research and Chief Statistician at the Center for Longitudinal Studies at University College, London, discusses longitudinal surveys, birth cohort studies, and big data including the kind of measures captured in longitudinal studies, whether birth-cohort studies are "big data", challenges working with large-scale longitudinal studies, an example of a birth-cohort study, how the data are analyzed, software or tools to manage the data, types of research questions that can be answered with these studies, and advice for someone interested in research using longitudinal and birth-cohort data.


An Introduction to Longitudinal Surveys, Birth Cohort Studies & Big Data
