- 00:05
[An Introduction to the Basics of Multilevel Modeling]

- 00:12
EC HEDBERG: Hi, I'm E.C. Hedberg, and I'm affiliated with NORC at the University of Chicago. And I'm also an assistant professor at Arizona State University. [E.C. Hedberg, Senior Research Scientist] And today, we're going to be talking about the basics of multilevel modeling. In this video, we're going to talk about exactly what's estimated in a multilevel model. We're going to talk about how to write down and interpret a multilevel model, and then we'll

- 00:34
EC HEDBERG [continued]: also, along the way, talk about how we can test some of these parameters. Now, multilevel models go by various names. They're sometimes called mixed effects models, and we'll find out why in a little bit. They're also called hierarchical linear models, which is sometimes the more popular name in a lot of education research settings. But it's important to also tell you what we're not going to cover, because we have limited time.

- 00:55
EC HEDBERG [continued]: We're not going to talk about more than two levels. Multilevel models can have three, four, five, as many levels as you have the data for, but we're not going to talk about those exactly. But the concepts we're going to talk about today will generalize beyond two levels. We're not going to talk about nonlinear outcomes. So our outcomes are going to be nice, normally distributed test scores. We're not going to talk about logits,

- 01:16
EC HEDBERG [continued]: and probits, and various other types of multilevel modeling. We're not going to talk about centering, yet centering is a very important part. And then finally, we're not going to talk about a lot of the assumptions. So when we get to the further reading, really dig into those materials to understand exactly what the assumptions are in these multilevel models. So our discussion is going to be centered

- 01:36
EC HEDBERG [continued]: around an analysis of the Early Childhood Longitudinal Study, Kindergarten Cohort of 1998 (ECLS-K). We're going to be talking about their spring kindergarten math scores. On the screen, you see a nice little histogram. This is the distribution of those math scores. On average, the kids scored about 36, with a standard deviation of about 12. One of our big predictors of interest

- 01:58
EC HEDBERG [continued]: is going to be the parents' highest education level. I took the liberty of recoding the categorical values into years of education. And so we have a mean of about 14 years of education. That's roughly a community college or a technical degree, and a standard deviation of about 2.5.

- 02:19
EC HEDBERG [continued]: [What do multilevel models estimate?] So let's first talk about exactly what multilevel models estimate. What are you going to get in this output? So let's consider the following mixed effects model. We have an outcome, which is equal to an intercept, a fixed intercept beta 0.

- 02:40
EC HEDBERG [continued]: And then we have a random effect, the difference in those intercepts, which is a sub j. And then we have a fixed slope for the covariate, beta 1 Xij. And then we have a random effect on that slope, which is Cj Xij, and then, of course, we have the within group residual eij. Now what's important to understand is, first, that these are mixed models.

- 03:02
EC HEDBERG [continued]: And a mixed model, again, combines both these fixed and random effects. Now, in most notation, a fixed effect is written down as a Greek letter. And so the betas are the fixed effects. So if you need a little mnemonic device, you can think f comes before g, and so fixed goes with Greek. And so that's a way to remember that.

- 03:23
EC HEDBERG [continued]: And the random effects are symbolized with Roman letters. So the A's and the C's. And so again, the mnemonic device is: R is for Roman, and R is also for random. So what exactly do multilevel models estimate? Well, the best way to think about it is, for all the fixed effects, you will get a mean or a difference between means.

- 03:44
EC HEDBERG [continued]: So a mean would be, say, an intercept, and a slope is really the difference between the means as you move along the covariate. Now for the random effects, you're not going to get estimates of the means, but what you will get are estimates of the variances. So you'll get a variance of aj, and you'll get a variance of Cj. And so that's what the output for multilevel models

- 04:05
EC HEDBERG [continued]: is going to give you. And we'll go through this in many, many tables in the upcoming minutes of this video. So let's briefly talk about exactly how to estimate a multilevel model. Unfortunately, unlike OLS, there's no closed form solution, and so we'll have to use maximum likelihood. But what's important with multilevel models is to realize that there are two versions of maximum likelihood

- 04:27
EC HEDBERG [continued]: that are most often used. First, we have full maximum likelihood, and this is the default in some statistical applications, but not necessarily the default in others. Full maximum likelihood will give you good estimates of the fixed effects, and it'll give you good estimates of the random effects. However, those random effects will be a little different from, say, an ANOVA model,

- 04:48
EC HEDBERG [continued]: because they don't take into account the actual number of groups you're dealing with. So we have, instead, restricted maximum likelihood. And a restricted maximum likelihood model will give you the same answers as, say, an ANOVA, a two-way, a three-way ANOVA, and sometimes that's more preferable to some researchers. So it's important to sort of understand that there's

- 05:09
EC HEDBERG [continued]: a difference between the two. Both will give you the same fixed effects. So they'll give you the same intercepts, the same slopes, and all the rest of that, but they'll give you slightly different estimates of the variances of the random effects. So we're also going to talk about how to write down a multilevel model, and there are primarily two different ways to do this. One is the HLM notation. And HLM here stands for Hierarchical Linear Model.

- 05:31
EC HEDBERG [continued]: In these forms of a model, you break out the different levels in the equations. So you have a level one equation. You have a level two equation, and so forth. Now sometimes that can become a little unruly when you have a lot of different covariates and a lot of different levels. And so other people sometimes prefer what's called the mixed notation. And the mixed notation basically combines all the levels

- 05:52
EC HEDBERG [continued]: into one regression line, and you can sort of clearly see what the fixed and random effects are. Well, in this video, to sort of serve most audiences, we're going to go back and forth between these two different notations. [Unconditional ANOVA Model] So let's start with the most basic of multilevel models,

- 06:12
EC HEDBERG [continued]: the unconditional ANOVA model. Now again, we call it ANOVA because it's analysis of variance. So it's similar to the analysis of variance that you learned in your Stats 101 class. So let's write this model down in the HLM notation. So we have Yij, which is equal to beta 0j. Now that means that the beta 0j is the mean,

- 06:34
EC HEDBERG [continued]: but we have a different mean for each group. Now once you have the mean for each group, there's a residual, which is the within group residual eij. Now this is where the hierarchical part comes in. We say that each group has its own mean, being beta 0j, which is equal to an overall mean, gamma 00, and that each group differs from the overall mean,

- 06:56
EC HEDBERG [continued]: and so U0j is the random effect. And so here on the next slide, I sort of break out what are the random and what are the fixed effects. Now you may be asking yourself, beta 0j is a Greek letter, so why isn't that a fixed effect? Well, beta 0j is kind of just a placeholder. Because when we move to the mixed notation,

- 07:18
EC HEDBERG [continued]: where we replace beta 0j with what the equation is for beta 0j, we see that it drops out, and the model in mixed notation is really: Yij is equal to gamma 00, which is the overall mean, plus the group random effect U0j, plus the within group residual eij.

- 07:38
EC HEDBERG [continued]: And again, since the Greek letters are fixed, we say this is the fixed portion of the model. And then the Roman letters are the random. And we say these are the random portions of the model. So now we have our mixed model: Yij is equal to an overall mean, gamma 00. And each group has a mean that differs from this overall mean, and that's represented by the U0j.

- 07:58
EC HEDBERG [continued]: And then within each group, each case differs from the group mean with eij. Now we have variance components, because the overall variance breaks down into the variance between the groups and the variance within the groups. And so U0j is normally distributed with a mean of 0 and variance of sigma squared b; b is for between groups.

- 08:19
EC HEDBERG [continued]: And then within each group, eij is normally distributed with a mean of 0 and a variance of sigma squared w; w is for within groups. And so the total variation is broken down into the between group variance, sigma squared b, and the within group variance, sigma squared w. So let's look at this ECLS data.
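Before looking at the ECLS estimates, here is a minimal sketch of fitting an unconditional ANOVA model, assuming Python with statsmodels; the data are simulated to mimic the variance components discussed in this video, not the real ECLS-K values, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n_schools, n_kids = 200, 20
school = np.repeat(np.arange(n_schools), n_kids)
u0 = rng.normal(0, np.sqrt(33), n_schools)            # U0j: between-school variance near 33
e = rng.normal(0, np.sqrt(112), n_schools * n_kids)   # eij: within-school variance near 112
math = 36 + u0[school] + e                            # grand mean (gamma 00) near 36

df = pd.DataFrame({"math": math, "school": school})

# intercept-only ("math ~ 1") model with a random intercept per school;
# reml=True gives restricted maximum likelihood, reml=False gives full ML
fit = smf.mixedlm("math ~ 1", df, groups=df["school"]).fit(reml=True)

var_between = fit.cov_re.iloc[0, 0]   # estimated variance of U0j
var_within = fit.scale                # estimated variance of eij
icc = var_between / (var_between + var_within)
print(f"ICC = {icc:.3f}")             # should land near 33 / (33 + 112), about 0.228
```

The same ratio of variance components is the intraclass correlation discussed a little later in the video.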

- 08:40
EC HEDBERG [continued]: And these are estimates for an unconditional ANOVA model. We see that the table's broken out with a fixed portion and a random portion. And let's look at the intercept in the fixed portion, gamma 00. We see that the average of the school average test scores is 36.142. I put the standard error there in the parentheses.

- 09:01
EC HEDBERG [continued]: Now that we have an estimate of the intercept and a standard error in the parentheses, let's talk about how to test whether or not this intercept is statistically different from 0. There are a couple of ways to go about this, and different software will give you different options. The HLM software, for example, will use various degrees of freedom and use a t-test. Stata, as another, will use a Wald test on a z distribution.

- 09:25
EC HEDBERG [continued]: But more or less, you can be pretty confident that it's statistically significant if the ratio of the fixed effect to its standard error is 2 or greater. So in this case, obviously 36 divided by 0.2 is far greater than 2. And so this is a statistically significant result. Next we have random effects. The variance of U0j, again, is the variance

- 09:48
EC HEDBERG [continued]: of how the means differ from the overall average. And so here the variance is about 33.011. And then, again, we have within group variance, and this is shown on the table as the variance of eij, which we call the residual, which is 112.319. And then like any other maximum likelihood model,

- 10:08
EC HEDBERG [continued]: we have a log likelihood. And so this is the table for the ANOVA model. Now from this table, we can calculate what's known as the intraclass correlation, which is how different units within the same group correlate with each other. And again, the ICC is equal to the ratio of the variance of U0j to the total variance,

- 10:28
EC HEDBERG [continued]: which is the variance of U0j plus the variance of eij. And in this case, the intraclass correlation is equal to about 0.227. So that means that the students are correlated with each other within the same schools by about 0.227. [Model with Level 1 Covariate With Fixed Slope]

- 10:51
EC HEDBERG [continued]: So now that we've talked about the unconditional ANOVA model, let's now move to adding a covariate. Now let's add a covariate at level one, at the student level, with a fixed slope. This is how we write this model down. We say that Yij is equal to an intercept beta 0j, which is unique to each group, plus the slope beta 1j, which

- 11:13
EC HEDBERG [continued]: is multiplied by the covariate Xij, and then we have the within group residual. Now like before in the ANOVA model, beta 0j is just a placeholder for the overall average gamma 00 plus each group's difference in the means, U0j. Now the reason why we say it's a fixed slope is that we see that beta 1j is equal to just gamma 1, 0.

- 11:37
EC HEDBERG [continued]: No random effect, no residual, which means we're constraining this model to have the same slope for x across all the different groups. Here's that same model in the mixed notation. Again, replacing the betas with the gammas, we see that Yij is equal to the overall average gamma 00 plus gamma 1, 0 times the covariate Xij.

- 12:00
EC HEDBERG [continued]: And then we have the random effects of the difference in the means for the groups, U0j, plus the within group residual. Now going to the ECLS data, these are the estimates for our model with a level one covariate with a fixed slope. One thing to bear in mind is that parent education is not centered at 0, but is centered on 12 years of education. And what that means is that the intercept of 33.2, this

- 12:23
EC HEDBERG [continued]: is the average of the school averages for students with parents that have a high school education. That's about 33 points on that math test. Now there's a benefit to having parents with higher levels of education. In fact, for each year of education, kindergartners' math scores go up by 1.483 points, or almost a point and a half.
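Because parent education is centered at 12 years, predictions read off the table directly; a quick arithmetic check using the estimates just quoted:

```python
intercept = 33.2   # average school-average score when parents have 12 years of education
slope = 1.483      # points gained per additional year of parent education

# predicted score for a child whose parents have 14 years of education
# (the sample mean, roughly a community college or technical degree)
pred_14 = intercept + slope * (14 - 12)
print(round(pred_14, 2))   # 36.17
```

Note how this lands close to the unconditional grand mean of about 36, as it should for the average student.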

- 12:44
EC HEDBERG [continued]: So again, as parents become more educated, their kindergarten students do better in math. Looking again at the random effects, we see that the variance of U0j, the difference in the school means from each other, is about 17.543. And again, this within group residual is a little bit smaller than what we saw before,

- 13:06
EC HEDBERG [continued]: where it's now at 106.076. [Model With Level 1 Covariate With a Random Slope] So next, let's let this covariate, this effect of parent education, randomly vary across the groups, which means each group, in theory, gets its own unique slope for parent education.
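In software terms, letting the slope vary by group adds one random term. A hedged sketch assuming statsmodels, with simulated data and hypothetical variable names (pared_c is parent education centered at 12 years):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n_schools, n_kids = 100, 20
school = np.repeat(np.arange(n_schools), n_kids)
u0 = rng.normal(0, 4.0, n_schools)                  # U0j: random intercepts
u1 = rng.normal(0, 0.5, n_schools)                  # U1j: random slopes for parent education
pared_c = rng.normal(2, 2.5, n_schools * n_kids)    # years of parent education beyond 12
math = (33 + (1.5 + u1[school]) * pared_c + u0[school]
        + rng.normal(0, 10, n_schools * n_kids))

df = pd.DataFrame({"math": math, "pared_c": pared_c, "school": school})

# re_formula="~pared_c" gives each school its own intercept AND its own slope
fit = smf.mixedlm("math ~ pared_c", df, groups=df["school"],
                  re_formula="~pared_c").fit(reml=True)
print(fit.params["pared_c"])   # fixed (average) slope, near the simulated 1.5
```

The output also reports a variance for the intercepts, a variance for the slopes, and their covariance, which are exactly the quantities discussed next.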

- 13:28
EC HEDBERG [continued]: So let's write this model down. Very similar to before, we have the level one model, where Yij is equal to beta 0j. And then we have a slope for the covariate Xij, which is beta 1j. And again, we have the within group residual eij. Next, we move to the level two portion. Beta 0j is equal to gamma 00 plus the random effect

- 13:51
EC HEDBERG [continued]: for the intercept, U0j. And now the slope for the level one covariate, beta 1j, is equal to an average slope gamma 1, 0, but now each group has a different slope. And so there's a difference between, say, group one's slope and the overall average slope, and that's U1j.

- 14:13
EC HEDBERG [continued]: So now we have two random effects at level two. We have to think not only about the variances of those random effects, but also the covariance. And so the covariance of U1j and U0j tells us how the slope and the intercept of this model relate to each other. So we can think about a correlation, actually,

- 14:35
EC HEDBERG [continued]: between the group level intercept and the group level slope. And so the formula to change a covariance into a correlation is simply to take the covariance and divide it by the square root of the product of the two variances. And so this will be a handy formula in just a second. So again, we can move from the HLM notation to the mixed notation, again, by replacing the betas.

- 14:57
EC HEDBERG [continued]: Beta 0j just gets swapped out for that level two equation, gamma 00 plus U0j. And then we replace the beta 1j with the formula for beta 1j, which is gamma 1, 0 plus U1j, times the covariate x. So we have to distribute that covariate x across those two terms, which leaves us with the following mixed effects model.

- 15:21
EC HEDBERG [continued]: We see that Yij is equal to the overall average gamma 00, or the intercept, plus the slope for x, gamma 1, 0, times the covariate Xij. Then we have a random effect for the intercept, U0j. But then we also have a random effect for the covariate, U1j times Xij.

- 15:42
EC HEDBERG [continued]: Now this random effect for the slope has a very specific function. We call it the heteroskedastic error. And the reason why we call it the heteroskedastic error is, as you may recall from your regression classes, heteroskedasticity can be a bit of a problem when the variance differs based on the value of another variable. Well, in these multilevel models,

- 16:03
EC HEDBERG [continued]: we can take that into account directly, because here we see that one of the error terms, U1j, is directly proportional to the value of one of the covariates, Xij. And again, just to remind everybody, we have a fixed and we have a random portion of the model. The fixed portion of the model is the gamma 00 intercept plus the slope gamma 1, 0 times the covariate Xij.

- 16:27
EC HEDBERG [continued]: And then the rest of it is the random. The U's and the e's are all Roman letters, and so they're random variables. So let's look at the results from our ECLS analysis. Again, our intercept, or sort of the average of the school averages for students with parents with a high school education, is about 33. Again, we have about a point-and-a-half increase

- 16:47
EC HEDBERG [continued]: based on each year of parent education. And then we have the variance of the intercept in the random effects portion, which is about 13.8. And then we have the variance of the slope of parent education, which seems small, about 0.254, which tells me that, well, there is some play. There are some differences in the slope for parent education

- 17:08
EC HEDBERG [continued]: across schools. They're not wildly different. And then finally, we have a covariance between the intercepts and parent education. And it's positive, which tells me that as the school average goes up, so does the benefit of parents' education. Now we can take this covariance and turn it into a correlation with the formula I just showed you.
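That conversion is one line of arithmetic. The covariance itself is not printed in this segment, so the value below is a hypothetical illustrative number chosen to be consistent with the variances (13.8 and 0.254) and the correlation of about 0.257 reported here:

```python
import math

var_intercept = 13.8   # variance of U0j (school intercepts)
var_slope = 0.254      # variance of U1j (parent-education slopes)
cov = 0.481            # hypothetical covariance, consistent with r of about 0.257

# correlation = covariance / sqrt(product of the two variances)
r = cov / math.sqrt(var_intercept * var_slope)
print(round(r, 3))   # 0.257
```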

- 17:29
EC HEDBERG [continued]: And in this case, the correlation between the schools' intercepts and the schools' effects for parent education is about 0.257. So again, this tells us that there is sort of a relationship between how well a school is doing overall and the benefits that the students in those schools get from having parents with higher

- 17:50
EC HEDBERG [continued]: levels of education. Now we have to talk about whether, having added a bunch of various components to this model, it was a good idea. Did it improve the fit of the model? So typical parameter standard error tests, like Wald tests or t-tests, are just not advised when you want to see whether a random effect is statistically different from 0.

- 18:11
EC HEDBERG [continued]: And that's simply because you can't have a negative variance. So it's better to compare the log likelihoods of what we call nested models with deviance statistics. Now this requires the use of full maximum likelihood estimation. So we can't use the restricted maximum likelihood estimates. So make sure that in your software, you're using full maximum likelihood estimation.
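The nested-model comparison just described can be sketched as a deviance (likelihood-ratio) test. The log likelihoods below are illustrative, not the actual ECLS values:

```python
ll_model0 = -29996.0   # full-ML log likelihood, fixed slope only (illustrative)
ll_model1 = -29968.0   # full-ML log likelihood, after adding the random slope

# deviance = -2 times the log likelihood; the test is the difference in deviances
deviance0 = -2 * ll_model0
deviance1 = -2 * ll_model1
lr_stat = deviance0 - deviance1   # 56.0 with these illustrative numbers

# two new parameters (slope variance plus slope-intercept covariance), so compare
# against the chi-squared critical value with 2 degrees of freedom, 5.991 at alpha = .05
print(lr_stat, lr_stat > 5.991)
```

A statistic this far past the critical value means the richer model fits significantly better.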

- 18:33
EC HEDBERG [continued]: And also, be sure when you compare these different models that you're using exactly the same cases. So we have to calculate what we call a deviance statistic, which is negative 2 times the log likelihood, which we get from the output in a statistical package, and then we have two nested models. In this case, model 0 is the model

- 18:54
EC HEDBERG [continued]: with the fixed effect for the level one covariate, parent education, and then model 1 adds the random effect for the parent education effect. And so you see that we have those extra U's for the extra random effects, and then we have some covariances. So what we do is we calculate the test, which is the deviance of model 0 minus the deviance of model

- 19:16
EC HEDBERG [continued]: 1. This test happens to be chi-squared distributed with degrees of freedom p, which is the number of new parameters. In this case, the number of new parameters is 2, because not only are we getting a variance for the slope, but we're also getting the covariance between the slope and intercept. So going through all these calculations, here are the results. The difference between the two deviance statistics

- 19:39
EC HEDBERG [continued]: is about 56. Now again, because we added two parameters, we have to look at a chi-square table with two degrees of freedom. Now, I'm happy to tell you that this value is about 10 times that critical value. So this is highly statistically significant, meaning adding those random effects significantly improved

- 20:00
EC HEDBERG [continued]: the fit of our model. [Model With Level 2 Covariate] So next, let's take a break from parent education, and let's talk about a level two covariate. Now a level two covariate is something that varies only at the second level. So it's constant for all the students, or whatever units, within each group. And for the level two covariate, in this case,

- 20:23
EC HEDBERG [continued]: we'll use whether or not the school is a private school. So here is the HLM notation model: Yij is equal to the intercept beta 0j, and then we just have the within group residual. But now beta 0j has its own intercept, gamma 00, and a slope times the level two effect, gamma 0, 1 times Wj.

- 20:47
EC HEDBERG [continued]: And then again, we have a random effect, U0j. We could easily convert this into the mixed model notation, which is: Yij is equal to gamma 00, plus gamma 0, 1 times the level two variable W, plus the random effect U0j, plus the within group residual. Here are the results from the model with ECLS.

- 21:07
EC HEDBERG [continued]: So we see that, again, the intercept is about 35, and the effect of the school being a private school is substantial. Kids in private schools, their math scores are about 5 and 1/2 points higher than the kids in the public schools. And again, we have the variances of the random effects. So the intercept variance is about 27, and then the variance of the within group residuals

- 21:29
EC HEDBERG [continued]: is very similar to the ANOVA model at 112.328. [Model With Level 1 Covariate With Fixed Slope and Level 2 Covariate] So let's combine these models. We've seen the model with a level two effect. We've seen a model with a level one effect. Now let's talk about a model with both a level one

- 21:50
EC HEDBERG [continued]: covariate and a level two covariate. We'll keep the slope fixed for the moment. So now let's talk about the HLM notation for this model. Yij is equal to beta 0j plus the slope beta 1j times X, with the within group residual. And the intercept is, again, equal to gamma 00 plus the effect of the level two covariate, gamma 0, 1,

- 22:11
EC HEDBERG [continued]: and the random effect. And again, the slope for the level one effect, beta 1j, is just equal to gamma 1, 0. Here's the mixed model notation. It looks very similar. We just added the level two effect, gamma 0, 1, plus the level one effect, gamma 1, 0, into the mixed model notation.

- 22:32
EC HEDBERG [continued]: Here are the results, very similar to before. Again, we see that the private school kids do a lot better, at about 3 points more. And again, we have the benefit of parental education, at about 1.43. And again, we have the variances of the random effects. [Model With Level 1 Covariate With Random Slope and Level 2 Covariate]

- 22:55
EC HEDBERG [continued]: So here's the HLM notation for this model. Again, we have the level one model, which is beta 0j plus beta 1j times the covariate, with the within group residual. We have the intercept model with the level two variable. And then we have the level one slope for Xij, beta 1j, which is equal to gamma 1, 0 plus a random effect, U1j.

- 23:18
EC HEDBERG [continued]: So here we move from the HLM notation to the mixed notation. And there's a lot here, but everything here is what was in the HLM notation. Again, we have an intercept. We have slopes for the covariates. And then we have the random effects, including the heteroskedastic error. Here are the results from the model. Again, the intercept is equal to about 32.49,

- 23:40
EC HEDBERG [continued]: which is very similar to all the intercepts we've seen before. We have the benefit of private school education, which is about 3 points. We have the benefit of having parents with high levels of education, about 1.4 points for every year of education, and then we have a variance for the intercept and a variance for the effect of parent education. And then we have a relationship between the intercept

- 24:02
EC HEDBERG [continued]: and the effect of parent education across the schools. [Model With Level 1 Covariate With Fixed Slope, Level 2 Covariate, and Interaction Between Level 1 and Level 2 Covariate] Next is a model with a level one covariate with a fixed slope, a level two covariate, and an interaction between the level one and the level two covariates. Such a model

- 24:22
EC HEDBERG [continued]: may introduce what's called a cross-level effect. The reason why we call it a cross-level effect is because in this HLM model, we see that the level two variable, Wj, is now not only predicting the intercept beta 0j but is also predicting the slope of x, beta 1j.
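In mixed-model software, this cross-level structure is specified as a W-by-X interaction among the fixed effects. A hedged sketch assuming statsmodels, with simulated data and hypothetical variable names:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_schools, n_kids = 100, 20
school = np.repeat(np.arange(n_schools), n_kids)
private = np.repeat(rng.integers(0, 2, n_schools), n_kids)   # W: level-2 covariate
pared_c = rng.normal(0, 2.5, n_schools * n_kids)             # X: level-1 covariate (centered)
u0 = rng.normal(0, 4.0, n_schools)                           # U0j: random intercepts
# the parent-education benefit is smaller in private schools (negative interaction)
math = (33 + 3.0 * private + 1.5 * pared_c - 0.4 * private * pared_c
        + u0[school] + rng.normal(0, 10, n_schools * n_kids))

df = pd.DataFrame({"math": math, "private": private,
                   "pared_c": pared_c, "school": school})

# "private * pared_c" expands to private + pared_c + private:pared_c, i.e.
# gamma 0,1 * W + gamma 1,0 * X + gamma 1,1 * W * X in the mixed notation
fit = smf.mixedlm("math ~ private * pared_c", df, groups=df["school"]).fit()
print(fit.params["private:pared_c"])   # near the simulated -0.4
```

The single interaction coefficient is the cross-level effect: how the level-one slope shifts with the level-two variable.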

- 24:43
EC HEDBERG [continued]: Now the reason why we call this an interaction on top of a cross-level effect is because once we break the HLM notation into the mixed model notation, we see that beta 1j is really an intercept, gamma 1, 0, plus the slope gamma 1, 1 times Wj, all of that times x. So once we break it apart and expand it,

- 25:05
EC HEDBERG [continued]: we see that we have an intercept gamma 00, a slope for W, gamma 0, 1, and a slope for X, gamma 1, 0. And now we have a slope for the interaction between W and X, which is gamma 1, 1. And again, we have a random effect and the within group residual. Here are the effects; we'll sort of go through them.

- 25:27
EC HEDBERG [continued]: The intercept is, again, equal to about 33 points. We have a benefit of private school. We have a benefit of parental education in the public schools. However, looking at the interaction term, private school times parental education, it's negative, which tells me that while there's a benefit to having higher parental education

- 25:48
EC HEDBERG [continued]: in the public schools, that benefit goes down a little bit in the private schools. It's still positive. It's still a benefit, but it's not as much of a benefit in the private schools as it is in the public schools. Now again, because it's an interaction, we could think about it a different way. We could say that there's a benefit to students being in a private school, where that benefit goes down as parents

- 26:09
EC HEDBERG [continued]: become more and more educated. [Model With Level 1 Covariate With Random Slope, Level 2 Covariate and Interaction Between Level 1 and Level 2 Covariates] So finally, we're going to talk about our most complicated model today, which again has a level one covariate, except this time it has a random slope. It has a level two covariate, and again the interaction

- 26:30
EC HEDBERG [continued]: between the level one and the level two covariates. Here's the HLM model. This is as complicated as we're going to get today, where we have the level one model, and then it breaks out into the different level two models. Again, both the intercept and the slope for the level one covariate are predicted with the level two variable Wj.

- 26:50
EC HEDBERG [continued]: And of course, we have random effects: not only do we have the within group residual, we have a random effect for the intercept, and then we also have a random effect for the slope. You can break all this out into the mixed notation, and again it becomes quite a lot to manage. But again, as we replace the betas, we see this sort of longer model that includes not only the random effect for the intercept,

- 27:11
EC HEDBERG [continued]: but the heteroskedastic error term. Here are the results of this most complicated model. It's pretty complicated with only two variables. We have an intercept, which is again about 32, 33 points. We have a benefit of private school. We have the benefit of parental education. We have that negative effect for the interaction. And then finally, we have the variances of the random effects

- 27:34
EC HEDBERG [continued]: and the covariance between the intercept and the benefit of parental education, which is positive. So this model certainly combines everything we've talked about up until this point. [Conclusion] So we've talked about a lot today, but what we've talked about is how to write down a multilevel model, not only in the HLM notation, but also

- 27:57
EC HEDBERG [continued]: the mixed notation. Using only two variables, parental education and whether or not the school is a private or public school, we've been able to do a smorgasbord of all these different models with various random effects and interactions and all the rest of it. So for further reading and for further learning on this topic, because we've covered a lot in a very short amount of time,

- 28:17
EC HEDBERG [continued]: I would pick up Raudenbush & Bryk's Hierarchical Linear Models book. This book goes through the HLM notation, but also reviews the mixed notation. McCulloch & Searle's Generalized, Linear, and Mixed Models is also a really good resource to understand sort of the mathematical mechanics behind how these models are estimated. And finally, a classic is Goldstein's Multilevel Models

- 28:38
EC HEDBERG [continued]: in Education and Social Research.

### Video Info

**Publisher:** SAGE Publications Ltd

**Publication Year:** 2017

**Video Type:** Tutorial

**Methods:** Multilevel analysis, Random effects model, Heteroscedasticity

**Keywords:** estimates; estimation; mathematical concepts; practices, strategies, and tools

### Segment Info

**Segment Num.:** 1


## Abstract

Professor E.C. Hedberg discusses multilevel modeling and what is estimated in a multilevel model. Multilevel modeling uses both random and fixed effects to estimate a mean or a difference between means. Hedberg provides an example of multilevel modeling in an analysis of the early childhood longitudinal study kindergarten cohort of 1998.