## Summary

• 00:03

JEFFREY ROSENTHAL: In the lectures so far, we've been considering various kinds of data. And all of the variables have been what we call quantitative variables, variables represented by numbers that can be larger, and smaller, and so on. Now we're going to discuss other kinds of variables, which are called categorical variables.

• 00:23

JEFFREY ROSENTHAL [continued]: And they're variables that just aren't represented by numbers. And we'll see how we can think about them. For context, let's remember that life expectancy data that we looked at before. This gives the number of years that people live on average in 197 different countries and territories around the world.

• 00:44

JEFFREY ROSENTHAL [continued]: Because these data were represented by numbers, it meant that we could compute things. We could talk about, for example, the minimum value and the maximum value. We could talk about the median. We learned about the quartiles. We used that to draw a box plot. We also computed things like the mean value

• 01:06

JEFFREY ROSENTHAL [continued]: and the standard deviation. We could plot things like histograms. We could talk about if they're skewed or if they're symmetric. We could talk about the extreme values of the data. These are all things that we can talk about if we have quantitative data. But there's other kinds of data too. So let's go back to that data about life expectancies

• 01:29

JEFFREY ROSENTHAL [continued]: and think about where it comes from. Each of those countries and territories is represented somewhere in this map. Now the data that we got also divides up the world in terms of different regions. And the way they divided up the data, they talked about the Americas, North, South, and Central,

• 01:49

JEFFREY ROSENTHAL [continued]: as one region. They talked about Europe and Central Asia as another region. They talked about East Asia and the Pacific as another region. They talked about South Asia as a fourth region, and the Middle East and Northern Africa as a fifth region, and finally, sub-Saharan Africa as the sixth region.

• 02:13

JEFFREY ROSENTHAL [continued]: In that way, they divided up the map into six regions. Now that gives us some data too. If we want, we can make a list. Here's a list of those 197 different countries and territories, and for each one, which of the six regions it belongs to. Well, once again, it's just a jumble of text. How can we make sense of it?

• 02:35

JEFFREY ROSENTHAL [continued]: Let's start by considering what we did for the quantitative variables-- things like the minimum, and maximum, and median, and quartiles, and box plots, and mean, and standard deviation, and so on. Can we use these for a categorical variable like the region? No, we can't. The reality is, hardly any of these things make any sense anymore for a categorical variable.

• 02:58

JEFFREY ROSENTHAL [continued]: So what can we do? Well, one simple thing we can do is just count. We can say, how many of those 197 different countries and territories are in each of those six regions? Well, here's the count that we get. We see that there's 29 in the Americas. There's 50 in Europe and Central Asia. There is 30 in East Asia and the Pacific.

• 03:20

JEFFREY ROSENTHAL [continued]: There is eight in South Asia. There's 21 in the Middle East and North Africa. And there's 49 in sub-Saharan Africa. So now we have a table of numbers. And that's OK, but there's other ways we can represent it too. One nice way is what's called a bar chart. And a bar chart, as illustrated here,

• 03:41

JEFFREY ROSENTHAL [continued]: just presents a separate bar for each of the different categories-- in this case, each of the six regions. And for each one, the height corresponds to how many elements are in that region. So in that way, we've represented the counts as a nice looking little chart. Now if we prefer, instead of representing the counts,

• 04:01

JEFFREY ROSENTHAL [continued]: we can represent the relative frequencies, which is just what fraction of the countries and territories are in each of those six different categories. That's illustrated here. It looks exactly the same except that we've divided by the total number, so now the heights of the bars actually add up to 1. Another way that we can illustrate

• 04:22

JEFFREY ROSENTHAL [continued]: that same data is with a pie chart, as illustrated here. A pie chart considers a round circle or a pie and chops up slices of it in proportion to how many of the data fit into each of the different categories-- in this case how many of those 197 different countries and territories fit into each

• 04:43

JEFFREY ROSENTHAL [continued]: of the six different regions. Now pie charts say pretty much the same thing as a bar chart. We won't use them as often, but some people like them. In fact, some people have fun with them. Here's a cute little pie chart showing the amount of pie somebody has eaten or not, taking pie a little too literally. And here's another one that I think

• 05:04

JEFFREY ROSENTHAL [continued]: is quite cute where they illustrate sales of Girl Scout cookies by actually making a pie out of pieces of the different cookies-- very clever. Anyway, the important thing is to understand that when you have a categorical variable, you can illustrate the counts by using either a bar chart, or the relative frequencies,

• 05:27

JEFFREY ROSENTHAL [continued]: or a pie chart. So that's pretty much all we can say for now about categorical variables. We'll get into them more in next week's lectures when we discuss the relationships between different variables. But for now, let's consider one more example. Let's go back to that skeleton data we had before from the anthropologists.
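
The region example above can be sketched in a few lines of Python: start from the counts, divide by the total to get relative frequencies, and print a text bar chart. This is only an illustration, not the course's own code. The counts are the ones read out in the lecture; as transcribed they sum to 187 rather than 197, so the sketch divides by the sum of the listed counts instead of assuming a total. The names `region_counts` and `relative_frequencies` are made up for this example.

```python
# Counts per region as read out in the lecture (transcribed values).
region_counts = {
    "Americas": 29,
    "Europe and Central Asia": 50,
    "East Asia and the Pacific": 30,
    "South Asia": 8,
    "Middle East and North Africa": 21,
    "Sub-Saharan Africa": 49,
}


def relative_frequencies(counts):
    """Divide each count by the total so the values add up to 1."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


freqs = relative_frequencies(region_counts)

# Text bar chart: one row per category, bar length equal to the count,
# followed by the count and the relative frequency.
for region, n in region_counts.items():
    print(f"{region:<30} {'#' * n} {n} ({freqs[region]:.3f})")
```

The bars have the same shape whether they show counts or relative frequencies; only the scale on the axis changes.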

• 05:49

JEFFREY ROSENTHAL [continued]: Remember that this had various information about 400 different skeletons that had been studied. Well, we already talked about things like the estimated age at the time of death or the difference between the estimated and the true age at the time of death. And those were quantitative variables. So we could talk about the minimum, and the maximum,

• 06:10

JEFFREY ROSENTHAL [continued]: and the median, and box plots, and quartiles, and mean, and standard deviation, and all of those things. But what about some other variables? For example, each of those skeletons has a sex. Is it male or female? Well, for that data, it turns out that 281 of them are male,

• 06:30

JEFFREY ROSENTHAL [continued]: and 119 of them are female. So that's a categorical variable. You can't take the maximum or the minimum or the mean. But we can draw a bar chart. And here's a bar chart showing the sex of those 400 skeletons. If we want, we can also show the relative frequencies.
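
As a rough illustration of counting a categorical variable from raw labels, the sketch below tallies the sex of each skeleton with Python's standard `collections.Counter`. The list `sexes` is a stand-in rebuilt from the two counts quoted in the lecture (281 male, 119 female); it is not the anthropologists' actual data file, and the variable names are assumptions.

```python
from collections import Counter

# Hypothetical raw data: one label per skeleton, rebuilt from the
# counts quoted in the lecture (281 male, 119 female).
sexes = ["male"] * 281 + ["female"] * 119

counts = Counter(sexes)        # tally how often each category occurs
total = sum(counts.values())   # number of skeletons
rel_freq = {sex: n / total for sex, n in counts.items()}

# Report categories from most to least frequent.
for sex, n in counts.most_common():
    print(f"{sex:<7} count={n:3d}  relative frequency={rel_freq[sex]:.4f}")
```

With these counts the relative frequencies are 0.7025 for male and 0.2975 for female. Asking for the mean or maximum of the labels themselves would not make sense, which is the point made above.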

• 06:51

JEFFREY ROSENTHAL [continued]: Or if we prefer, we can make it into a pie chart. Another example of a categorical variable for these skeletons would be the mass category. That is, for each of these skeletons, the person at the time of death was categorized as being of normal weight, or underweight, or overweight,

• 07:13

JEFFREY ROSENTHAL [continued]: or obese. Now in this case, 225 of the 400 skeletons were classified as normal. 74 were classified as underweight. 81 were classified as overweight. And 20 were classified as obese. Again, we could just make a table, but we could also display that as a bar chart,

• 07:36

JEFFREY ROSENTHAL [continued]: or if we prefer, the relative frequencies. Or if we prefer, we could make a pie chart. In this way, we can illustrate the categorical variables, not in as much detail as the quantitative variables, but we can still say something about them. As we move forward and discuss the relationships between different variables, it will

• 07:56

JEFFREY ROSENTHAL [continued]: become more and more important to understand the categorical variables as opposed to quantitative variables and how they all fit together.
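
To make the pie chart idea concrete, the sketch below converts the mass-category counts quoted in the lecture into slice angles, on the assumption that each slice's angle is its relative frequency times 360 degrees. The function name `slice_angles` is hypothetical and the code is an illustration only, not how the lecture's charts were produced.

```python
# Mass-category counts quoted in the lecture (400 skeletons in total).
mass_counts = {"normal": 225, "underweight": 74, "overweight": 81, "obese": 20}


def slice_angles(counts):
    """Return each category's pie-slice angle in degrees, proportional to its count."""
    total = sum(counts.values())
    return {category: 360 * n / total for category, n in counts.items()}


angles = slice_angles(mass_counts)
for category, angle in angles.items():
    print(f"{category:<12} {angle:6.1f} degrees")
```

The angles add up to 360 degrees for the same reason the relative frequencies add up to 1: every skeleton falls into exactly one category.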

### Video Info

Series Name: Understanding Data

Publisher: Alison Gibbs and Jeffrey Rosenthal

Publication Year: 2013

Video Type: Tutorial

Methods: Categorical variables

Keywords: practices, strategies, and tools

### Segment Info

Segment Num.: 1

## Abstract

Jeffrey Rosenthal discusses the difference between categorical and quantitative variables.