- 00:12
Hello. I'm Professor Robert Bruhl. I teach political science at the University of Illinois at Chicago, and my specialties are statistics and research methodology. In this tutorial, I'll be addressing the ways in which we use statistical inference to assess possible relationships between traits of phenomena as they are found to occur in observations we've made.

- 00:35
To begin our discussion, we will start with a common question in research. Are two traits related? In some cases the traits of interest are assessed as qualities: man or woman; Republican, Democrat, Independent, or other; smoker or non-smoker; or preference for cola A or cola B. To test such propositions using behavioral observations, we employ the methods of statistical inference in two ways: first, to establish that a relationship between two traits exists, and second, to determine if the relationship can be generalized beyond the given observations.

- 01:13
In this tutorial, I'll be addressing these two aspects of statistical inference and how they work together to provide a means of assessing possible relationships between qualitative traits as they are observed to occur in the phenomena in which we're interested.

- 01:35
First, we'll talk about establishing a relationship between two traits. In a previous tutorial, we discussed what it means when we say two traits are related. More precisely, we defined what it means when we say that two traits are not related. This was said to be stochastic independence. As it turns out, we can use this principle in a positive way to construct an assessment of the strength of a relationship between two qualitative traits.

- 02:03
This test is said to be the chi-square test, and we shall illustrate it with an example. Suppose we're market researchers and, for advertising purposes, we'd like to know if women and men are different in their preferences for cola A and cola B. To test this proposition, we collect a set of 100 men and women and perform a taste test.

- 02:24
As a point of reference, our sample of taste testers consists of 35 men and 65 women. After the testing, we have the following results: 40 individuals, or 40% of the sample, preferred cola A and 60 individuals, or 60% of the sample, preferred cola B. Now, if gender doesn't make a difference in one's cola preference, we would expect the men and women to have identical rates of preference.

- 02:51
That is, we would expect to find in our sample that 40% of the men preferred cola A, 40% of the women preferred cola A, 60% of the men preferred cola B, and 60% of the women preferred cola B. Given there were 35 men and 65 women, we would expect to find 40% of the 35 men, that's 14, preferred cola A, and 40% of the 65 women, that's 26 women, also preferred cola A. Similarly, we would also expect to find 60% of the 35 men, that's 21, preferred cola B, and 60% of the 65 women, that's 39 women, also preferred cola B.
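The expected counts described here can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the tutorial; the variable and function names (`row_totals`, `col_totals`, `expected_counts`) are chosen for this sketch and do not come from the speaker.

```python
# Margins of the sample as given in the tutorial:
# 35 men and 65 women; 40 people preferred cola A and 60 preferred cola B.
row_totals = {"men": 35, "women": 65}
col_totals = {"cola A": 40, "cola B": 60}
n = sum(row_totals.values())  # 100 taste testers


def expected_counts(row_totals, col_totals, n):
    """Expected count per cell if gender and cola preference are unrelated.

    Each group is expected to show the overall preference rate,
    so the cell count is (group size) * (column total / n).
    """
    return {
        (group, cola): row_totals[group] * col_totals[cola] / n
        for group in row_totals
        for cola in col_totals
    }


expected = expected_counts(row_totals, col_totals, n)
for (group, cola), count in expected.items():
    print(f"{group:<6} {cola}: {count:.0f}")
```

Under the no-relationship assumption, this reproduces the 14, 21, 26, and 39 quoted in the tutorial.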

- 03:34
These expectations can be displayed in what's said to be a contingency table or a cross tabulation. In a contingency table, we have a series of rows and columns. Across the columns, we have a column for those people who we would expect to prefer cola A and those people we would expect to prefer cola B, and along the rows we have identified men and women.

- 03:59
And so for the first row, we would expect 14 men to prefer cola A and we'd expect 21 of those men to prefer cola B for a total of 35 men. For the women, we would expect 26 of them to prefer cola A and we'd expect 39 of them to prefer cola B for a total of 65 women.

- 04:21
Along the bottom, we have a total of 40 who prefer cola A and 60 who prefer cola B, and that concludes our table. Now, what we actually observed in our sample, however, was as follows: 30 of the men preferred cola A, 10 of the women preferred cola A, five of the men preferred cola B, and 55 of the women preferred cola B.

- 04:45
Now, to see how our actual observations compared to our expectations, we can add the actual observations to the table. In our same table, we now show that the actual number of the men who preferred cola A was 30 and the actual number of men who preferred cola B was five, again, for the total of 35.

- 05:07
For the women, the actual number of women who preferred cola A was 10 and the actual number of women who preferred cola B was 55 for our total of 65. If we then compare our actual observations to our expectations, we find that a greater number of men than expected, that's 30 versus 14, preferred cola A to cola B and a greater number of women than expected, 55 versus 39, preferred cola B to cola A.
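The comparison of actual and expected counts can be laid out as a small table. The following is a minimal sketch using the numbers from the tutorial; the names `cells`, `observed`, `expected`, and `differences` are illustrative only.

```python
# Cells of the 2x2 contingency table, in the order used in the tutorial.
cells = [
    ("men", "cola A"),
    ("men", "cola B"),
    ("women", "cola A"),
    ("women", "cola B"),
]

# Actual counts from the taste test and expected counts under "no relationship".
observed = dict(zip(cells, [30, 5, 10, 55]))
expected = dict(zip(cells, [14, 21, 26, 39]))

# Positive difference: more people than expected; negative: fewer than expected.
differences = {cell: observed[cell] - expected[cell] for cell in cells}

print(f"{'group':<6} {'cola':<7} {'observed':>8} {'expected':>8} {'diff':>5}")
for group, cola in cells:
    cell = (group, cola)
    print(
        f"{group:<6} {cola:<7} {observed[cell]:>8} "
        f"{expected[cell]:>8} {differences[cell]:>+5}"
    )
```

Every cell differs from its expected count by 16 in one direction or the other, which is why the same squared difference, 256, appears in each step of the later calculation.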

- 05:39
From this we could conclude that there seems to be evidence that taste preferences for cola A and cola B do in fact differ by gender. But are these results generalizable? As a point of fact, the results we have found only extend to the 100 individuals in our sample.

- 06:03
In order to generalize the results, we need to employ the methods of statistical inference. In the case of two qualitative traits, the test of significance is a chi-square test. As a point of reference, significance testing is based on the following. A set of specific observations is only a snapshot of a larger population. Snapshots may be perfect, near perfect, or poor representations of a population, and to test the generalizability of a relationship found in a sample, we use double negative logic.

- 06:34
We propose a hypothetical population, fitting the criteria of no relationship between the two traits of interest. We then use the central limit theorem to determine if our sample is a poor representation of that hypothetical population. This is reported as the probability that the sample is a perfect snapshot of the hypothetical population.

- 06:55
If this probability is low, that's less than 5%, we conclude our sample is a poor representation of the hypothetical population where there's no relationship. And from this we can be relatively confident that our sample instead came from a population where the two traits are related. This is said to be rejecting the null hypothesis.

- 07:17
Now, back to our example. The chi-square test is derived from the central limit theorem. First, we construct a hypothetical population in which the two traits are not related. This population is represented by the expected values of the traits we displayed in our contingency table. We then represent the sample by the average difference squared between the actual number of observations of each trait and its expected number of observations.

- 07:45
This is said to be the chi-square statistic. A sample representing a perfect snapshot of a hypothetical population will have a chi-square statistic of zero. A sample representing a near perfect snapshot of a hypothetical population will have a chi-square statistic with a relatively small value, and a sample representing a poor snapshot of the hypothetical population will have a chi-square statistic with a large value, indicating large differences between the actual and expected observations.
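Written as a formula, the statistic described here is usually given as follows. The symbols are notation introduced for this note, not the speaker's: $O_i$ is the actual (observed) count in cell $i$ of the contingency table and $E_i$ is the expected count in that cell.

$$
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}
$$

Each squared difference is divided by the expected count for its cell, so the statistic is zero only when every actual count equals its expected count, and it grows as the actual counts move away from the expected ones.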

- 08:17
We then use the central limit theorem to derive what is said to be a chi-square probability distribution to determine the probability of snapshots being perfect, near perfect, or poor as represented by their chi-square statistic. From this, we assess the probability that our sample is a poor representation of the hypothetical population, again, the population with no relationship.

- 08:42
This is reported inversely as the probability that the sample is a perfect representation of that population. If that probability is low, that's less than 5%, we can be confident the sample is a poor representation of the no relationship population, and thus we can be confident the sample came from a population where there is a relationship between the two traits, and the results can indeed be generalized.

- 09:07
Otherwise, we conclude that the sample simply represents the population with no relationship between the traits, and the results of our sample cannot be generalized. In our example, the chi-square statistic would be calculated as follows. We see that for the men we have an actual number of 30 who preferred cola A versus the expected number of 14.

- 09:31
The difference between those is 16. 16 squared is equal to 256. And 256 divided by 14, which is the expected number of observations, is equal to 18.29. For those who preferred cola B, the actual number was 5. The expected number was 21.

- 09:52
The difference between those two is negative 16. Negative 16 squared is equal to 256. And 256 divided by 21, which is the expected number of observations, is equal to 12.19. For the women, the actual number of women who preferred cola A was 10.

- 10:12
The expected number was 26. The difference between those two was negative 16. The square of that difference is 256. And 256 divided by 26 is equal to 9.85. Finally, the number of women who actually preferred cola B was 55. The expected number was 39. The difference between those two is 16.

- 10:33
16 squared is, again, 256. And 256 divided by 39 is equal to 6.56. If we add these all together, that's 18.29 plus 12.19 plus 9.85 plus 6.56, this is equal to 46.89.
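The step-by-step arithmetic above can be checked with a short Python sketch. The list names (`observed`, `expected`, `contributions`) are illustrative; the numbers are the ones stated in the tutorial.

```python
# Cell order: men/cola A, men/cola B, women/cola A, women/cola B.
observed = [30, 5, 10, 55]
expected = [14, 21, 26, 39]

# Each cell contributes (actual - expected)^2 / expected.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

# The chi-square statistic is the sum of the four contributions.
chi_square = sum(contributions)

for o, e, c in zip(observed, expected, contributions):
    print(f"({o} - {e})^2 / {e} = {c:.2f}")
print(f"chi-square = {chi_square:.2f}")
```

Rounded to two decimals, the four contributions and their total match the figures quoted in the tutorial.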

- 10:55
And this is said to be the chi-square statistic or average differences for this particular set of observations. From here, we would consult the chi-square distribution we constructed or we would refer to a statistical software program. By this consultation, we find that the probability that our sample is a perfect representation of a population in which there's no relationship between the traits is less than 1/100 of a percent or 0.0001.
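As a rough check on this probability, here is a minimal sketch that uses only the Python standard library. It rests on assumptions the tutorial does not state: that the test uses one degree of freedom, which is the usual (rows - 1) × (columns - 1) for a 2×2 table, and that no continuity correction is applied. With one degree of freedom, the upper-tail probability of the chi-square distribution equals `erfc(sqrt(x / 2))`, because the statistic is then distributed as the square of a standard normal variable. The function name `chi2_upper_tail_df1` is hypothetical.

```python
import math


def chi2_upper_tail_df1(x):
    """P(chi-square > x) for 1 degree of freedom (assumed here for a 2x2 table).

    With 1 degree of freedom the statistic is the square of a standard
    normal variable, so the tail probability is erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))


# Chi-square statistic computed from the tutorial's observed and expected counts.
observed = [30, 5, 10, 55]
expected = [14, 21, 26, 39]
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

p_value = chi2_upper_tail_df1(chi_square)

print(f"chi-square = {chi_square:.2f}")
print(f"p-value below 0.0001: {p_value < 0.0001}")
print(f"reject 'no relationship' at the 5% level: {p_value < 0.05}")
```

Under these assumptions, the probability comes out far below the 0.0001 bound quoted in the tutorial, so the conclusion does not depend on the exact value.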

- 11:26
Consequently, we can conclude that the sample is a poor representation of such a population and instead it represents a population in which the two traits are related. Thus we're confident we can generalize our finding that men are more likely to prefer cola A while women are more likely to prefer cola B. For further reference, the topic of chi-square analysis is covered in substantial detail in my forthcoming book, Understanding Statistical Analysis and Modeling, by Sage.

### Video Info

**Publisher:** SAGE Publications Ltd.

**Publication Year:** 2017

**Video Type:** Tutorial

**Methods:** Chi-square test, Significance testing, Statistical inference

**Keywords:** imagery; practices, strategies, and tools; traits

### Segment Info

**Segment Num.:** 1

## Abstract

Using the example of market research on soda preferences, Professor Robert Bruhl explains how to use statistical inference to evaluate possible relationships between traits. He lays out the steps in the chi-square test, which can measure the strength of a relationship between traits.