- 00:00
[MUSIC PLAYING] [Comparing Groups With Modern Robust Methods]

- 00:11
RAND WILCOX: Hello, I'm Rand Wilcox, a professor at the University of Southern California. [Dr. Rand Wilcox, Professor] This talk is about modern robust methods for comparing groups. Briefly, there have been vast improvements relevant to the basic techniques that are routinely taught and used. The practical importance of these techniques stems from three fundamental insights which

- 00:31
RAND WILCOX [continued]: I'll outline during the talk. Now the idea that modern methods have a practical advantage might seem surprising to some people, because a lot of introductory books claim that methods that are routinely taught and used -- methods that were developed prior to the year 1960 -- work just fine. And in fact there are some theoretical results suggesting

- 00:53
RAND WILCOX [continued]: that indeed, they do work well. However, based on hundreds of papers published in the last 50 years, a more accurate description of conventional methods is that they work well when comparing groups that don't differ in any manner whatsoever. The bad news is that under general conditions, classic methods are highly unsatisfactory. The practical consequence is that important differences

- 01:15
RAND WILCOX [continued]: among groups are often missed. Many researchers now know that there are concerns with conventional methods. Some of these researchers deal with these concerns in an effective manner, but most do not. They use ineffective and even unsound strategies. One reason is that most students are not trained in a manner that takes into account modern insights and advances.

- 01:36
RAND WILCOX [continued]: My hope is that this talk will help correct this problem. [Classic Methods Versus Modern Techniques] To begin, consider all the standard hypothesis testing methods, including rank-based techniques. To appreciate modern techniques, we need to understand when and why classic methods can

- 01:57
RAND WILCOX [continued]: be unsatisfactory. [Do they perform well when underlying assumptions are false?] So a basic issue is: do they perform well when the underlying assumptions are false? That is, do they control the probability of a Type I error? Do they provide accurate confidence intervals?

- 02:17
RAND WILCOX [continued]: Do they provide relatively high power? In other words, compared to other techniques that we could use, could alternative methods increase our probability of detecting true differences among the groups being compared? Do they describe data in an accurate, meaningful, and revealing way? To some extent assumptions can be violated when using classic methods, but under general conditions, standard methods

- 02:38
RAND WILCOX [continued]: perform poorly. For example, power can be very low compared to modern techniques. [Do modern robust methods ever make a practical difference?] Do modern methods ever make a practical difference? The answer turns out to be yes, absolutely. This has been demonstrated in numerous papers. What about diagnostic tools aimed at salvaging classic methods?

- 03:02
RAND WILCOX [continued]: This is a natural way of trying to justify using classic techniques. But in general -- there are exceptions -- but in general, using diagnostic tools is very ineffective. The problem is that these diagnostic tools can be insensitive: they cannot detect situations where modern methods would make a practical difference. The safest way to determine whether modern methods make

- 03:23
RAND WILCOX [continued]: a difference is to actually try them. It's the only way this works. Let's talk about the three insights that have to do with the classic techniques. The first insight has to do with violating the normality assumption. It turns out that when we're dealing with skewed distributions, there are much more serious concerns than previously thought. This turns out to be a really difficult problem to deal with.

- 03:46
RAND WILCOX [continued]: The second insight has to do with violating what's called the homoscedasticity assumption. That's the assumption that when comparing groups, all the groups have a common variance. Even when the groups differ in other ways -- say they have different means -- you're assuming that all the groups have a common variance. And the third insight has to do with what are called heavy-tailed distributions and outliers.

- 04:08
RAND WILCOX [continued]: Another concern about the mean is that it can poorly reflect the typical response. Despite this, the mean might be of interest, but certainly this is not always the case. [Using modern robust methods] The good news about modern robust methods is that they effectively deal with known concerns. Generally they're designed to perform well

- 04:29
RAND WILCOX [continued]: when the standard assumptions are true, and to continue to perform well when the standard assumptions are false. More broadly, they offer the opportunity to get a deeper, more accurate, more nuanced understanding of data. So let's elaborate a little bit. First let's consider non-normality, paying particular attention to skewed distributions.

- 04:50
RAND WILCOX [continued]: Conventional wisdom is that with a sample size of 25, normality can be assumed. What that means is: if we sample some observations, compute the mean, and repeat the study thousands and thousands of times, then a plot of the sample means will be approximately a normal distribution. Let's suppose that really is the case -- we get a normal distribution. A major insight is that even if the sample mean

- 05:13
RAND WILCOX [continued]: has a normal distribution, Student's t can perform poorly. To illustrate this point, take a look at the distribution on screen. This is a skewed, light-tailed distribution, roughly meaning that outliers occur but are relatively rare. That's certainly the case here: with a sample size of 25, the sample mean has, to a close approximation, a normal distribution.

- 05:34
RAND WILCOX [continued]: But now look at the next figure on screen. In the left panel you see the actual distribution of Student's T when sampling from the distribution in the first figure I showed you. The left panel is based on a sample size of 20, the right panel on a sample size of 100. As you can see, the distribution is actually skewed to the left. Under normality, it's supposed to be symmetric about zero.
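The left-skew of T under a skewed parent distribution is easy to reproduce by simulation. Below is an illustrative Python sketch (the talk itself recommends R); the lognormal is assumed here as a stand-in for the skewed, light-tailed distribution on screen.

```python
import numpy as np

def t_statistic(sample, mu):
    """One-sample Student's t statistic for H0: mean = mu."""
    n = len(sample)
    return (sample.mean() - mu) / (sample.std(ddof=1) / np.sqrt(n))

rng = np.random.default_rng(0)
n, reps = 20, 10_000
true_mean = np.exp(0.5)  # mean of the standard lognormal distribution

# Sampling distribution of T when the data are skewed (lognormal here).
ts = np.array([t_statistic(rng.lognormal(size=n), true_mean)
               for _ in range(reps)])

# Under normality T is symmetric about zero; here it is clearly left-skewed.
skew = float(np.mean(((ts - ts.mean()) / ts.std()) ** 3))
print(f"skewness of simulated T: {skew:.2f}")  # negative
```

Plotting `ts` as a histogram reproduces the left panel of the figure: a long left tail rather than a symmetric curve.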

- 05:58
RAND WILCOX [continued]: The practical consequence is that this impacts power, results in inaccurate confidence intervals, and gives poor control over the probability of a Type I error. In this particular case we need about 200 observations to get accurate results based on the sample mean. If we move toward a more heavy-tailed distribution, this roughly means that outliers occur more often.

- 06:20
RAND WILCOX [continued]: We need about 300 observations or more. Now, a common strategy for dealing with skewed distributions is to transform the data -- for example, take logs. Usually -- not always, but usually -- the distribution remains skewed, and this does not deal effectively with outliers. Next let's talk about heavy-tailed distributions and outliers.

- 06:42
RAND WILCOX [continued]: The concern is that they destroy power for any method based on means. In addition to low power, measures of effect size based on the mean and variance can miss important differences. To illustrate one aspect of this issue, look at the next figure on screen. Shown are a standard normal distribution and what's called a mixed normal; the standard normal has variance one.

- 07:03
RAND WILCOX [continued]: This particular mixed normal distributionhas a variance 10.9.This illustrates a general principle--small changes in the tails of a distributioncan drastically increase the variance.Translation-- poor power, misleading measuresof effect size.In terms of actual data what we're dealing with

- 07:24
RAND WILCOX [continued]: are situations where outliers occur. Outliers inflate the sample variance. When the sample variance gets large, you have low power. A common strategy for dealing with outliers is to discard them and use conventional methods on the remaining data. If an outlier is indeed invalid, OK, this is a reasonable thing to do.
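The variance inflation described a moment ago is easy to check numerically. The sketch below (illustrative Python; the mixed normal is assumed to be the classic contaminated normal, 90% from N(0, 1) and 10% from N(0, 10^2)) confirms the variance of roughly 10.9 even though only one observation in ten comes from the wider tail.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Contaminated ("mixed") normal: with probability 0.1 an observation is
# drawn from N(0, 10^2) instead of N(0, 1).
from_tail = rng.random(n) < 0.10
x = np.where(from_tail, rng.normal(0.0, 10.0, n), rng.normal(0.0, 1.0, n))

# Theoretical variance: 0.9 * 1 + 0.1 * 100 = 10.9,
# despite the distribution looking almost normal near its center.
print(round(float(x.var()), 1))  # close to 10.9
```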

- 07:45
RAND WILCOX [continued]: But if outliers are valid results that are just unusually large or small, this is a technical disaster. Using a technically correct method can drastically alter the results. The details regarding technically sound methods are described in some of the books listed at the end of this talk. Next let's talk about the homoscedasticity assumption, the assumption that all the groups have a common variance.

- 08:07
RAND WILCOX [continued]: Heteroscedasticity refers to situations where the variances differ. A concern about homoscedastic methods is that if the assumption is violated, this impacts power, it impacts the probability of a Type I error, and it impacts the probability coverage when you're computing confidence intervals.

- 08:28
RAND WILCOX [continued]: One strategy for dealing with heteroscedasticity is to test the homoscedasticity assumption, and if the test fails to reject, go ahead and use a homoscedastic method. Five published papers have looked at this strategy. All five came to the same conclusion -- this strategy does not work. Why? It turns out that tests for homoscedasticity

- 08:49
RAND WILCOX [continued]: don't have enough power to detect situations where the assumption should be discarded. So what's the best strategy right now? I would say simply use a heteroscedastic method. This is easily done for all of the usual designs for comparing groups, particularly if you're using the software R. Everything can be done quite easily when there is heteroscedasticity.
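One classic heteroscedastic method for two groups is Welch's test, which replaces the pooled variance with separate per-group variances and adjusts the degrees of freedom. (Wilcox's books cover robust heteroscedastic methods much more broadly; this is just a minimal illustrative sketch in Python.)

```python
import numpy as np

def welch_t(x, y):
    """Welch's heteroscedastic two-sample t statistic with
    Satterthwaite degrees of freedom -- no common-variance assumption."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    vx, vy = x.var(ddof=1), y.var(ddof=1)
    se2 = vx / nx + vy / ny                      # squared standard error
    t = (x.mean() - y.mean()) / np.sqrt(se2)
    df = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return float(t), float(df)

rng = np.random.default_rng(2)
g1 = rng.normal(0, 1, size=50)   # small variance, larger group
g2 = rng.normal(0, 4, size=15)   # large variance, smaller group -- the
                                 # setting where the pooled test misbehaves
t, df = welch_t(g1, g2)
print(round(t, 2), round(df, 1))  # df falls well below nx + ny - 2 = 63
```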

- 09:12
RAND WILCOX [continued]: We can summarize the properties of the one-sample t-test as follows. If you're dealing with a symmetric, light-tailed distribution -- outliers are rare -- Student's t works reasonably well. If you've got a skewed distribution, or if there are outliers, it can be highly unsatisfactory. What about the two-sample t-test? It works well when comparing identical distributions.

- 09:33
RAND WILCOX [continued]: So the homoscedasticity assumption is satisfied, and the groups have the same skewness. But as we move away from that situation, where the distributions differ in some manner, eventually it breaks down and performs very poorly. When dealing with more than two groups, violating assumptions is even more serious. Fortunately, though, there are robust methods for dealing with all of the usual ANOVA designs.

- 09:55
RAND WILCOX [continued]: Another serious concern -- it's common to look for outliers using the mean and a standard deviation. This results in what's called masking. Masking just means that the very presence of outliers causes them to be missed. We can quibble about the best method for detecting outliers, but there's no controversy about the worst possible approach: any method based on the mean and the standard deviation.
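Masking is easy to see with a tiny example. In this illustrative Python sketch (not code from the talk), two extreme values inflate the standard deviation enough that the classic mean/SD rule flags nothing, while the MAD-median rule -- one of the robust rules Wilcox turns to next -- flags both. The cutoffs 2.0 and 2.24 are conventional choices assumed here.

```python
import numpy as np

def outliers_mean_sd(x, crit=2.0):
    """Classic rule: flag points more than `crit` SDs from the mean.
    Suffers from masking: outliers inflate the SD used to detect them."""
    z = np.abs(x - x.mean()) / x.std(ddof=1)
    return x[z > crit]

def outliers_mad_median(x, crit=2.24):
    """MAD-median rule: flag points far from the median in robust units.
    Dividing MAD by 0.6745 rescales it to estimate the SD under normality."""
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return x[np.abs(x - med) / (mad / 0.6745) > crit]

# Two extreme values mask each other under the mean/SD rule.
x = np.array([2., 2., 3., 3., 3., 4., 4., 100., 100.])
print(outliers_mean_sd(x))     # [] -- masking: neither 100 is flagged
print(outliers_mad_median(x))  # [100. 100.]
```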

- 10:17
RAND WILCOX [continued]: Better are the boxplot rule and the MAD-median rule. The details are in the books listed at the end of the talk. [WMW -- Strategy for Dealing With Non-Normality?] Let's talk a little about rank-based methods, and in particular I'll focus on the Wilcoxon-Mann-Whitney test. This particular technique is routinely taught in an introductory course.

- 10:40
RAND WILCOX [continued]: It's designed to compare two independent groups. It's based on an estimate of the probability that a randomly sampled observation from the first group is less than a randomly sampled observation from the second. But inferences about this probability assume identical distributions. And under general conditions it does not compare medians, as is commonly claimed.
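That probability estimate can be written down directly. A small illustrative Python sketch (an assumption here: ties are counted as one half, a common convention):

```python
import numpy as np

def p_hat(x, y):
    """Estimate P(X < Y): the probability that a randomly sampled value
    from the first group is less than one from the second.
    This is the quantity the Wilcoxon-Mann-Whitney statistic is built on."""
    x, y = np.asarray(x), np.asarray(y)
    less = (x[:, None] < y[None, :]).mean()   # proportion of pairs with x < y
    ties = (x[:, None] == y[None, :]).mean()  # ties contribute one half
    return less + 0.5 * ties

print(p_hat([1, 2, 3], [2, 3, 4]))  # 6/9 + 0.5 * 2/9 = 7/9
```

The point in the talk is that the estimate itself is fine; it is the classic *inference* about this probability that assumes identical distributions.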

- 11:00
RAND WILCOX [continued]: Many methods for improving on the Wilcoxon-Mann-Whitney test have been derived, which can be applied with the software R. In fact, all classic rank-based methods have been improved substantially. [Dealing With Outliers] Dealing with outliers -- there are two strategies that might be used. The first is to simply trim a certain percentage

- 11:23
RAND WILCOX [continued]: of the observations. The best-known example is the sample median: it trims all but one or two values. Despite this extreme amount of trimming, it can be very effective relative to other choices you might make. But the reality is that in some situations it trims too much, in which case a little less trimming might be better.

- 11:44
RAND WILCOX [continued]: For example, 20% trimming might be much more effective. One advantage of the 20% trimmed mean is that it works almost as well as the mean under normality, but it continues to work well when we have skewed distributions or outliers. It should be stressed, however, that when you're dealing with skewed distributions, trimmed means, including the median, are not designed to make inferences about the population mean.
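For reference, a 20% trimmed mean simply averages the middle portion of the sorted data after removing the lowest and highest 20% of observations. An illustrative Python sketch:

```python
import numpy as np

def trimmed_mean(x, prop=0.2):
    """Mean after removing the lowest and highest `prop` proportion of the
    sorted observations (g = floor(prop * n) values trimmed from each tail)."""
    x = np.sort(np.asarray(x, dtype=float))
    g = int(np.floor(prop * len(x)))
    return float(x[g:len(x) - g].mean())

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]  # one wild value
print(np.mean(x))       # 104.5 -- the ordinary mean is dragged far upward
print(trimmed_mean(x))  # 5.5 -- the mean of the middle values 3..8
```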

- 12:04
RAND WILCOX [continued]: This can be a good thing if you want a measure of location that reflects the typical response. But in some cases you might want to know something about the mean, despite the fact that it may be very atypical, in which case a very large sample size might be needed. Trimming can increase power. When you first encounter this, it sounds hard to believe.

- 12:24
RAND WILCOX [continued]: How can you trim away observations and get more power -- a higher probability of detecting true differences among the groups? In fact, a result derived by Laplace indicates that the median can have more power under general conditions. Nevertheless, it's very counter-intuitive. Providing a complete, detailed explanation of why this occurs

- 12:45
RAND WILCOX [continued]: is impossible here, but we can illustrate it in the following way. Let's suppose we sample some observations from a normal distribution. We compute the mean, the 20% trimmed mean, and the median, and repeat this 10,000 times. The next figure shown on the screen shows the resulting box plots.

- 13:07
RAND WILCOX [continued]: Now, theory tells us that the sample mean has less variation than the 20% trimmed mean and the median. But as you can see in the box plots, the advantage of the sample mean isn't all that striking. Now suppose we do the same experiment, but this time we sample from a mixed normal distribution -- just a slight deviation away from the normal distribution.
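A rough numerical version of this experiment can be sketched as follows (illustrative Python; fewer replications than the talk's 10,000, and the mixed normal is again assumed to be the 10% contaminated normal described earlier):

```python
import numpy as np

def trimmed_mean(x, prop=0.2):
    """20% trimmed mean: average after dropping each tail's lowest/highest 20%."""
    x = np.sort(x)
    g = int(np.floor(prop * len(x)))
    return x[g:len(x) - g].mean()

def sampling_sd(draw, estimator, n=25, reps=5000, seed=3):
    """Standard deviation of an estimator across repeated samples from `draw`."""
    rng = np.random.default_rng(seed)
    return float(np.std([estimator(draw(rng, n)) for _ in range(reps)]))

def normal(rng, n):
    return rng.normal(size=n)

def mixed(rng, n):
    # 10% of observations come from N(0, 10^2): a slight departure from normality.
    x = rng.normal(size=n)
    flag = rng.random(n) < 0.1
    x[flag] = rng.normal(0.0, 10.0, flag.sum())
    return x

# Under normality the mean is only slightly better; under the mixed normal
# it is far worse than the 20% trimmed mean and the median.
for draw in (normal, mixed):
    print(draw.__name__,
          round(sampling_sd(draw, np.mean), 3),
          round(sampling_sd(draw, trimmed_mean), 3),
          round(sampling_sd(draw, np.median), 3))
```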

- 13:30
RAND WILCOX [continued]: Now look at the box plots. You'll see that there's a tremendous amount of variation among the sample means, and much less variation using a 20% trimmed mean or the median. Translation -- you're much more likely to find true differences among groups if you use a 20% trimmed mean or a median. Methods based on trimmed means and medians can have substantially higher power,

- 13:52
RAND WILCOX [continued]: and they actually provide more accurate confidence intervals. The second strategy for dealing with outliers: remove or downweight them using a good outlier detection technique, and then use an appropriate, theoretically sound method for making inferences about measures of location.

- 14:12
RAND WILCOX [continued]: This latter strategy might seem more natural, because if you just trim 20% or use a median, maybe you're getting rid of too many observations -- things that are not outliers. It turns out that the choice between these two strategies is not straightforward, and a detailed explanation is impossible here. But the books described at the end give you more information.

- 14:33
RAND WILCOX [continued]: [Outliers & Hypothesis Testing] Now let's talk a little bit about outliers and hypothesis testing. This cannot be stressed too strongly: use a method that takes into account how outliers are detected and treated. You cannot simply discard extreme values and apply

- 14:54
RAND WILCOX [continued]: methods for means using the remaining data. The derivation of classic methods is no longer valid. Technically sound methods are described in the books listed at the end of the talk. [Conclusion] So we can summarize the situation as follows --

- 15:14
RAND WILCOX [continued]: if groups do not differ, modern methods are not important. If they do differ, modern methods might not make a substantial difference, but under general conditions, they do. How do you apply modern techniques? The best suggestion is to use the software R. Virtually all of the modern techniques available these days can be applied with R, in contrast

- 15:36
RAND WILCOX [continued]: to other software packages you might use. For further reading, you might try the book by Field and colleagues, Discovering Statistics Using R. It contains some of the modern robust methods. I have three books. The first is Introduction to Robust Estimation and Hypothesis Testing. It's designed for people who have reasonably good training.

- 15:56
RAND WILCOX [continued]: I also have a book called Modern Statistics for the Social and Behavioral Sciences. It's designed for a graduate-level introductory course -- a two-semester course. And finally, I have a book called Understanding and Applying Basic Statistical Methods Using R. It's aimed at an undergraduate course. It covers all the standard, routinely taught techniques,

- 16:17
RAND WILCOX [continued]: and it also includes various modern methods so that students can understand when and why more modern techniques have practical advantages. [MUSIC PLAYING]

### Video Info

**Publisher:** SAGE Publications Ltd

**Publication Year:** 2017

**Video Type:** Tutorial

**Methods:** Analysis of variance, Statistical inference, Sampling distribution

**Keywords:** comparison levels; practices, strategies, and tools; Software

### Segment Info

**Segment Num.:** 1


## Abstract

Professor Rand Wilcox explains how to compare groups and test hypotheses using modern robust methods. Modern methods help the researcher achieve more accuracy than classic methods do. Wilcox discusses the practical differences of modern methods, the Wilcoxon-Mann-Whitney test, and dealing with outliers.