Normalization

Michael S. Lewis-Beck; Alan Bryman; Tim Futing Liao

doi:10.4135/9781412950589

Edited by: Michael S. Lewis-Beck, Alan Bryman & Tim Futing Liao
In: The SAGE Encyclopedia of Social Science Research Methods
Chapter DOI: https://doi.org/10.4135/9781412950589.n638
Subject: Anthropology, Business and Management, Criminology and Criminal Justice, Communication and Media Studies, Counseling and Psychotherapy, Economics, Education, Geography, Health, History, Marketing, Nursing, Political Science and International Relations, Psychology, Social Policy and Public Policy, Social Work, Sociology

In many popular statistical models, we assume that some component of a variable Y has a NORMAL DISTRIBUTION. For example, in the linear regression model Y = α + βX + ε, we typically assume that the error term ε is normal. Although minor departures from normality may be acceptable, distributions with heavier-than-normal tails can compromise statistical estimates. In such cases, it may be preferable to transform Y so that the pertinent component is closer to normality. Transforming a variable in this way is called normalization.

If the pertinent component of Y has one heavy tail (i.e., it is skewed), then we often apply a power transformation. True to their name, power transformations raise Y to some power p (i.e., they transform Y into Y^p). Powers greater than 1 reduce negative skew: An example is the quadratic transformation Y^2 (p = 2). Powers between 0 and 1 reduce positive skew: An example is the square-root transformation √Y or √(Y + 1/2) (p = 0.5), which is common when Y represents counts or frequencies. For a power of 0, the power transformation is defined to be log(Y), which reduces positive skew in much the same way as a very small positive power. Negative powers have the same effect as positive powers applied to the reciprocal 1/Y and are used when the reciprocal has a natural interpretation, as when Y is a rate (events per unit time), so that 1/Y is the time between events.
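The effect of the power p on skew can be checked numerically. The sketch below is only an illustration: the data are made up, the function names power_transform and skewness are hypothetical, and skew is measured by the standardized third central moment, which is one of several possible definitions.

```python
import math

def power_transform(y, p):
    """Return y**p for p != 0 and log(y) for p == 0; y must be positive."""
    if y <= 0:
        raise ValueError("power transformations assume positive y")
    return math.log(y) if p == 0 else y ** p

def skewness(values):
    """Standardized third central moment (a simple moment-based skew measure)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical positively skewed data, invented for illustration only.
sample = [1, 1, 2, 2, 3, 3, 4, 6, 9, 20, 45]

# p = 1 leaves the data unchanged; p = 0.5 and p = 0 pull in the right tail.
for p in (1, 0.5, 0):
    transformed = [power_transform(y, p) for y in sample]
    print(p, round(skewness(transformed), 3))
```

On this sample the skew shrinks as p moves from 1 toward 0, in line with the description of the square-root and log transformations.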

In sum, the family of power transformations can be written as follows:

    f(Y) = Y^p       if p ≠ 0
    f(Y) = log(Y)    if p = 0

Power transformations assume that Y is positive; if Y can be zero or negative, we commonly make Y positive by adding a constant. There are formal procedures for estimating the best constant to add, as well as the power p that yields the best approximation to normality (Box & Cox, 1964). However, the optimal power and additive constant are usually treated only as rough guidelines.
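As a rough illustration of how a power might be estimated, the sketch below evaluates the normal profile log-likelihood over a grid of candidate powers. Several things here are assumptions rather than the entry's own procedure: it uses the scaled form (Y^p − 1)/p associated with Box and Cox (1964), which differs from Y^p only by a linear rescaling for p > 0; it searches a coarse grid instead of using a numerical optimizer; the additive constant (shift) is fixed by hand rather than estimated; and the data and function names are hypothetical.

```python
import math

def boxcox(y, p):
    """Scaled power transformation: (y**p - 1) / p, or log(y) when p == 0."""
    return math.log(y) if p == 0 else (y ** p - 1) / p

def profile_loglik(values, p):
    """Profile log-likelihood of p, assuming the transformed values are normal."""
    n = len(values)
    z = [boxcox(y, p) for y in values]
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n
    # The second term is the log-Jacobian of the transformation.
    return -0.5 * n * math.log(var) + (p - 1) * sum(math.log(y) for y in values)

def best_power(values, shift=0.0, grid=None):
    """Return the grid value of p with the highest profile log-likelihood."""
    if grid is None:
        grid = [k / 10 for k in range(-20, 21)]  # -2.0, -1.9, ..., 2.0
    shifted = [y + shift for y in values]
    if min(shifted) <= 0:
        raise ValueError("shift must make every value positive")
    return max(grid, key=lambda p: profile_loglik(shifted, p))

# Hypothetical counts that include a zero, so a constant is added first.
counts = [0, 1, 1, 2, 2, 3, 3, 4, 6, 9, 20, 45]
p_hat = best_power(counts, shift=1.0)
print(p_hat)
```

In keeping with the text, the value printed is best read as a rough guideline: a nearby power with a natural interpretation (such as the log or square root) is often chosen instead.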

If the pertinent component of Y has two heavy tails (excess KURTOSIS), we may use a modulus transformation (John & Draper, 1980), which is a modified power transformation applied to each tail separately. Non-negative powers p less than 1 reduce kurtosis, while powers greater than 1 increase kurtosis. Again, there are formal procedures for estimating the optimal power p (John & Draper, 1980). If Y is symmetric around 0, then the modulus transformation will change the kurtosis without introducing skew. If Y is not centered at 0, it may be advisable to add a constant before applying the modulus transformation.
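A minimal sketch of a modulus transformation follows. It assumes the form usually attributed to John and Draper (1980), sign(Y)·((|Y| + 1)^p − 1)/p for p ≠ 0 and sign(Y)·log(|Y| + 1) for p = 0, which may not match the notation printed in this entry. The data are invented and the function names are hypothetical.

```python
import math

def modulus_transform(y, p):
    """Apply a power transformation to |y| + 1 and restore the sign of y."""
    sign = 1.0 if y >= 0 else -1.0
    magnitude = abs(y) + 1.0
    if p == 0:
        return sign * math.log(magnitude)
    return sign * (magnitude ** p - 1.0) / p

def kurtosis(values):
    """Fourth central moment over the squared variance (equals 3 for a normal)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2

# Hypothetical data, symmetric around 0 with two heavy tails.
sample = [-30, -8, -3, -1, -0.5, 0, 0.5, 1, 3, 8, 30]

# p = 1 leaves the data unchanged; p < 1 shrinks both tails, p > 1 stretches them.
for p in (2, 1, 0.5, 0):
    transformed = [modulus_transform(y, p) for y in sample]
    print(p, round(kurtosis(transformed), 2))
```

Because the function is odd (negating Y negates the result), data that are symmetric around 0 stay symmetric after the transformation, which is why only the kurtosis changes.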

Other normalizations are typically used if Y represents proportions between 0 and 1: the arcsine or angular transformation sin⁻¹(√Y), the logit or logistic transformation log(Y / (1 − Y)), and the probit transformation Φ⁻¹(Y), where Φ⁻¹ is the inverse of the standard normal cumulative distribution function. The logit and probit are better normalizations than the arcsine, which is gradually disappearing from practice.
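The three transformations for proportions can be written compactly. This is a sketch using only the Python standard library; the function names are illustrative. Note that the logit and probit are undefined at exactly 0 and 1, whereas the arcsine is defined on the whole closed interval.

```python
import math
from statistics import NormalDist

def arcsine(y):
    """Angular transformation: arcsine of the square root of a proportion."""
    return math.asin(math.sqrt(y))

def logit(y):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(y / (1.0 - y))

def probit(y):
    """Inverse of the standard normal cumulative distribution function."""
    return NormalDist().inv_cdf(y)

# A few example proportions, chosen arbitrarily for illustration.
for y in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(y, round(arcsine(y), 3), round(logit(y), 3), round(probit(y), 3))
```

The logit and probit map the interval (0, 1) onto the whole real line and stretch values near 0 and 1 the most, while the arcsine maps [0, 1] onto the bounded interval [0, π/2].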

Even the best transformation may not provide an adequate approximation to normality. Moreover, a transformed variable may be hard to interpret, and conclusions drawn from it may not apply to the original, untransformed variable (Levin, Liukkonen, & Levine, 1996). Fortunately, modern researchers often have good alternatives to normalization. When working with non-normal data, we can use a GENERALIZED LINEAR MODEL that assumes a different type of distribution. Or we can make weaker assumptions by using statistics that are “distribution-free” or nonparametric.
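One concrete way in which conclusions on the transformed scale can fail to carry over is back-transformation of a mean. The sketch below uses invented data to show that exponentiating the mean of log(Y) gives the geometric mean of Y, not its arithmetic mean.

```python
import math

# Hypothetical positively skewed values, invented for illustration.
values = [1, 2, 4, 8, 16]

arithmetic_mean = sum(values) / len(values)
mean_of_logs = sum(math.log(v) for v in values) / len(values)

# Back-transforming the mean of the logs yields the geometric mean.
back_transformed = math.exp(mean_of_logs)

print(arithmetic_mean)
print(round(back_transformed, 6))
```

For positive data that are not all equal, the back-transformed value is always smaller than the arithmetic mean, so a statement about the mean of log(Y) is not a statement about the mean of Y.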

...
