Sparse Table
A sparse table is a cross-classification of observations by two or more discrete variables that has many cells with small or zero frequencies. Sparse contingency tables occur most often when the total number of observations is small relative to the number of cells. For example, consider a table with N = 4 cells and n = 12 observations. If the observations are spread evenly over the four cells, then the maximum possible frequency is small (i.e., n/N = 3). Furthermore, if the occurrence of observations in one of the four cells is a rare event, then a large sample would be needed to obtain observations in this cell. As a second example, consider a cross-classification of four variables, each with seven categories, which has N = 7^4 = 2,401 cells. A sample size considerably larger than 2,401 is required to ensure that all cells contain nonzero frequencies, and a much larger sample size is needed to ensure that all frequencies are large enough for statistical tests and modeling.
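The arithmetic in the two examples above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample size of 5,000 are assumptions introduced here, not values from the entry, and the threshold of 5 per cell is the conventional rule of thumb for chi-square tests.

```python
def average_cell_count(n_observations, n_cells):
    """Average frequency per cell if observations were spread evenly."""
    return n_observations / n_cells


# First example from the text: N = 4 cells, n = 12 observations.
small_table_average = average_cell_count(12, 4)

# Second example: four variables with seven categories each.
n_cells_large = 7 ** 4

# Hypothetical sample size (an assumption, not from the text).
hypothetical_n = 5000
large_table_average = average_cell_count(hypothetical_n, n_cells_large)

# Sample size at which the AVERAGE cell count reaches 5; individual
# cells can still fall well below that when the distribution is uneven.
n_for_average_of_five = 5 * n_cells_large

print(small_table_average, n_cells_large, round(large_table_average, 2))
print(n_for_average_of_five)
```

Even a sample of several thousand leaves the average cell count in the larger table close to 2, which is why such tables are almost always sparse in practice.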
Table 1 Example of a Two-Way Table With Sampling Zeros (rows: How Much Know About Genetic Tests?; columns: Genetic Testing: More Harm or Good?)

| How Much Know About Genetic Tests? | More Good | More Harm | It Depends | Total |
|---|---|---|---|---|
| Great Deal | 54 | 10 | 0 | 64 |
| Not Very Much | 170 | 88 | 0 | 258 |
| Nothing at All | 17 | 14 | 0 | 31 |
| Total | 241 | 112 | 0 | 353 |
Sparseness invalidates standard statistical hypothesis tests, such as chi-square tests of independence or model goodness of fit. The justification for comparing test statistics (e.g., Pearson's chi-square statistic or the likelihood ratio statistic) to a chi-square distribution depends critically on having “large” samples, where “large” is conventionally taken to mean expected cell frequencies of at least 5. Without large samples, the probability distribution with which test statistics should be compared is unknown. Possible solutions to this problem include using an alternative statistic, performing exact tests, or approximating the probability distribution of test statistics by resampling or Monte Carlo methods.
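One of the remedies named above, approximating the distribution of the test statistic by resampling, can be sketched as follows. This is a minimal illustration rather than a recommended procedure: it uses a permutation scheme that holds both margins fixed, the 3 x 3 table is hypothetical data made up for the example, and the function names, the number of resamples, and the seed are all assumptions.

```python
import random


def pearson_chi_square(table):
    """Pearson's chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # cells in an empty margin contribute nothing
                statistic += (observed - expected) ** 2 / expected
    return statistic


def monte_carlo_p_value(table, n_resamples=2000, seed=0):
    """Approximate p-value by shuffling column labels across observations."""
    rng = random.Random(seed)
    # Expand the table into one (row label, column label) pair per observation.
    rows, cols = [], []
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            rows.extend([i] * count)
            cols.extend([j] * count)
    observed_stat = pearson_chi_square(table)
    n_rows, n_cols = len(table), len(table[0])
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(cols)  # breaks any association, keeps both margins fixed
        resampled = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(rows, cols):
            resampled[i][j] += 1
        # Small tolerance guards against floating-point ties.
        if pearson_chi_square(resampled) >= observed_stat - 1e-12:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_resamples + 1)


# Hypothetical sparse table (12 observations in 9 cells), not from the entry.
hypothetical_table = [[3, 0, 1], [0, 4, 0], [1, 0, 3]]
p_value = monte_carlo_p_value(hypothetical_table)
print(pearson_chi_square(hypothetical_table), p_value)
```

The statistic is compared with its own resampled distribution instead of a chi-square distribution, so the sketch does not rely on the large-sample approximation.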
Sparse tables often contain zero frequencies, which can cause estimation problems, including biased descriptive statistics (e.g., odds ratios), problems in estimating log-linear model parameters, and difficulties for computational algorithms that fit models to data. Whether an estimation problem exists depends on the pattern of zero frequencies in the data and the particular model being estimated. Parameters cannot be estimated when there are zeros in the corresponding margins. For example, Table 1 consists of the cross-classification of responses to the following two questions from the 1996 General Social Survey (http://www.icpsr.umich.edu:8080/GSS/homepage.htm): (a) “Based on what you know, do you think genetic testing will do more good than harm?” with possible responses of “more good than harm,” “more harm than good,” and “it depends”; and (b) “How much would you say that you have heard or read about genetic screening?” with possible responses of “a great deal,” “not very much,” and “nothing at all.” None of the respondents answered “It depends” to the first question. For the independence log-linear model, a parameter for the marginal effect for “It depends” therefore cannot be estimated: the information needed to estimate this parameter is the column marginal value, which has no observations.
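The problem with the empty “It depends” margin can be made concrete with the counts from Table 1. The sketch below assumes the usual form of the independence log-linear model, in which the log of each fitted count is a sum of an overall term, a row term, and a column term, and the fitted counts are (row total x column total) / n. Variable names are illustrative.

```python
import math

# Observed counts from Table 1 (rows: Great Deal, Not Very Much, Nothing at All).
column_labels = ["More Good", "More Harm", "It Depends"]
table_1 = [
    [54, 10, 0],
    [170, 88, 0],
    [17, 14, 0],
]

row_totals = [sum(row) for row in table_1]
col_totals = [sum(col) for col in zip(*table_1)]
n = sum(row_totals)

# Fitted counts under independence: row total * column total / n.
fitted = [[r * c / n for c in col_totals] for r in row_totals]

# Columns whose margin is zero carry no information about their parameter.
empty_columns = [
    label for label, total in zip(column_labels, col_totals) if total == 0
]


def log_or_negative_infinity(value):
    """Log of a fitted count; zero fitted counts have no finite log."""
    return math.log(value) if value > 0 else float("-inf")


log_fitted = [[log_or_negative_infinity(m) for m in row] for row in fitted]

print(col_totals, empty_columns)
print(log_fitted[0])
```

Every fitted count in the “It Depends” column is zero, so its logarithm is negative infinity and no finite value of the column parameter can reproduce it. A fitting algorithm run on these data would be expected either to fail to converge or to report a very large negative estimate for that parameter.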
...