Processing Data
Little Green Book

By: Linda B. Bourque & Virginia A. Clark
Published: 1992
Methods: Missing data
  • Series Editor's Introduction

    In this monograph, the notion of data processing is broadly defined to cover the essential steps of quantitative research that must be taken before data analysis can begin. Obviously, these steps determine data quality. Without good data processing, “garbage in, garbage out” is all too likely to hold true. Nevertheless, relatively little is written about proper methods of data preparation, partly because it is more of an “art” when compared with the “science” of hypothesis testing.

    Fortunately, Drs. Bourque and Clark, seasoned scientists themselves, have now made the tools of that art more widely accessible. While they have a general interest in the processing of all sorts of social science data—from interviews, observations, records, or documents—the research example upon which they draw most heavily comes from surveys. In particular, surveys of community response to recent California earthquakes, studies carried out under their direction, provide rich illustration.

    With respect to designing a questionnaire, the authors discuss key issues: Should you create your own instrument? Must the items be closed- or open-ended? Is a “don't know” category advisable? They provide useful tips as well, such as that responses should be assigned consistent numerical codes and that item categories should be exhaustive and mutually exclusive. In addition, they offer instruction on the complications of such questionnaire and survey elements as “skip patterns” and “heaping.”
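
    As a rough illustration of the "exhaustive and mutually exclusive" tip, the sketch below checks a set of numeric answer brackets for gaps (values with no code) and overlaps (values with more than one code). The age brackets, the names `AGE_CODES`, `codes_for`, and `check_brackets`, and the 18 to 120 range are all hypothetical and are not taken from the monograph.

```python
# Hypothetical closed-ended age item: code -> (lowest, highest), both inclusive.
AGE_CODES = {1: (18, 29), 2: (30, 44), 3: (45, 64), 4: (65, 120)}


def codes_for(value, brackets):
    """Return every code whose bracket contains the value."""
    return [code for code, (low, high) in brackets.items() if low <= value <= high]


def check_brackets(brackets, lowest, highest):
    """Scan every integer in [lowest, highest].

    Returns (gaps, overlaps): values matching no code (the categories are
    not exhaustive) and values matching several codes (the categories are
    not mutually exclusive).
    """
    gaps, overlaps = [], []
    for value in range(lowest, highest + 1):
        matches = codes_for(value, brackets)
        if not matches:
            gaps.append(value)
        elif len(matches) > 1:
            overlaps.append(value)
    return gaps, overlaps


if __name__ == "__main__":
    print(check_brackets(AGE_CODES, 18, 120))
    # A deliberately flawed scheme: 30 is coded twice and 45 has no code.
    flawed = {1: (18, 30), 2: (30, 44), 3: (46, 64)}
    print(check_brackets(flawed, 18, 64))
```

    A check of this kind only works for items with an enumerable answer range; for open-ended or categorical items, the same idea would have to be applied to the list of observed responses instead.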

    In terms of data collection, Bourque and Clark spell out the procedures for testing a questionnaire and forming an interview team. On data entry, they give an informative sketch of the old method, batch processing, contrasting it to the current method of interacting with a personal computer. That is, data can be directly entered into the computer, perhaps on a spreadsheet, and analyzed as part of a statistical package, such as SPSS or SAS. To prepare the data for analysis, a data file, or a subfile, may be created. The authors also describe such important data preparation concerns as weighting, casewise versus pairwise deletion, missing values, and transformations. Finally, they offer a summary checklist for study documentation.
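
    The difference between casewise and pairwise deletion mentioned above can be sketched in a few lines. Casewise deletion drops any case with a missing value on any analysis variable, while pairwise deletion keeps, for each pair of variables, every case that has both values. The records, the variable names, and the helper functions below are hypothetical and serve only to show how the usable sample size differs between the two approaches.

```python
from itertools import combinations

# Hypothetical survey records; None marks a missing value.
records = [
    {"age": 34, "income": 52000, "damage": 3},
    {"age": 51, "income": None, "damage": 1},
    {"age": None, "income": 47000, "damage": 2},
    {"age": 29, "income": 39000, "damage": None},
    {"age": 45, "income": 61000, "damage": 4},
]
variables = ["age", "income", "damage"]


def casewise(rows, names):
    """Keep only rows with no missing value on any listed variable."""
    return [row for row in rows if all(row[name] is not None for name in names)]


def pairwise_counts(rows, names):
    """For each pair of variables, count the rows where both are present."""
    return {
        (a, b): sum(1 for row in rows if row[a] is not None and row[b] is not None)
        for a, b in combinations(names, 2)
    }


complete_cases = casewise(records, variables)
pair_n = pairwise_counts(records, variables)

if __name__ == "__main__":
    print("casewise n:", len(complete_cases))
    for pair, n in pair_n.items():
        print("pairwise n for", pair, ":", n)
```

    In this toy data set casewise deletion leaves two complete cases, whereas each pair of variables retains three. Pairwise deletion uses more of the data, but each statistic is then based on a different subset of cases.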

    Instructors of research methods have a great many texts to select from when they wish to assign something on data analysis. However, on the topic of data processing their choices are few. This comprehensive, up-to-date, readable monograph lengthens that short list. It should be an immense help to all those students who have not yet learned how actually to carry out a research project, but are eager to do so.

    Michael S. Lewis-Beck, Series Editor
  • Acknowledgments

    We would like to thank A. A. Afifi, Beverly Cosand, Philip Costic, Ralph Dunlap, Eve Fielder, Virginia Flack, Carolyn Geda, Linda Lange, Corrie Peek, Susan Sorenson, Elizabeth Stephenson, Terri Walsh, Mel Widawski, and two readers from Sage Publications for their helpful comments and assistance on earlier drafts; Welden Clark for invaluable assistance with Chapter 4; Gloria Krauss for clerical assistance; and Margie Norman, Gloria Krauss, and Ralph Dunlap for editing assistance. Data used in examples were collected and processed with funds from the National Science Foundation (No. 62617 and BCS-9002754), the Natural Hazards Research and Application Center (Purchase Order 494933C1), the Earthquake Engineering Research Institute (EERI M880411), the National Center for Earthquake Engineering Research (Purchase Order R34779), and the Southern California Injury Prevention Research Center under funds from the Centers for Disease Control (No. R49/CCR903622).

  • Relevant Software Manuals

    BMDP Data Entry (1991).

    BMDP Statistical Software Manual, Vol. 1, for 1990 Software Release (see Chapter 2, “Data”).


    SAS IBM 370 Formats and Informats.

    SAS/FSP Guide, Version 6 (1987) (data entry).

    SAS Procedures Guide, Release 6.03 (1988).

    SAS Language Guide for Personal Computers, Version 6 (1987).

    SPSS Data Entry II for the IBM PC/XT/AT and PS/2 (1987).

    SPSS/PC+ V2.0 Base Manual (1988), by Marija J. Norusis. Chicago: SPSS Inc.

    SPSS/PC+ Update for V3.0 and V3.1 (1989).

  • About the Authors

    LINDA B. BOURQUE is Professor and Head of the Division of Population and Family Health, and Vice Chair of the Department of Community Health Sciences, in the School of Public Health at the University of California at Los Angeles, where she teaches courses in research design and survey methodology. Her research is in the area of intentional and unintentional injury. She is the author or coauthor of 40 scientific articles and the book Defining Rape. She received her Ph.D. in sociology from Duke University.

    VIRGINIA A. CLARK is Professor Emeritus of Biostatistics in the School of Public Health and Biomathematics in the School of Medicine at the University of California at Los Angeles. She is an expert in multivariate analysis and has consulted in biomedical and economic studies. She is author of more than 80 scientific articles and coauthor of four textbooks: Preparation for Basic Statistics (with Michael E. Tarter), Applied Statistics: Analysis of Variance and Regression (2nd ed.) (with O. Jean Dunn), Survival Distributions: Reliability Applications in the Biomedical Sciences (with Alan Gross), and Computer-Aided Multivariate Analysis (2nd ed.) (with A. A. Afifi). She received her Ph.D. in biostatistics from the University of California at Los Angeles.