
Analysis of survey research data often first requires recoding variables. Recoding is the process of changing the values of a variable. Researchers can rarely proceed directly to data analysis after receiving or compiling a "raw" data set; the values of the variables to be used in the analysis usually have to be changed first. There are many reasons for recoding a variable. One is that the original coding of the data may contain errors that must be corrected. Running a frequency distribution or descriptive statistics on a variable can help the researcher identify such errors. For example, suppose a variable measuring respondents' party identification has values of 1 for Democrat, 3 for Independent, 5 for Republican, and 7 for Other. If a frequency distribution shows values of 2, 4, or 6, these values are incorrect. They can be recoded to their correct values by going back to the original survey instrument, if it is available. More likely, they will have to be recoded to "missing value" status. Computer-assisted data collection makes it impossible to check the correct value afterward, as there is no paper trail, but it also makes it less likely that incorrect values will be entered in the first place.
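The error check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data values are made up, the names `VALID_PARTY_CODES` and `recode_invalid_to_missing` are hypothetical, and `None` stands in for whatever "missing value" marker a given statistics package uses.

```python
from collections import Counter

# Codebook from the party identification example:
# 1 = Democrat, 3 = Independent, 5 = Republican, 7 = Other
VALID_PARTY_CODES = {1, 3, 5, 7}


def recode_invalid_to_missing(values, valid_codes):
    """Return a copy in which any value outside valid_codes becomes None."""
    return [v if v in valid_codes else None for v in values]


# Made-up raw data containing three miscoded values (2, 4, and 6)
party_raw = [1, 3, 5, 7, 2, 1, 4, 5, 6, 3]

# A frequency distribution exposes values that are not in the codebook
raw_freq = Counter(party_raw)
invalid_found = sorted(v for v in raw_freq if v not in VALID_PARTY_CODES)

# Recode the invalid values to missing, then rerun the frequency distribution
party_clean = recode_invalid_to_missing(party_raw, VALID_PARTY_CODES)
clean_freq = Counter(party_clean)

print("Invalid codes found:", invalid_found)
print("Frequencies after recoding:", dict(clean_freq))
```

If the original survey instrument were available, the miscoded cases could instead be looked up and assigned their correct values one by one.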

Data are usually collected in as much detail as possible, but they often must be recoded to yield more interpretable results. For example, respondents may be asked to report their date of birth in a survey. Recoding these values to age (in years) lets the researcher interpret the variable in a more intuitive and useful way. Furthermore, if the researcher wants to present the age variable in a frequency distribution table, the interval measure of age can be recoded into an ordinal measure. For example, respondents between the ages of 18 and 24 can have their age value recoded to category 1, respondents between 25 and 34 to category 2, respondents between 35 and 44 to category 3, and so on. Interval- and ratio-level data can always be recoded into nominal- or ordinal-level data, but nominal- and ordinal-level data cannot be recoded into interval- or ratio-level data. It is therefore always better to collect data at the interval or ratio level if possible.
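The two recodes in this example, date of birth to age and age to an ordinal category, might look like the following Python sketch. The function names, the interview date, and the birth dates are hypothetical. Only the first three age bands come from the text; the bands from 45 upward are an assumed continuation of "and so on."

```python
from datetime import date


def age_in_years(birth_date, reference_date):
    """Completed years between birth_date and reference_date."""
    years = reference_date.year - birth_date.year
    # Subtract one if the birthday has not yet occurred in the reference year
    if (reference_date.month, reference_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def age_category(age):
    """Recode age in years to an ordinal category (None if under 18)."""
    # Categories 1-3 follow the text; 4-6 are assumed for illustration
    bands = [(18, 24, 1), (25, 34, 2), (35, 44, 3), (45, 54, 4), (55, 64, 5)]
    for low, high, code in bands:
        if low <= age <= high:
            return code
    return 6 if age >= 65 else None


interview_date = date(2024, 6, 15)  # hypothetical interview date
birth_dates = [date(2000, 6, 16), date(1995, 1, 1), date(1985, 6, 15), date(1950, 3, 2)]

ages = [age_in_years(b, interview_date) for b in birth_dates]
categories = [age_category(a) for a in ages]

print(ages)        # [23, 29, 39, 74]
print(categories)  # [1, 2, 3, 6]
```

The recode runs in one direction only: the categories can be derived from the ages, but the exact ages cannot be recovered from the categories.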

Recoding is also necessary if some values of a variable are recorded inconveniently for analysis. For example, surveys typically code responses of "Don't know" and "Refused" as 8 and 9. More generally, whenever missing values are assigned a number in the original coding, the researcher needs to recode these values so that the computer program used for statistical analysis recognizes them as missing. Leaving these values unchanged means the program treats them as real responses, which will yield inaccurate and misleading results. Another reason to recode a variable is to transform its values (e.g., a log transformation) to tease out the true nature of the relationship between two variables. As a final precaution, it is advisable to always run a frequency distribution or descriptive statistics on the recoded variable to make certain that the desired recodes were achieved.
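A minimal Python sketch of these steps follows, assuming a made-up five-point item on which 8 and 9 mark "Don't know" and "Refused." The variable names and data are hypothetical, and `None` again stands in for a package-specific missing value marker. It compares the mean before and after the recode, checks the result with a frequency count, and applies a base-10 log transformation to a separate made-up income variable.

```python
import math
from collections import Counter
from statistics import mean

MISSING_CODES = {8, 9}  # "Don't know" and "Refused"


def recode_missing(values, missing_codes):
    """Replace numeric missing-value codes with None."""
    return [None if v in missing_codes else v for v in values]


def valid_mean(values):
    """Mean of the non-missing values only."""
    return mean(v for v in values if v is not None)


# Made-up responses on a 1-5 scale, with two nonresponse codes
approval_raw = [1, 2, 3, 4, 5, 8, 9, 3]
approval = recode_missing(approval_raw, MISSING_CODES)

naive_mean = mean(approval_raw)      # wrongly treats 8 and 9 as real scores
correct_mean = valid_mean(approval)  # uses valid responses only

# Final precaution: a frequency count confirms no 8s or 9s remain
check = Counter(approval)

# A log transformation is also a recode: every value is replaced
income = [1000, 10000, 100000]  # made-up incomes
log_income = [math.log10(x) for x in income]

print(naive_mean, correct_mean)  # 4.375 3
print(dict(check))
```

In this made-up data the two nonresponse codes alone pull the mean from 3 up to 4.375, which is the kind of distortion the recode is meant to prevent.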

...
