Panel data are data sets consisting of multiple observations on each sampling unit. Such data can be generated by pooling time-series observations across a variety of cross-sectional units, including countries, states, regions, firms, or randomly sampled individuals or households. The term also encompasses longitudinal data analysis, in which the primary focus is on individual histories. Two well-known examples of U.S. panel data are the Panel Study of Income Dynamics (PSID), collected by the Institute for Social Research at the University of Michigan, and the National Longitudinal Surveys of Labor Market Experience (NLS) from the Center for Human Resource Research at Ohio State University. An inventory of national studies using panel data is given at http://www.ceps.lu/Cher/Cherpres.htm. These include the Belgian Household Panels, the German Socioeconomic Panel, the French Household Panel, the British Household Panel Survey, the Dutch Socioeconomic Panel, the Luxembourg Household Panel, and, more recently, the European Community household panel.

The PSID began in 1968 with 4,802 families and includes an oversampling of poor households. Annual interviews were conducted, and the socioeconomic characteristics of each family and of roughly 31,000 individuals who had been in these or derivative families were recorded. More than 5,000 variables are collected.

The NLS followed five distinct segments of the labor force. The original samples include 5,020 men ages 45 to 59 years in 1966, 5,225 men ages 14 to 24 years in 1966, 5,083 women ages 30 to 44 years in 1967, 5,159 women ages 14 to 24 years in 1968, and 12,686 youths ages 14 to 21 years in 1979. Blacks, Hispanics, poor Whites, and military personnel were oversampled in the youth survey. The variables collected run into the thousands.

Panel data sets have also been constructed from the U.S. Current Population Survey (CPS), a monthly national household survey conducted by the Census Bureau. The CPS generates the unemployment rate and other labor force statistics. Compared with the NLS and PSID data sets, the CPS contains fewer variables, spans a shorter period, and does not follow movers. However, it covers a much larger sample and is representative of all demographic groups.

Some of the benefits and limitations of using panel data are given in Hsiao (1986). Obvious benefits include a much larger data set, because panel data are multiple observations on the same individual. This means that there will be more variability and less collinearity among the variables than is typical of cross-section or time-series data. For example, in a demand equation for a given good (say, gasoline), price and income may be highly correlated in annual time-series observations for a given country or state. By stacking or pooling these observations across different countries or states, the variation in the data is increased and collinearity is reduced. With additional, more informative data, one can get more reliable estimates and test more sophisticated behavioral models with less restrictive assumptions.

Another advantage of panel data is their ability to control for individual heterogeneity. Not controlling for these unobserved individual-specific effects leads to bias in the resulting estimates. For example, in an earnings equation, the wage of an individual is regressed on various individual attributes, such as education, experience, gender, race, and so on. But the error term may still include unobserved individual characteristics, such as ability, which are correlated with some of the regressors, such as education. Cross-sectional studies attempt to control for this unobserved ability by collecting hard-to-get data on twins. However, using individual panel data, one can, for example, difference the data over time and wipe out the unobserved, time-invariant individual ability.

Panel data sets are also better able to identify and estimate effects that are not detectable in pure cross-section or pure time-series data. In particular, panel data sets are better able to study complex issues of dynamic behavior. For example, with cross-section data, one can estimate the rate of unemployment at a particular point in time. Repeated cross-sections can show how this proportion changes over time. Only panel data sets can estimate what proportion of those who are unemployed in one period remain unemployed in another period.
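
The differencing argument in the earnings example can be illustrated with a small simulation. The sketch below is not taken from the entry; it assumes a hypothetical model y_it = beta * x_it + mu_i + noise_it, in which mu_i stands for unobserved ability and the regressor x_it is correlated with it. All function names, sample sizes, and parameter values are illustrative assumptions. Under these assumptions, a pooled regression that ignores mu_i should overstate beta, whereas a regression on first differences (y_i2 - y_i1 on x_i2 - x_i1) removes mu_i and should recover a slope close to beta.

```python
import numpy as np


def simulate_panel(n_units=5000, n_periods=2, beta=1.0, seed=0):
    """Simulate a hypothetical panel: y_it = beta * x_it + mu_i + noise_it."""
    rng = np.random.default_rng(seed)
    # mu_i: unobserved, time-invariant individual effect ("ability")
    ability = rng.normal(size=(n_units, 1))
    # x_it is correlated with ability (assumed loading of 0.8)
    x = 0.8 * ability + rng.normal(size=(n_units, n_periods))
    noise = rng.normal(scale=0.5, size=(n_units, n_periods))
    y = beta * x + ability + noise
    return x, y


def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    x = np.ravel(x)
    y = np.ravel(y)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / (xc @ xc))


x, y = simulate_panel()

# Pooled OLS ignores mu_i, so the slope absorbs part of the ability effect.
pooled_slope = ols_slope(x, y)

# First differences over time eliminate mu_i because it does not vary with t.
fd_slope = ols_slope(np.diff(x, axis=1), np.diff(y, axis=1))

print(f"pooled OLS slope:       {pooled_slope:.3f}")
print(f"first-difference slope: {fd_slope:.3f}")
```

In this setup the true beta is 1.0, so the gap between the two printed slopes gives a rough sense of the bias that differencing removes. The same idea carries over to more than two periods, although with longer panels other transformations of the data are also commonly used.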

...
