Face Validity
Face validity is a test of internal validity. As the name implies, it asks a very simple question: “On the face of things, do the investigators reach the correct conclusions?” It requires investigators to step outside of their current research context and assess their observations from a commonsense perspective. A typical application of face validity occurs when researchers obtain assessments from the individuals who are or will be directly affected by programs premised on the research findings. For example, researchers might test the face validity of a proposed new patient tracking system by asking the local community health care providers who will be responsible for implementing the program how they think it may work in their centers.
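To make the patient tracking example concrete, the following minimal Python sketch shows one way such stakeholder feedback might be tallied. The rating scale, the provider labels, and the threshold for counting an assessment as positive are illustrative assumptions, not details from the entry or the literature.

```python
from statistics import mean

# Hypothetical face validity ratings from community health care providers
# who reviewed the proposed patient tracking system. Each provider answers
# "On the face of it, will this program work in your center?" on a
# 1 (not at all) to 5 (very well) scale.
ratings = {
    "provider_a": 4,
    "provider_b": 5,
    "provider_c": 3,
    "provider_d": 4,
}

# Summarize the feedback: the mean rating across reviewers and the share
# of reviewers who judge the program workable (rating of 4 or higher).
average_rating = mean(ratings.values())
share_positive = sum(1 for r in ratings.values() if r >= 4) / len(ratings)

print(f"Mean face validity rating: {average_rating:.2f} / 5")
print(f"Share of positive assessments: {share_positive:.0%}")
```

Such a tally only summarizes commonsense judgments; as discussed below, it cannot substitute for other internal and external validity tests.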
What follows is a brief discussion of how face validity fits within the overall context of validity tests. Next, the history of face validity is reviewed, including the early criticisms that set the stage for how and why the test returned as a valued assessment. The discussion concludes with some recent applications of the test.
The Validity of Face Validity
To better understand the value and application of face validity, it is necessary first to set the stage for what validity is. Validity is commonly framed as a question: “To what extent do the research conclusions provide the correct answer?” In testing the validity of research conclusions, one looks at the relationship between the purpose and context of the research project and its conclusions. Validity is determined by testing research observations against what is already known about the world, giving the phenomenon under analysis the chance to prove the researchers wrong. All tests of validity are context specific; none is an absolute assessment. Tests of validity fall into two broad realms: external validity and internal validity. Questions of external validity concern the generalizability of research conclusions; observations generated in a research project are assessed for their relevance to other, similar situations. Face validity falls within the realm of internal validity assessments. A test of internal validity asks whether the researcher draws the correct conclusions from the available data. Such assessments look into the nuts and bolts of an investigation (for example, checking for sampling error or researcher bias) to see whether the research project was sound.
History of Face Validity
For all of its simplicity, the test for face validity has had a dramatic past, and only recently has it re-emerged as a valued and respected test of validity. In its early applications, face validity was used by researchers as a first-step assessment, in concert with other tests, to assess the validity of an analysis. During the 1940s and 1950s, psychologists used face validity in the early stages of developing tests for selecting industrial and military personnel. It was soon adopted by many different types of researchers in many different types of investigations, resulting in confusion over what actually constituted face validity. That confusion quickly led researchers of the 1960s, who were turning to new and more complex tests of validity, to reject face validity altogether.
Table 1 Debate Over Face Validity

Question 1: Is face validity a legitimate test of validity? Detractors hold that its observations rest on no verifiable, systematic procedure and serve only public relations; advocates hold that it offers a commonsense check on completed research.

Question 2: Is face validity distinct from content validity? One side argues that content validity subsumes face validity; the other argues that the two tests assess different parts of a project at different stages (content validity before data gathering, face validity after completion).

Question 3: Who is qualified to provide face validity observations, experts or laypersons? Experts-only proponents value technical insight during a project's design; most researchers now hold that laypersons affected by the research supply the commonsense assessment at its completion.
Early Debate Surrounding Face Validity
Discussions of face validity were revived in 1985 by Baruch Nevo's seminal article “Face Validity Revisited,” which cleared up some of the confusion surrounding the test and challenged researchers to take another, more serious look at its applications. Building on Nevo's work, three questions can be distinguished in the research validity literature that for a time prevented face validity from becoming established as a legitimate test of validity (see Table 1).
The first question concerns the legitimacy of the test itself. Detractors argue that face validity is insignificant because its observations are not based on any verifiable testing procedure and because it does not require a systematic method for obtaining observations; it therefore yields only rudimentary impressions of a study. They conclude that the only use for face validity observations is in public relations statements.
Advocates counter that face validity gives researchers the opportunity for a commonsense test of their results: “After the investigation is completed and all the tests of validity and reliability are done, does this study make sense?” Tests of face validity thus allow investigators a fresh way to look at their conclusions, making sure they see the forest for the trees, with the forest being common sense and the trees being all of the different tests of validity used in documenting the veracity of the study.
The second question muddies the value of face validity by blurring it together with content validity. The logic here is that both tests are concerned with the content and representativeness of the study. Content validity is the extent to which the items identified in the study reflect the domain of the concept being measured. Because content validity and face validity both examine the degree to which the intended range of meanings of the study's concepts appears to be covered, the argument runs, once a study has content validity it automatically has face validity, and there is no real need to test for face validity separately.
The other side of this debate holds that content validity should not be confused with face validity because they are entirely different tests, examining different parts of the research project. Content validity is concerned with the relevance of the identified research variables within a proposed research project, whereas face validity is concerned with the relevance of the overall completed study, assessed from a commonsense standpoint. Beyond this difference in what the two tests assess, other researchers have identified a sequential distinction between them: content validity should be tested before the data-gathering stage of a project begins, whereas face validity should be applied after the investigation has been carried out. This sequential application is intuitively logical, because content validity focuses on the appropriateness of the identified research items before the investigation starts, whereas face validity is concerned with the overall relevance of the research findings after the study has been completed.
The third question is a procedural one: Who is qualified to provide face validity observations, experts or laypersons? Proponents of the “experts-only” approach believe that experts, who have substantive knowledge about a research topic and a good technical understanding of tests of validity, provide constructive insights from outside the research project. In this application of face validity, expert observations can help in the development and fine-tuning of research projects. Laypersons, the argument goes, lack technical research skills and can provide only impressionistic face validity observations, which are of little use to investigators.
Most researchers now recognize that the use of experts in face validity assessments is more accurately understood as a test of content validity, because experts provide their observations at the start or middle of a research project, whereas face validity focuses on assessing the relevance of research conclusions. Again, the two tests should be understood sequentially: content validity garners expert observations on the relevance of research variables in the earlier parts of the investigation, and face validity garners commonsense assessments from laypersons at the completion of the research project.
The large-scale vista that defines face validity, and that defines the contribution this assessment provides to the research community, is also its Achilles heel. Face validity lacks the depth, precision, and rigor of inquiry that come with other internal and external validity tests. For example, in assessing the external validity of a survey research project, one can look precisely at the study's sample size to determine whether the sample is representative of the population. The only question face validity can put to a survey research project is a simple one: “Does the study make sense?” For this reason, face validity can never be a stand-alone test of validity.
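As a contrast with face validity's single commonsense question, the sketch below shows the kind of precise check available in an external validity assessment: the standard sample size formula for estimating a population proportion, n = z²p(1 − p)/e². The 95% confidence level and ±5% margin of error are illustrative choices, not values from the entry.

```python
from math import ceil

def required_sample_size(z: float = 1.96,
                         margin_of_error: float = 0.05,
                         p: float = 0.5) -> int:
    """Sample size needed to estimate a population proportion.

    Implements n = z^2 * p * (1 - p) / e^2, using p = 0.5 as the
    most conservative (largest sample) assumption.
    """
    return ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)

# 95% confidence (z = 1.96), +/- 5% margin of error
print(required_sample_size())  # 385 respondents
```

No comparably precise computation exists for face validity, which is exactly why it complements, rather than replaces, the other tests.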
The Re-Emergence of Face Validity
The renewed interest in face validity is part of the growing research practice of integrating laypersons' nontechnical, one-of-a-kind insights into the evaluation of applied research projects. Commonly known as obtaining an emic viewpoint, testing for face validity gives the investigator the opportunity to learn what the many different people affected by a proposed program already know about a particular topic. The goal in this application of face validity is to include the experiential perspectives of people affected by research projects: their assessments of what causes events to happen, of what the effects of the study on the community may be, and of what specific words or events mean in the community.
The following examples show how researchers use face validity assessments in very different contexts that nonetheless share the same goal: obtaining a commonsense assessment from persons affected by research conclusions. Michael Quinn Patton is widely recognized for his use of “internal evaluators” to generate face validity observations in the evaluation of programs. In the Hazelden Foundation of Minnesota case study, he describes his work providing annual evaluations based on the foundation's data tracking clients who go through its program. At the completion of each annual evaluation, a team of foundation insiders participates by assessing the data and the conclusions drawn in the reports.
Face validity assessments are commonly used in applied research projects in fields that include community development, planning, public policy, and macro social work. In planning, face validity observations are obtained during scheduled public hearings throughout the planning process. The majority of planning research is based on artificial constructs of reality that allow planners to understand complex, multivariable problems (e.g., rush-hour traffic). One reason planners incorporate citizen input into the planning process is that it allows them to discover the “inside perspective” from the community on how the research and proposed plans may affect residents' day-to-day lives. A street-widening project in Lincoln, Nebraska, is one example of how a city used face validity in its planning process. A central traffic corridor was starting to experience higher levels of rush-hour congestion as a result of recent growth on the city's edge. Knowing that simply widening the street to accommodate more vehicles could adversely affect area businesses, city planners met with local store owners to get their face validity observations of how the street affected their daily operations. Armed with traffic data and the face validity observations of local store owners, the city was able to plan a wider street that took into account both commuters' and area businesses' experiences with the street.