Classification

April Galyardt

doi:10.4135/9781506326139


By: April Galyardt
In: The SAGE Encyclopedia of Educational Research, Measurement, and Evaluation
Chapter DOI: https://doi.org/10.4135/9781506326139.n113
Subject: Education


Classification refers to a broad set of statistical methods that arise in many different applications. In a classification problem, we have a categorical response variable that we wish to investigate in relationship to one or more input variables. Classification methods can be applied to problems in a wide variety of settings; applications in education include analyzing patterns of responses to standardized exams, inferring which middle school students will benefit from a drug prevention program, and predicting which graduating high school seniors will choose to attend a particular university if they are offered admission.

Common classification methods include logistic regression, support vector machines, decision trees, random forests, neural networks, and k-nearest neighbors. This entry discusses a few general issues in classification that should be considered when choosing a method and the differences between classification and the related problem of clustering.

General Issues in Classification

Classification problems include both prediction and inference. In an inference problem, the goal is to describe the relationship between the response variable and the explanatory variables, whereas in a prediction problem, the goal is to predict the value of an unobserved response variable for a new data point based on observed predictor variables. For example, if we wish to examine the relationship between a person’s diet and whether the person later gets cancer, this is an inference problem because the question of which foods put a person at risk is of paramount importance. In contrast, if we wish to classify the content of an image based on features extracted from the digital representation of the image, this is a prediction problem because it is not important which features are useful for making the classification.

Logistic regression and decision trees are examples of methods that are appropriate for inference because they provide easy-to-interpret information about the relationship between the response variable and the explanatory variables. As with any statistical methodology, however, making causal claims based on the results of a classification analysis relies on proper experimental design. K-nearest neighbors, support vector machines, and random forests may provide accurate predictions but can be more challenging to interpret and are therefore more appropriate for prediction problems than inference problems.
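To illustrate why logistic regression lends itself to inference, the following minimal sketch fits a one-predictor model and reads the slope as an odds ratio. The data (hours of tutoring and a pass/fail outcome) are invented for illustration, and the plain gradient-ascent fitting routine is an assumption made to keep the example dependency free; it is not a method described in this entry.

```python
import math

# Hypothetical toy data, invented for illustration only:
# hours of tutoring (x) and whether the student passed (1) or not (0).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 1, 0, 1, 1, 1]


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, learning_rate=0.05, steps=20000):
    """Fit P(y = 1 | x) = sigmoid(intercept + slope * x) by gradient
    ascent on the mean log-likelihood."""
    intercept, slope = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        grad_intercept = 0.0
        grad_slope = 0.0
        for xi, yi in zip(x, y):
            residual = yi - sigmoid(intercept + slope * xi)
            grad_intercept += residual
            grad_slope += residual * xi
        intercept += learning_rate * grad_intercept / n
        slope += learning_rate * grad_slope / n
    return intercept, slope


intercept, slope = fit_logistic(hours, passed)

# exp(slope) is the multiplicative change in the odds of passing
# associated with one additional hour of tutoring.
odds_ratio = math.exp(slope)
print(f"slope = {slope:.3f}, odds ratio per extra hour = {odds_ratio:.3f}")
```

The fitted slope summarizes the direction and strength of the association in a single number, which is the kind of interpretable output that makes the method attractive for inference. Whether that association can be read causally still depends on the study design.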

Any problem with a categorical response variable may be deemed a classification problem, but methods differ based on how many levels the categorical response has. Logistic regression is most often used as a binomial method for a binary response variable; by contrast, multinomial logistic regression, k-nearest neighbors, and linear discriminant analysis can easily handle any number of classes.
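As a sketch of how a method such as k-nearest neighbors handles more than two classes without modification, the example below classifies points among three classes by majority vote. The training points, the class labels "A", "B", and "C", and the function name `knn_predict` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical training data with three classes, invented for illustration.
train_X = [
    (1, 1), (1, 2), (2, 1),   # class "A"
    (6, 6), (6, 7), (7, 6),   # class "B"
    (1, 7), (2, 7), (1, 6),   # class "C"
]
train_y = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]


def knn_predict(train_X, train_y, query, k=3):
    """Return the most common label among the k training points
    closest to the query in Euclidean distance."""
    neighbors = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in neighbors[:k])
    # On a tied vote, Counter returns the label it encountered first,
    # which here is the label of the closer neighbor.
    return votes.most_common(1)[0][0]


for query in [(1.5, 1.5), (6.5, 6.5), (1.5, 6.5)]:
    print(query, "->", knn_predict(train_X, train_y, query))
```

Nothing in the voting rule depends on the number of classes, which is why the same code works for two classes or twenty.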

Decision Boundaries

Decision boundaries separate the space of input variables into regions, each labeled with a class. One of the key elements determining the complexity of a classification problem is the shape of these boundaries. Figure 1 shows two classification problems with two classes (Δ, +) and two predictor variables (X1, X2). The solid line shows the Bayes optimal decision boundary, whereas the dotted line is the decision boundary estimated with logistic regression. Figure 1A shows a case where the Bayes optimal decision boundary is linear, whereas in Figure 1B, the boundary is nonlinear. If the input variables describe a space best partitioned using a nonlinear decision boundary, it is important to choose a method that can estimate such a boundary, particularly for inference problems.
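The effect of boundary shape can be sketched with synthetic data whose true boundary is a circle. This is not the data behind Figure 1; the grid of points, the circular labeling rule, and the choice to add squared terms as a way of letting logistic regression bend its boundary are all assumptions made for illustration.

```python
import math

# Synthetic data (an assumption for illustration): a 9 x 9 grid of points,
# labeled 1 inside a circle of radius 1.5 and 0 outside it.
grid = [i * 0.5 for i in range(-4, 5)]
points = [(x1, x2) for x1 in grid for x2 in grid]
labels = [1 if x1 * x1 + x2 * x2 <= 2.25 else 0 for x1, x2 in points]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(features, labels, learning_rate=0.1, steps=3000):
    """Fit logistic regression weights by gradient ascent on the
    mean log-likelihood. Each feature row must include a leading 1
    for the intercept."""
    n, d = len(features), len(features[0])
    weights = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for row, y in zip(features, labels):
            score = sum(w * v for w, v in zip(weights, row))
            residual = y - sigmoid(score)
            for j in range(d):
                grad[j] += residual * row[j]
        weights = [w + learning_rate * g / n for w, g in zip(weights, grad)]
    return weights


def accuracy(weights, features, labels):
    """Share of training points on the correct side of the boundary."""
    correct = 0
    for row, y in zip(features, labels):
        score = sum(w * v for w, v in zip(weights, row))
        correct += int((score > 0) == (y == 1))
    return correct / len(labels)


# Linear terms only: the estimated decision boundary is a straight line.
linear_features = [(1.0, x1, x2) for x1, x2 in points]
# Added squared terms: the boundary can now curve around the inner class.
quad_features = [(1.0, x1, x2, x1 * x1, x2 * x2) for x1, x2 in points]

linear_acc = accuracy(fit_logistic(linear_features, labels), linear_features, labels)
quad_acc = accuracy(fit_logistic(quad_features, labels), quad_features, labels)
print(f"linear boundary accuracy:    {linear_acc:.2f}")
print(f"nonlinear boundary accuracy: {quad_acc:.2f}")
```

Because no straight line can separate points inside a circle from points outside it, the linear model does little better than guessing the larger class, while the model with squared terms can recover the curved boundary. Accuracy here is computed on the training points only, so it describes how well each boundary shape can fit, not how well it would predict new data.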

...
