
Data Mining

Data mining is a family of methods that aim to discover knowledge from data by applying algorithms. Data mining algorithms are very diverse, depending on their intended objectives and the computational demands of the problem. These methods have been developed at the intersection of two academic fields, statistics and computer science, and can be classified broadly into supervised and unsupervised learning. In this entry, supervised learning methods used for prediction are reviewed first, followed by unsupervised learning methods.

Supervised learning consists of methods applicable to data in which there is an outcome that can be used to determine whether the learning process was successful. The outcome is also commonly referred to as a dependent variable or response variable. Supervised learning methods can be used for prediction and for learning about relationships between predictors and the outcome. Examples of supervised learning methods include generalized linear models, classification and regression trees, random forests, and neural networks (NNs). Supervised learning methods have found several applications in educational research, such as identifying students at risk of failing to reach achievement milestones or identifying the effects of educational interventions. Some supervised learning methods allow inference about the general form of the relationship between predictors and the outcome or provide measures of predictor importance, whereas others provide predictions but allow no inference about the functional form of that relationship.
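To make the role of the outcome concrete, the following is a minimal sketch of one of the simplest supervised learners: a classification tree with a single split (often called a decision stump) on one predictor and a binary outcome. The function names `fit_stump` and `predict_stump` and the toy data are hypothetical illustrations, not part of the entry; practical tree methods grow deeper trees over many predictors.

```python
from typing import List, Sequence, Tuple

# (threshold, label for x <= threshold, label for x > threshold)
Stump = Tuple[float, int, int]


def fit_stump(x: Sequence[float], y: Sequence[int]) -> Stump:
    """Fit a one-split classification tree on a single predictor x with a
    binary outcome y (0/1). The observed outcome guides learning: the split
    with the fewest training misclassifications is kept."""
    best = None  # (errors, threshold, left_label, right_label)
    values = sorted(set(x))
    # Candidate thresholds: midpoints between consecutive distinct values,
    # so both sides of every candidate split are nonempty.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    for t in candidates:
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        # Majority label on each side of the split (ties go to 1)
        left_label = int(sum(left) * 2 >= len(left))
        right_label = int(sum(right) * 2 >= len(right))
        errors = sum(yi != left_label for yi in left) + sum(
            yi != right_label for yi in right
        )
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    if best is None:
        raise ValueError("need at least two distinct predictor values")
    return best[1], best[2], best[3]


def predict_stump(stump: Stump, x: Sequence[float]) -> List[int]:
    """Predict a 0/1 outcome for new predictor values."""
    t, left_label, right_label = stump
    return [left_label if xi <= t else right_label for xi in x]
```

For example, with hypothetical data `x = [1, 2, 3, 4, 5, 6]` (hours of study) and `y = [0, 0, 0, 1, 1, 1]` (passed a milestone), `fit_stump` returns `(3.5, 0, 1)`. Because the learned rule is a readable threshold, a tree like this also permits the kind of inference about the predictor-outcome relationship mentioned above, unlike more opaque learners.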

Unsupervised learning consists of methods applicable to data in which there is no outcome; their goal is therefore to summarize data by finding similarities or associations between individuals or variables. Unsupervised learning methods include clustering, association rule mining, principal components analysis, and exploratory factor analysis (EFA). At the student level, unsupervised learning has been applied in educational research to identify groups of students who respond similarly to measurement scales, have comparable growth trajectories, or benefit equally from educational interventions. At the instrument level, unsupervised learning has been used to group survey and scale items with respect to their relationships with the constructs and/or content areas measured.
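As a hedged illustration of clustering, the sketch below implements k-means (Lloyd's algorithm), a widely used clustering method chosen here for brevity; the entry does not single out k-means. The function name `k_means`, its parameters (`k`, `iters`, `seed`), and the toy data are hypothetical. No outcome is used: points are grouped purely by their similarity (squared Euclidean distance).

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def k_means(points: List[Point], k: int, iters: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's k-means: return a cluster index (0..k-1) for each point."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct data points chosen at random
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels
```

For instance, hypothetical pairs of scale scores `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]` with `k=2` are split into the three low-scoring and the three high-scoring cases. The cluster indices themselves are arbitrary, and because k-means can settle in a local optimum, results may depend on the random initialization.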

Prediction Methods

The goal of prediction methods is to forecast an outcome using a set of predictors, also known as independent variables or features. Prediction problems are commonly classified as regression or classification problems, depending on whether the outcome is continuous or categorical, respectively. In data mining for prediction, it is customary to build the predictor on a training data set, which is a subset of the available data, and then to evaluate it on a different subset, the test data set. The quality of prediction can be evaluated with the mean squared error in regression problems and with the error rate in classification problems.
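The training/test workflow and the two quality measures can be sketched as follows. This is a minimal illustration under stated assumptions: the split is a simple random partition with a hypothetical default test fraction of 30%, and `fit_line` (ordinary least squares with one predictor) merely stands in for whatever predictor is being built. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(rows: Sequence[T], test_fraction: float = 0.3,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly partition rows into a training subset and a held-out test subset."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference: quality measure for regression problems."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def error_rate(y_true: Sequence, y_pred: Sequence) -> float:
    """Share of misclassified cases: quality measure for classification problems."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_line(data: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b*x, fitted on training data only."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxy = sum((x - mx) * (y - my) for x, y in data)
    sxx = sum((x - mx) ** 2 for x, _ in data)
    b = sxy / sxx
    return my - b * mx, b


if __name__ == "__main__":
    # Hypothetical toy data: y is roughly 2x + 1 with small alternating noise
    data = [(x, 2 * x + 1 + (0.5 if x % 2 else -0.5)) for x in range(20)]
    train, test = train_test_split(data, test_fraction=0.25, seed=42)
    a, b = fit_line(train)  # the predictor is built on the training set only
    preds = [a + b * x for x, _ in test]
    print("test MSE:", mean_squared_error([y for _, y in test], preds))
```

Evaluating on cases the predictor never saw during fitting is what makes the test-set error an honest estimate of how well the predictor will perform on new data.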

There is a large number of parametric and nonparametric methods for prediction. Parametric methods make assumptions about the form of the functional relationship between the outcome and the predictors and about the distribution of the residuals, whereas nonparametric methods do not make these assumptions. One common type of parametric model is the generalized linear model, which uses a link function (e.g., identity, log, or logit) and a distributional family (e.g., Gaussian, Poisson, or binomial) to establish a linear relationship between a set of predictors and a function of the expected value of the outcome. Examples of generalized linear models are linear regression, Poisson regression, and logistic regression.
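To make the role of the link function concrete, the sketch below tabulates the three links named above together with their inverses and fits a logistic regression (binomial family, logit link). It is a minimal sketch: the names `LINKS` and `fit_logistic` are hypothetical, and the model is fitted by plain gradient ascent on the log-likelihood for brevity, whereas statistical software typically fits generalized linear models by iteratively reweighted least squares. The learning rate and iteration count are arbitrary assumptions, and the coefficients diverge if the two outcome classes are perfectly separable.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Each link g maps the outcome's mean mu to the linear predictor eta = g(mu);
# the inverse link maps eta back to mu.
LINKS: Dict[str, Tuple[Callable[[float], float], Callable[[float], float]]] = {
    # identity link: linear regression (Gaussian family)
    "identity": (lambda mu: mu, lambda eta: eta),
    # log link: Poisson regression
    "log": (math.log, math.exp),
    # logit link: logistic regression (binomial family)
    "logit": (lambda mu: math.log(mu / (1 - mu)),
              lambda eta: 1 / (1 + math.exp(-eta))),
}


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, iters: int = 5000) -> List[float]:
    """Fit logistic regression by gradient ascent on the log-likelihood.
    Returns [intercept, b1, ..., bp]."""
    inv_logit = LINKS["logit"][1]
    p = len(X[0])
    n = len(y)
    beta = [0.0] * (p + 1)
    for _ in range(iters):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            # Linear predictor on the logit scale
            eta = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
            # Score contribution of one case: (y - mu) times its predictors
            resid = yi - inv_logit(eta)
            grad[0] += resid
            for j, v in enumerate(xi):
                grad[j + 1] += resid * v
        # Step uphill on the average log-likelihood
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta
```

Predicted probabilities for new cases are obtained by applying the inverse link to the fitted linear predictor, `LINKS["logit"][1](beta[0] + beta[1] * x)`, which is how the model keeps a linear structure on the logit scale while producing means between 0 and 1.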

...
