Exploratory Factor Analysis as a Tool for Investigating Complex Relationships: When Numbers Are Preferred over Descriptions and Opinions

Abstract

Exploratory factor analysis is a research tool for making sense of multiple variables that are thought to be related. It can be particularly useful when a qualitative methodology is the more appropriate way to collect data or measures but quantitative analysis enables clearer reporting. This case first gives a brief overview of exploratory factor analysis and then presents a case study that employs it.

Learning Outcomes

After reading this case study, you should be able to

  • Define exploratory factor analysis
  • Identify situations where exploratory factor analysis is applicable
  • State the limitations of exploratory factor analysis
  • Give a clinical description of sensory integration therapy

‘I do what works’ is not an uncommon justification for a therapist's clinical activities. The accompanying evidence is often something like ‘I see changes, and my clients and their families are satisfied with the results’. Although these justifications may be valid rationalizations, they should not be counted as equal to objective research findings. First and foremost, how do you explain ‘what works’ to someone else in a way that lets that person observe how it works and, in turn, explain why it works to others? Ideally, outcomes should be described and presented so that any professional in the clinical area will know how they were produced. This scenario leads to a question I am currently researching: How can we generate objective data that can be used to evaluate particular therapy practices? Investigation is needed because practitioners are using therapeutic techniques that, on the basis of their professional expertise, they believe are effective, even though the research evidence is mixed at best. What is even more interesting is that the same techniques have found supporters across the disciplines of medicine, occupational therapy, speech-language therapy, and physical therapy. In other words, there is considerable professional consensus that those techniques actually work, but there are also detractors who deserve to be answered with sound research evidence.

Evidence-Based Practice

This consensus reflects one pillar of what is known in clinical fields as ‘evidence-based practice’ (EBP). In principle, EBP is guided by data from (1) research studies, (2) clinician experience and judgment, and (3) observable patient outcomes. Nan Bernstein Ratner is not the first to describe EBP as it applies to a field like speech-language pathology (Ratner, 2006). She has noted that the goal for any treatment strategy is to have high-quality evidence in all three categories. However, she also noted that experienced clinicians and researchers have acknowledged the shortcomings of applying data from highly contrived experimental settings to real-world scenarios. The larger body of research aimed at supporting EBP, of which this case is one example, should illustrate how messy real-world data are compared with data from controlled experimental studies. This is largely because of the sheer number of variables that cannot be controlled in the real world, for example, rapport between clinician and client, and variation within and among clients, practitioners, and even therapeutic practices that claim (and aim) to follow the same cookie-cutter recipe. In clinical practice, subtle interactions between such factors can make big differences in outcomes. With these ideas in mind, the researcher must strive all the more for high-quality evidence, because that is what allows for effective communication to outside parties and, hopefully, results in real-world efficacy. Otherwise, without the appropriate research evidence, EBP may be more of a hope and a promise than a reality.

Sensory Integration Therapy

One of the therapy strategies currently being employed by occupational therapists (those who work on fine motor skills as they relate to everyday tasks such as tying one's shoes or buttoning a shirt) and speech-language pathologists (those who work to develop communication abilities) is a strategy known as sensory integration (SI). SI therapy became popular in the 1970s largely due to the work of Jean Ayres. The basic premise is that if certain sensory ‘needs’ for tactile, auditory, or visual stimulation are met before a teaching program begins, learners will be more ready to learn. The stimulation may involve brushing of the arms or legs, rapidly and repeatedly compressing major joints, cradling via a beanbag or specialized chair, presentation of white noise (static) or soothing sounds via headphones, etc. Such therapeutic techniques have seen a resurgence since the beginning of the present millennium, particularly in the treatment of severe developmental disabilities like autism spectrum disorders (ASD). Because many children diagnosed with ASD are non-verbal or minimally verbal and may also engage in characteristic self-stimulatory behaviors, therapists have naturally gravitated toward therapies that focus on the senses. However, SI therapy is not without critics. Discussions of whether or not SI-based therapies are effective tools for teaching have been occurring since Ayres's book was published in 1972 (Ayres, 1972). Several exemplary articles on both sides of the debate are listed in the further reading section of this case.

Critics have often pointed to a perceived lack of connection between tactilely stimulating a child and that child's learning how to tie shoes or use one-word phrases. Proponents of SI therapies have argued that the stimulation does not teach the desired behaviors directly, but rather prepares the child to learn higher-level skills by calming the child and meeting basic sensory needs. These practitioners see SI therapy as a synergistic element of therapy that cannot be examined in isolation and must be regarded as part of a system of interactions.

SI Therapy and EBP

One of the major issues faced by researchers investigating SI therapies is the lack of a one-to-one relationship between therapeutic techniques and outcomes. It should be noted, though, that most real-world phenomena do not result in one-to-one relationships but involve multi-faceted interactions. This has been one of the appealing aspects of qualitative research studies, which aim to keep the rigor of the scientific method but do not hold to the replicability standards of classic experimental research. However, EBP as a doctrine often regards quantitative data as a higher level of evidence, and therefore as more desirable than mere descriptions and interpretations of behavior. Even when substantial documentation and justifications are produced, traditional qualitative methods are subject to criticisms regarding subjectivity and experimenter bias. For fields that value EBP, quantitative data are the gold standard of research.

Purpose of This Case

This case will serve as an illustration of how quantitative methods may be applied to qualitative types of information. The particular method of interest is what is known as exploratory factor analysis. R. L. Gorsuch (1983) wrote a widely cited text giving detailed methods of calculation, theory, and other guidance on the use of factor analysis. Briefly, factor analysis is a tool that allows researchers to examine multiple sources of data for patterns and trends that may provide a deeper understanding of the relationships among sources of variability in multiple measures taken from the same group of participants.

Exploratory factoring is a statistical method grounded in one or another type of correlation, usually the parametric Pearson product–moment correlation. All approaches to correlation, including the non-parametric varieties, aim to measure the relationship between two variables measured on the same participants. The Pearson correlation assumes a linear relationship in which high (and low) scores on a pair of variables tend to go together. For instance, a high score in vocabulary will tend to correspond to a high score in reading comprehension for the same person, and a low score in vocabulary will tend to be paired with a low score in reading comprehension. In such a case, we have a positive correlation between vocabulary and reading comprehension. A negative correlation is also possible. For instance, a high degree of mental impairment is apt to correlate with a low vocabulary or reading score, whereas a low degree of mental impairment should correspond to higher scores in both vocabulary and reading comprehension. Factor analysis enables the researcher to examine multi-dimensional relationships, that is, relationships among 3, 4, 6, 10, or more variables.
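To make the idea of a correlation concrete, the sketch below computes the Pearson product–moment correlation directly from its definition in plain Python. The vocabulary, reading, and impairment scores are invented for illustration only; they are not data from any study.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (n >= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)

# Hypothetical scores for the same five participants
vocabulary = [12, 15, 9, 20, 17]
reading = [30, 34, 25, 41, 38]
impairment = [7, 5, 9, 1, 3]

print(round(pearson_r(vocabulary, reading), 2))     # positive correlation
print(round(pearson_r(impairment, vocabulary), 2))  # negative correlation
```

The result always lies between −1 and +1; its sign gives the direction of the relationship and its size the strength.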

Types of Data

Factor analysis works best with measures to which algebraic operations are applicable. The best-known classification of types, or classes, of measures or scales comes from S. S. Stevens. He proposed four different types: nominal, ordinal, interval, and ratio. Related distinctions are built into statistics programs such as SPSS and SAS. A nominal variable is one where some entity, quality, or whatever is merely named or given a numerical label, but not quantified by amount. The name either applies or does not apply. This is often the type of data collected in qualitative research. For example, a researcher could document particular styles of greeting like ‘hi’ and ‘what's up?’ These are each unique events, but ‘hi’ is not more of a greeting than ‘what's up?’ or vice versa. Each is a single named event, a type of greeting. However, a nominal variable can be treated as a binary, on/off scale: where the name applies, the variable takes the value 1, and where it does not, the value 0. Next up is an ordinal scale, where values can be ranked from 0 to 1, 2, and so on, and where 0 is judged as less than 1, which is less than 2, and so forth. At a slightly higher level, Stevens defined interval scales as ones where the ordinal relation is such that the distance between 0 and 1 equals that between 1 and 2, and so forth.

A still higher scale is the ratio type, on which any standard algebraic operation can be applied without error. With a ratio scale, there must be a clear way to identify a true absence of the quantity, designated as a true zero point on the scale. Ratio measures are the gold standard, but, in many cases, hardly any error results from performing algebraic operations on quantified nominal, ordinal, and interval scales.
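The binary treatment of a nominal variable can be sketched as indicator (0/1) coding, with one column per named category. The greeting observations below are hypothetical, and the function name `indicator_coding` is illustrative rather than part of any particular statistics package.

```python
def indicator_coding(values):
    """Turn one nominal variable into 0/1 indicator columns, one per category."""
    categories = sorted(set(values))
    # 1 = the name applies to this observation, 0 = it does not
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows

# Hypothetical greeting observations, one per row
observations = ["hi", "what's up?", "hi", "hello", "what's up?"]
categories, rows = indicator_coding(observations)
print(categories)
for row in rows:
    print(row)
```

Each 1 records only that a name applied to an observation; it does not say that one greeting is ‘more’ of a greeting than another.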

What is apt to be missed in qualitative research is that even nominal data may be quantified. For example, if a researcher is describing some quality of an interaction, naming the quality produces one observation of that quality. A second researcher, or the same researcher, can then look for that quality again, which would result in two observations of that quality. For example, ‘hi’ is not more of a greeting than ‘what's up?’, but each greeting counts as one event, so that there are twice as many greetings when both are counted. Note that this should not be equated with having obtained twice the quality, but rather two observations of that quality. In another example, seeing two sunrises does not double the amount of sunlight observed. Or, in yet a different analogy, observing a child tie his shoes on Monday and then again on Tuesday does not result in twice the number of tied shoes, but it does result in two observations of shoe tying. With these examples in mind, if a researcher is willing to count observations of a quality, as opposed to trying to quantify the quality itself, the tools of parametric statistics can be used. My colleagues and I have developed this idea more fully elsewhere, in our book Milestones: Normal Speech & Language Development across the Lifespan (Oller, Oller, & Oller, 2012/2014; also see John W. Oller Jr and Liang Chen, 2007, in the Journal of Quantitative Linguistics).
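Counting observations of a quality, rather than trying to quantify the quality itself, might look like the following minimal sketch. It assumes a hypothetical coded log of (child, observed quality) pairs; the child labels, quality names, and the `count_observations` helper are all invented for illustration.

```python
from collections import Counter

def count_observations(log, qualities):
    """Count how many times each named quality was observed for each child."""
    counts = {}
    for child, quality in log:
        counts.setdefault(child, Counter())[quality] += 1
    # One row per child, one column per quality; unobserved qualities count as 0
    return {child: [c[q] for q in qualities] for child, c in counts.items()}

# Hypothetical coded observation log
log = [
    ("child_A", "greeting"), ("child_A", "shoe_tying"),
    ("child_A", "greeting"), ("child_B", "greeting"),
    ("child_B", "shoe_tying"), ("child_B", "shoe_tying"),
]
table = count_observations(log, ["greeting", "shoe_tying"])
print(table)
```

The counts (two observations of greeting for `child_A`, for instance) are the kind of numbers that can enter a correlation matrix, whereas the names alone could not.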

Example Study

In order to demonstrate how exploratory factor analysis may be used, this case will describe a research study that I am working on at the time of this writing. This study involves looking at SI therapy as it is currently being used by clinicians in a rehabilitation setting. The clinicians and even the pediatricians all report positive outcomes from the SI therapy. However, traditional standardized/norm-referenced tests cannot be used effectively to assess subtle changes over short time periods. There are several global issues to be addressed: (1) does SI therapy correlate with a child meeting therapeutic goals, (2) are traditional assessment methods, when used in combination, sensitive to small short-term changes, (3) is there any indication that an integrative assessment, or a combination of these types of assessments, may be used to evaluate SI therapies, and finally (4) because there are multiple SI therapy techniques and procedures for implementing them, do different dosing regimens or techniques produce different outcomes?

Each of these ideas is important in its own right, and an experimental procedure could be set up to test each one individually. In some respects, the experimental method is the preferred method. However, the SI techniques are already in use, and manipulating how they are currently used would threaten their perceived validity from the point of view of the practitioners and their associates. Therefore, it is reasonable to approach the problem from a qualitative/descriptive perspective and look for patterns. Exploratory factor analysis will allow us to look for those patterns or, more importantly, for the elements that are most responsible for variance in the data.

Factor Analysis

Variance is a statistical concept used to describe how much measurements differ; numerically, it is the average squared deviation of the measurements from their mean. So in one sense, variance may represent all the different heights that people have in a given sample at one point in time. In another sense, variance can represent how an individual's height changes over some extended period of time. In either case, multiple measurements are being taken. These measurements differ and, in turn, result in variance. In simple correlation studies, two variables are studied to see whether they vary concomitantly with each other. Another way of describing that ‘varying with each other’ relationship is to refer to it as shared variance or variance overlap, which is measured by the squared correlation coefficient. In all cases, the correlation represents the extent to which, when one measure changes, or varies, the other measure changes as well. Factor analysis in general allows the researcher to look at variance overlap (shared variance) among multiple variables at the same time.
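The notions of variance and shared variance can be shown numerically. The sketch below treats variance as the average squared deviation from the mean and shared variance (variance overlap) as the squared Pearson correlation, r². The two short score lists are hypothetical.

```python
from statistics import mean, pvariance

def covariance(x, y):
    """Population covariance: average cross-product of deviations from the means."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def shared_variance(x, y):
    """Proportion of variance two measures share, i.e. the squared Pearson r."""
    r = covariance(x, y) / (pvariance(x) * pvariance(y)) ** 0.5
    return r ** 2

# Hypothetical scores on two measures for the same four participants
measure_a = [1, 2, 3, 4]
measure_b = [2, 1, 4, 3]

print(pvariance(measure_a))                            # 1.25
print(round(shared_variance(measure_a, measure_b), 2))  # 0.36
```

Here r = .6, so the two measures share 36% of their variance; the remaining 64% of each measure varies independently of the other.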

In the case of the SI study, several pre- and post-assessment measures will be used. Most of these are norm-referenced/standardized motor and language assessments. One of the measures is a developmental scale, the Milestones Scale. In addition, goal attainment scales will be used; these convert what would otherwise be a nominal judgment of met/not met into an interval representation of positive, negative, and no-change outcomes. The type of SI intervention (these range from brushing the arms and compressing joints to ‘vestibular activities’ such as spinning in a chair), along with the frequency and duration of use of these activities, will also be tabulated.
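The text does not say how the goal attainment ratings will be scored. One widely used convention, assumed here, codes each goal from −2 (much less than expected) through 0 (expected outcome) to +2 (much more than expected) and combines the goals into the Kiresuk–Sherman T-score, which is scaled so that 50 means every goal was attained at the expected level. The sketch below implements that formula; the inter-goal correlation `rho = 0.3` is the conventional default, not a value taken from this study, and `gas_t_score` is an illustrative name.

```python
from math import sqrt

def gas_t_score(scores, weights=None, rho=0.3):
    """Kiresuk-Sherman T-score for a set of goal attainment ratings.

    scores  -- one rating per goal, coded -2, -1, 0, +1, or +2
    weights -- optional importance weights (equal weights if omitted)
    rho     -- assumed correlation between goal scores (0.3 by convention)
    """
    if weights is None:
        weights = [1.0] * len(scores)
    weighted_sum = sum(w * x for w, x in zip(weights, scores))
    denominator = sqrt((1 - rho) * sum(w * w for w in weights)
                       + rho * sum(weights) ** 2)
    return 50 + 10 * weighted_sum / denominator

# Hypothetical ratings for one child on three goals
print(round(gas_t_score([0, 1, -1]), 1))  # 50.0: overall, as expected
print(round(gas_t_score([1, 1, 2]), 1))   # above 50: better than expected
```

Because the T-score is continuous and centred on the expected outcome, it gives a met/not-met judgment an interval-like form: scores above 50 indicate better-than-expected attainment, and scores below 50 worse.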

Adding further variance is the fact that several different therapists will be employing these techniques. Ideally, variance from therapists should be minimized, but there is something to be said for personal interactions and styles that cannot easily be quantified. In other words, it is important to acknowledge that individual therapists may differ from each other as much as any particular therapy techniques differ.

Running multiple pair-wise (one-to-one) correlations may be worthwhile, but there is a statistical reason why this is not always desirable, especially as the number of pair-wise comparisons increases: the probability of obtaining a result that appears to be significant when it is not increases with each comparison. In addition, as mentioned previously, pair-wise comparisons do not allow the researcher to observe how multiple variables correlate in groups; hence the need for multivariate factor analysis.
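The inflation of false positives can be quantified. Assuming each test is run at α = .05 and, for simplicity, that the tests are independent (correlations computed on the same data are not, so this is only an approximation), the chance of at least one spurious ‘significant’ result among m tests is 1 − (1 − α)^m, and k variables yield k(k − 1)/2 pairs.

```python
def n_pairwise(k):
    """Number of distinct pairs among k variables: k(k - 1) / 2."""
    return k * (k - 1) // 2

def familywise_error(alpha, m):
    """Chance of at least one false positive in m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

for k in (3, 5, 10, 20):
    m = n_pairwise(k)
    p = familywise_error(0.05, m)
    print(f"{k} variables -> {m} pairs -> P(at least one false positive) = {p:.2f}")
```

With 10 variables there are 45 pairs, and the approximate chance of at least one false positive is about .90.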

In any factor analysis, each subject (case) is plotted as a point in a multi-dimensional space defined by the associated variables (the measures taken on each case): two variables define a two-dimensional space, three variables a three-dimensional space, and so on. It is relatively easy to visualize two- or even three-dimensional spaces, but higher-order spaces are difficult to comprehend. So consider a three-dimensional space. If we were looking for a relationship among the three variables, some line (or plane) could be drawn through the cloud of points that would best explain the variance the variables share. Such a line, an axis through the space, is what is known as a factor, and a plane corresponds to a pair of factors. The solution is not unique, however: the same shared variance can often be described equally well by two or three different sets of orthogonal (uncorrelated) axes.
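Finding the axes that best account for shared variance can be sketched with an eigen-decomposition of the correlation matrix. This is principal-components extraction, one common way of obtaining initial (unrotated) factors; principal-axis factoring, often preferred in exploratory factor analysis, instead replaces the 1s on the diagonal with communality estimates. The data are simulated, and `extract_factors` is an illustrative name, not a library function.

```python
import numpy as np

def extract_factors(data, n_factors):
    """Unrotated loadings via eigen-decomposition of the correlation matrix.

    data: array of shape (cases, variables). This is principal-components
    extraction, not principal-axis factoring.
    """
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)   # ascending order
    order = np.argsort(eigvals)[::-1]         # largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale each eigenvector (axis) by the square root of its eigenvalue
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    return eigvals, loadings

# Simulated data (not study data): two 'language' tests and two 'motor'
# tests, each built from one underlying ability plus random noise.
rng = np.random.default_rng(0)
n_cases = 200
language = rng.normal(size=n_cases)
motor = rng.normal(size=n_cases)
data = np.column_stack([
    language + 0.5 * rng.normal(size=n_cases),
    language + 0.5 * rng.normal(size=n_cases),
    motor + 0.5 * rng.normal(size=n_cases),
    motor + 0.5 * rng.normal(size=n_cases),
])
eigvals, loadings = extract_factors(data, n_factors=2)
print(np.round(eigvals, 2))
print(np.round(loadings, 2))
```

With these simulated data, two eigenvalues come out well above 1 and two well below, matching the two underlying abilities built into the simulation.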

Because more than one set of axes can fit the data equally well, statisticians have developed a technique for deciding which solution to use. This technique is called rotation. In very simplified terms, there are several types of rotation used for creating factor solutions. Two orthogonal types are called varimax and equimax; varimax is the more widely used of the two.

In a varimax rotation, factors are calculated to maximize the contrast between factors. That is, variables that are closely related will appear on (the technical term is load on, i.e. correlate with) the same factor, while variables that are not closely related will load on different factors. For example, two tests of language would be expected to load on the same factor, while tests of motor ability would be expected to load on a different factor that is only minimally, if at all, related to the language factor. In a varimax rotation, the solution sought gives each variable as high a loading as possible on just one factor, and no correlation is allowed between any pair of factors. With three variables, at most three uncorrelated factors can be found. An equimax rotation, on the other hand, tends to spread the explained variance more evenly across the factors in the solution, but, again, the factors remain uncorrelated with each other. Since we are interested in patterns of relationships, the varimax rotation is generally preferred in the sort of exploratory factoring we are interested in. See Table 1 for an example of a fictitious factor solution.
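Rotation can be sketched with Kaiser's pairwise approach: for each pair of factors, the planar rotation angle that maximizes the criterion is computed in closed form, and sweeps over all pairs are repeated until the angles become negligible. Here `gamma = 1` gives varimax; in the wider orthomax family, `gamma` equal to half the number of factors gives equimax. This sketch omits the row (Kaiser) normalization many packages apply by default, and the function name and starting loadings are hypothetical: a clean two-factor pattern deliberately turned 30 degrees so that every variable loads on both factors.

```python
import numpy as np

def orthomax_rotate(loadings, gamma=1.0, max_sweeps=100, tol=1e-10):
    """Orthogonal rotation of a (variables x factors) loading matrix.

    gamma=1.0 gives varimax; gamma=k/2 (k = number of factors) gives equimax.
    Returns the rotated loadings and the orthogonal rotation matrix.
    """
    rotated = np.array(loadings, dtype=float)  # copy; input is left unchanged
    p, k = rotated.shape
    rotation = np.eye(k)
    for _ in range(max_sweeps):
        largest_angle = 0.0
        for first in range(k - 1):
            for second in range(first + 1, k):
                x, y = rotated[:, first], rotated[:, second]
                u, v = x ** 2 - y ** 2, 2 * x * y
                a, b = u.sum(), v.sum()
                c, d = (u ** 2 - v ** 2).sum(), 2 * (u * v).sum()
                # Closed-form angle maximizing the criterion for this pair
                phi = 0.25 * np.arctan2(d - gamma * 2 * a * b / p,
                                        c - gamma * (a ** 2 - b ** 2) / p)
                largest_angle = max(largest_angle, abs(phi))
                cos_phi, sin_phi = np.cos(phi), np.sin(phi)
                # Same planar rotation for loadings and accumulated rotation,
                # so that loadings @ rotation == rotated at every step
                for matrix in (rotated, rotation):
                    col_1 = matrix[:, first].copy()
                    col_2 = matrix[:, second].copy()
                    matrix[:, first] = cos_phi * col_1 + sin_phi * col_2
                    matrix[:, second] = -sin_phi * col_1 + cos_phi * col_2
        if largest_angle < tol:  # no pair needed a meaningful turn
            break
    return rotated, rotation

# Hypothetical unrotated loadings: 2 language tests, then 2 motor tests
unrotated = np.array([
    [0.69, -0.40],
    [0.65, -0.38],
    [0.40, 0.69],
    [0.35, 0.61],
])
rotated, rotation = orthomax_rotate(unrotated)
print(np.round(rotated, 2))
```

With these loadings, the rotation turns the axes back by roughly 30 degrees, so each language test loads highly on one factor and near zero on the other, and the motor tests do the reverse. Each variable's total shared variance (the row sum of squared loadings) is unchanged, because an orthogonal rotation only turns the axes.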

Table 1. Sample factor analysis matrix.


In this example, five variables resulted in a three-factor solution; in other words, three relationships emerged from the analysis. Each factor has an associated eigenvalue, which indicates how much of the total variance in the data set the factor accounts for, expressed in units of single variables (each standardized variable contributes a variance of 1). If a factor has an eigenvalue of less than 1, it accounts for less variance than any one variable does on its own, so such factors are usually not included in further analyses (a rule of thumb known as the Kaiser criterion).
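The eigenvalue rule can be applied directly to a correlation matrix. The 4 × 4 matrix below is hypothetical, built so that Variables 1 and 2 correlate strongly with each other, as do Variables 3 and 4. Dividing each eigenvalue by the number of variables gives the proportion of total variance that the corresponding (unrotated) factor accounts for.

```python
import numpy as np

def kaiser_criterion(corr):
    """Eigenvalues of a correlation matrix (largest first) and how many exceed 1."""
    eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
    return eigvals, int(np.sum(eigvals > 1.0))

# Hypothetical correlations among four measures
corr = np.array([
    [1.0, 0.7, 0.1, 0.0],
    [0.7, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.6],
    [0.0, 0.1, 0.6, 1.0],
])
eigvals, n_retained = kaiser_criterion(corr)
print(np.round(eigvals, 2))              # [1.76 1.54 0.46 0.24]
print(n_retained)                        # 2
print(np.round(eigvals / len(corr), 2))  # share of total variance per factor
```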

In the example in Table 1, the factors are labeled Factor 1, Factor 2, and Factor 3. Generally, factors are named according to the types of variables associated with them. Each row gives one variable's number for each factor. That number is called a loading, and when describing the relationship, researchers say ‘a variable loads on a factor’. So in the hypothetical example used previously, regarding the language and motor tests, the factor with the language test loadings might be named language abilities, and the factor with the motor assessments loading on it might be named motor abilities. A statistics program will not name the factors; the researcher names them on the basis of the results.

Factor loadings are actually correlations. The closer a loading is to +1 or −1, the stronger the relationship, where a positive number indicates a positive relationship between the variable and the factor, and a negative number indicates a negative relationship. In Table 1, Variable 1's loading of .7 on Factor 1 indicates a substantial correlation with that factor. However, Variable 1 loads only .2 and .1 on Factors 2 and 3, respectively, so it is hardly related to them at all, though it has nearly 50% of its variance (.7² = .49) in common with Factor 1.
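Because a loading is a correlation, squaring it gives the proportion of the variable's variance shared with that factor, and summing the squares across factors gives the variable's communality (a standard term not used in the original text). The sketch below uses only the three loadings stated for Variable 1; `squared_loadings` is an illustrative name.

```python
def squared_loadings(row):
    """Share of one variable's variance held in common with each factor."""
    return [round(loading ** 2, 2) for loading in row]

variable_1 = [0.7, 0.2, 0.1]        # loadings on Factors 1-3, as given in the text
shared = squared_loadings(variable_1)
communality = round(sum(shared), 2)  # variance shared with all three factors
print(shared, communality)           # [0.49, 0.04, 0.01] 0.54
```

The remaining .46 (1 − .54) is variance in Variable 1 that the three factors do not account for.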

Once a researcher has computed a factor matrix, each variable is examined for what factor or factors it loads on. Then each factor is examined to see what variables have loaded on the factor. In the fictitious analysis presented in Table 1, Variables 1 and 2 loaded on Factor 1, Variables 3 and 4 loaded on Factor 2, and Variable 5 loaded on Factor 3. When describing these results, a researcher would say that Factor 1 is best defined by Variables 1 and 2; Factor 2, by Variables 3 and 4; and Factor 3, by Variable 5.
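The examination of a factor matrix can be sketched as grouping each variable under the factor on which it loads most strongly. Only Variable 1's row comes from the text; the other loadings are hypothetical values chosen to reproduce the pattern just described, and the 0.4 cutoff is an assumed rule of thumb (.3 and .4 are both commonly used), not a value from this case.

```python
def assign_to_factors(loadings, threshold=0.4):
    """Group variables under the factor on which each loads most strongly.

    A variable is listed only if its largest absolute loading reaches
    the (assumed) threshold.
    """
    groups = {}
    for variable, row in loadings.items():
        best = max(range(len(row)), key=lambda j: abs(row[j]))
        if abs(row[best]) >= threshold:
            groups.setdefault(f"Factor {best + 1}", []).append(variable)
    return groups

# Hypothetical loading matrix; only the Variable 1 row comes from the text
loadings = {
    "Variable 1": [0.7, 0.2, 0.1],
    "Variable 2": [0.9, 0.1, 0.2],
    "Variable 3": [0.3, 0.8, 0.3],
    "Variable 4": [0.3, 0.7, 0.3],
    "Variable 5": [0.1, 0.2, 0.9],
}
print(assign_to_factors(loadings))
```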

For our purposes here, and for most purposes in exploratory analyses, the amount of variance explained is particularly important. In the hypothetical example, Factor 1 accounts for a proportion of variance of .3. This means that approximately 30% of the total variance in the entire data set is accounted for by Factor 1, which is defined chiefly by Variables 1 and 2. In matrices that contain a large number of variables, finding a small number of factors, and the subsets of variables that define them, that account for most of the variance in the data set is often desirable. In fact, that is one of the purposes of the SI study described previously: to take a large number of variables and reduce that number to those that account for the majority of the variance in the data set.
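The proportion of variance a factor accounts for is the sum of its squared loadings divided by the number of variables, since each standardized variable contributes a variance of 1. The loading matrix below is hypothetical: only Variable 1's row comes from the text, and the other values were chosen so that Factor 1 accounts for about 30% of the variance, as in the example.

```python
def variance_explained(loadings):
    """Proportion of total variance accounted for by each factor."""
    n_variables = len(loadings)
    n_factors = len(loadings[0])
    # Sum of squared loadings down each factor's column
    sums_of_squares = [sum(row[j] ** 2 for row in loadings)
                       for j in range(n_factors)]
    return [ss / n_variables for ss in sums_of_squares]

# Hypothetical loadings (rows: Variables 1-5; columns: Factors 1-3)
loadings = [
    [0.7, 0.2, 0.1],
    [0.9, 0.1, 0.2],
    [0.3, 0.8, 0.3],
    [0.3, 0.7, 0.3],
    [0.1, 0.2, 0.9],
]
proportions = variance_explained(loadings)
print([round(p, 3) for p in proportions])  # [0.298, 0.244, 0.208]
```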

Factor Analysis in the SI Study

In the SI study, performance as judged by the goal attainment scales is of primary interest. The question at hand is how performance on the goal attainment scales varies in relation to existing abilities documented by different instruments (where each instrument is a unique variable), number of SI prompts, type of SI prompts, duration of SI prompts, and the clinician conducting therapy. The factor matrix is expected to look something like Table 2.

Table 2. Hypothesized SI study factor analysis matrix.

This is an exploratory method and is not intended to test or confirm any hypotheses. Hypotheses may be generated, but the primary purpose is to see what varies with the goal attainment scales. Ideally, the goal attainment scales will load on Factor 1, which accounts for the largest share of the variance. More than three factors may emerge, or only two; we will not know until the data are collected and analyzed. Once several relationships have been established, more focused experimentation and hypothesis testing may begin.

It should be noted that the variables to be studied were selected on the basis of known phenomena. It is reasonable to expect some variables to load on the same factor even before the analysis is run, because the choice of variables should be driven by logical, theory-based reasoning.

The major advantage of starting the investigation this way is that therapist behaviors are not being directly manipulated. When therapists perceive ‘good’ results, they tend to see manipulations of their behaviors in more rigorous studies as potential causes of any failure to demonstrate those results empirically. Taking this more naturalistic, qualitative-style approach can reduce complaints about artificial or contrived therapy conditions. Because the observations are quantified, it also reduces complaints that the data are overly subjective or open to countless interpretations. In a sense, exploratory factor analysis offers the advantages of qualitative research as well as quantitative research in a single package.

Conclusion

In summary, exploratory factor analysis can be a useful tool for examining complex relationships that are not easily understood through pair-wise comparisons or less powerful descriptive statistics, but that call for more data-driven descriptions than traditional qualitative methods provide. This is especially true in fields that subscribe to EBP doctrine, where case studies and descriptive research are often not viewed favorably.

Factor analysis in general is limited in that interval or ratio data are preferred over nominal or ordinal data, but this limitation may be overcome by counting occurrences of otherwise nominal events. Nevertheless, researchers must use caution. The other major limitation in exploratory factoring is that as the number of variables increases, the number of cases (participants) must also increase. There are several viewpoints regarding the minimum number of cases. One approach is to require a certain minimum number of cases per variable (with considerably more cases than variables). Another approach argues that, regardless of the number of variables (i.e. even with only 3 or 4), a minimum total number of cases is required. A consensus approach, sometimes recommended, is to find several published studies in the field of interest that use a similar number of variables to your own study and then use at least as many cases as those studies.

Finally, keep in mind that any factor analysis needs to be interpreted from a reasonable theoretical perspective. Because this is an exploratory method, its purpose is to help shape the critical examination of future questions.

Exercises and Discussion Questions

  • Why is it difficult to use exploratory factor analysis to test hypotheses?
  • Under what conditions would an ANOVA or pair-wise correlation be more appropriate than an exploratory factor analysis?
  • Find a paper in your discipline that used factor analysis as a research tool. Look at the factor matrix and pay close attention to the factor names. Do you agree with the names the author(s) chose to describe the factors? Provide three reasons to justify your answer.
  • Think of a research question you may have. Would exploratory factor analysis be an appropriate investigational tool? Provide three reasons to justify your answer.
  • Find three published articles in your discipline that used factor analysis. How many cases were used in each study? Did the author(s) use an overall minimum number of cases approach, a minimum number of cases per variable, or did the author(s) specify how they determined the appropriate number of cases to use?

Further Reading

Borsboom, D. (2005). Measuring the mind: Conceptual issues in contemporary psychometrics. Cambridge, UK: Cambridge University Press. (Author's note: One of the best books ever written on measurement, particularly with respect to validity and reliability.) http://dx.doi.org/10.1017/CBO9780511490026
Mailloux, Z., May-Benson, T. A., Summers, C. A., Miller, L. J., Brett-Green, B., Burke, J. P., & Schoen, S. A. (2007). Goal attainment scaling as a measure of meaningful outcomes for children with sensory integration disorders. American Journal of Occupational Therapy, 61, 254–259. doi: http://dx.doi.org/10.5014/ajot.61.2.254

References

Ayres, A. J. (1972). Sensory integration and learning disorders. Los Angeles, CA: Western Psychological Services.
Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Oller, J. W., Jr., & Chen, L. (2007). Episodic organization in discourse and valid measurement in the sciences. Journal of Quantitative Linguistics, 14, 127–144. doi: http://dx.doi.org/10.1080/09296170701379336
Oller, J. W., Oller, S. D., & Oller, S. N. (2012; publisher's date 2014). Milestones: Normal speech and language development across the life span (2nd ed.). San Diego, CA: Plural Publishing.
Oller, S. D. (2012). Assessing autism spectrum disorders using the individual's representational capacity: Reliability based on student and professional ratings. Texas A&M University (Kingsville Faculty Senate Annual Faculty Lecture Series). Retrieved from http://www.tamuk.edu/artsci/csdo/facultypages/Stephen%20Oller_2012%20Faculty%20Lecture.pdf
Ratner, N. B. (2006). Evidence-based practice: An examination of its ramifications for the practice of speech-language pathology. Language, Speech & Hearing Services in Schools, 37, 257–267. doi: http://dx.doi.org/10.1044/0161-1461