
Computer Programming in Quantitative Analysis

Computer programming in quantitative analysis refers to the process of creating computer “code” (instructions that a computer can interpret) to automate quantitative summaries of data. With the advent of powerful personal computers, high-level programming languages, and the increasing availability of high-performance computing clusters, such programming is increasingly common and important in both publicly funded research and private and commercial settings. It may take many different forms depending on the purpose of the quantitative analysis. This entry provides an overview of particular use cases for computer programming, progressing from the most basic to the most complex, integrated with the introduction of programming concepts.

Basic Use Cases

Applications of computer programming vary widely depending on the purpose of the analyses and on the experience of the programmer. Writing or recording the code used for quantitative analyses can be important for the ability to reproduce an analysis, automate a series of analyses or a simulation study, or develop and test a new quantitative analysis technique.

Reproducible Analyses

Suppose a researcher generates histograms for each of two variables (X and Y) and performs a regression analysis (Y regressed on X). If questions arise about the analysis, the researcher may not always remember the bin size used to construct the histograms, whether the predictor variable was standardized when they performed the regression analysis, and so on. A record of the analysis will make it possible to recall exactly how the analysis was conducted without relying on fallible human memory. Analyses that are recorded are called reproducible analyses.
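As an illustration of such a record, the following is a minimal sketch in Python of a script for the analysis described above. It is not taken from any particular package: the simulated data, the seed, the bin width, and all function names are assumptions made for the example. The point is only that each analytic choice (bin width, whether the predictor is standardized) is written down in the script rather than left to memory.

```python
# Hypothetical analysis script: every analytic choice is a named constant,
# so the script itself documents how the analysis was run.
import math
import random
from statistics import mean, pstdev

SEED = 2024            # assumed seed so the illustrative data can be regenerated
BIN_WIDTH = 0.5        # histogram bin width (a choice that is easy to forget)
STANDARDIZE_X = True   # whether the predictor is z-scored before the regression


def simulate_data(n, seed):
    """Generate illustrative (x, y) pairs; stands in for the researcher's real data."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 2) for _ in range(n)]
    y = [1.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]
    return x, y


def histogram(values, bin_width):
    """Count values falling in bins [k*w, (k+1)*w), keyed by the lower bin edge."""
    counts = {}
    for v in values:
        edge = math.floor(v / bin_width) * bin_width
        counts[edge] = counts.get(edge, 0) + 1
    return dict(sorted(counts.items()))


def standardize(values):
    """Z-score the values using the population standard deviation (an assumption)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def regress(y, x):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


x, y = simulate_data(200, SEED)
hist_x = histogram(x, BIN_WIDTH)
hist_y = histogram(y, BIN_WIDTH)
predictor = standardize(x) if STANDARDIZE_X else x
intercept, slope = regress(y, predictor)

print(f"bin width = {BIN_WIDTH}, predictor standardized = {STANDARDIZE_X}")
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

Rerunning the script with the same constants should reproduce the same histograms and coefficients, and anyone reading it can see which bin width and which scaling of X were used.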

Statistical software designed for novice analysts, typically operated through a point-and-click user interface, does not necessarily retain an exact record of the analyses performed. However, it is often possible to persuade such software to produce the underlying code. For example, SPSS can produce “syntax,” which consists of code that can be saved as a plain text file. This record allows one to see exactly which options were enabled or selected when the analyses were run, even if they are not immediately discernible to the untrained eye. The act of creating a record of instructions that can be replayed is a rudimentary example of programming. Once an analysis script is available, it is a small step for the user to edit this code by copying and pasting or changing a few variable names. Other popular all-purpose statistical packages, such as R, SAS, and Stata, are also sometimes capable of generating an analysis script from a point-and-click interface.

A typical next step is to copy results into a manuscript or report. Although the user may copy software output manually, it is increasingly possible to integrate the analysis code and narrative text. This approach is known as literate programming and has long been advocated by Donald Knuth, an early computer science visionary. For example, suppose the results of the aforementioned regression analysis are to become part of a publication. With authoring formats such as R Markdown or packages such as knitr, it is possible to combine the R code and manuscript text in the same file. Within an R editor such as RStudio, a button click will run the R code, combine it with the narrative text, and generate a report that automatically displays the output of the R code and can automate generation of tables, figures, and so on. Reports can be generated in a wide variety of formats, including portable document format (in conjunction with LaTeX), Microsoft Word, presentations (e.g., with Beamer), and web pages (HTML). Use of such an approach can reduce transcription errors and mislabeling of output and avoid loss of documentation regarding how the results for a figure or table were generated. The code used to run analyses resides in the same place as the text of the report, and the owner of the file can see exactly what code generated each table, figure, or other in-text value reported, while optionally hiding such code from the report for aesthetic reasons. Preparing such integrated documents requires an investment in programming but pays off by producing reproducible analysis code.
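The core idea, that reported numbers are computed and inserted into the narrative rather than transcribed by hand, can be sketched in a few lines of Python. This is not R Markdown or knitr; it is a simplified stand-in that uses only the standard library, and the data values, the `ols` helper, and the report wording are all hypothetical.

```python
# Minimal sketch: narrative text and analysis code live in one file, and the
# numbers in the report are filled in from the computed results.
from statistics import mean
from string import Template

# Hypothetical data standing in for the study variables X and Y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def ols(y, x):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


intercept, slope = ols(y, x)

# Narrative text with placeholders; no value is typed in by hand.
REPORT = Template(
    "Y was regressed on X (n = $n). "
    "The estimated slope was $slope and the estimated intercept was $intercept."
)

report = REPORT.substitute(
    n=len(x),
    slope=f"{slope:.2f}",
    intercept=f"{intercept:.2f}",
)
print(report)
```

If the data change, rerunning the file regenerates the sentence with updated values, which is the property that reduces transcription errors in the workflow described above.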

...
