Challenges in Conducting a Multinomial Logistic Regression Analysis on Regional Secondary Data of the Flemish Colorectal Cancer Screening Program

Abstract

This case study describes a novel approach to population-based colorectal cancer (CRC) screening that is based on “personal” risk. Although the screening program detects many precursor lesions and CRCs, the current approach also misses many precursor lesions (false negatives) and flags many participants whose follow-up result turns out to be normal (false positives). This has a large economic impact, which could be addressed by a more sophisticated approach based on risk stratification. We used a multivariable multinomial logistic regression model to predict one of four outcomes (a normal result, a precursor lesion, carcinoma in situ, or CRC) from several determinants of CRC. We chose this model because our outcome had four categories rather than the two that are common in our field (disease/no disease or dead/alive). Because we worked with secondary data, which were of high quality owing to standardization, we had no influence on the data when we started the research project. Several issues should be considered when using this model in practice, and we explain problems such as sample size, interpretation of the results, and the empty cell problem. Because we worked in R through RStudio, some of the code is provided to give a better understanding of the model. Alternative modeling options, such as ordinal regression or neural network analysis, should also be considered.
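
As a point of orientation, the standard baseline-category form of a multinomial logistic regression with four outcome categories can be written as below. This is a generic sketch rather than the authors' exact specification: the numbering of the categories, the choice of the normal result as the reference category, and the symbols x_i and beta_k are assumptions made for illustration.

```latex
% Assumed coding: Y_i = 1 normal (reference), 2 precursor lesion,
% 3 carcinoma in situ, 4 CRC; x_i is the vector of determinants.
\begin{align}
\log \frac{P(Y_i = k \mid x_i)}{P(Y_i = 1 \mid x_i)}
  &= \beta_{k0} + x_i^{\top} \beta_k, \qquad k = 2, 3, 4, \\
P(Y_i = k \mid x_i)
  &= \frac{\exp\!\left(\beta_{k0} + x_i^{\top} \beta_k\right)}
          {1 + \sum_{j=2}^{4} \exp\!\left(\beta_{j0} + x_i^{\top} \beta_j\right)},
  \qquad k = 2, 3, 4, \\
P(Y_i = 1 \mid x_i)
  &= \frac{1}{1 + \sum_{j=2}^{4} \exp\!\left(\beta_{j0} + x_i^{\top} \beta_j\right)}.
\end{align}
```

Each non-reference category gets its own intercept and coefficient vector, so the number of parameters grows with the number of categories. This is one reason why sample size and empty cells (combinations of a predictor level and an outcome category with no observations) become practical concerns.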
