You can preview and download the dataset from this tab. The dataset is available in multiple file formats, compatible with most common software packages. You can also view and download the Codebook, which provides information on the structure, contents, and layout of the dataset.
This dataset is designed for teaching Latent Dirichlet Allocation (LDA), a topic modeling technique that uncovers latent topic structure in text data. The dataset is a subset of the 2016 News Articles dataset, and the example uses LDA to identify, without manual coding, the topics discussed in those articles. The dataset file is accompanied by a Teaching Guide, a Student Guide, and a How-to Guide for Python.
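For readers who want a feel for what LDA does before opening the guides, the following is a minimal sketch of the idea, not the code from the How-to Guide. It assumes one common inference approach (collapsed Gibbs sampling), uses only the Python standard library, and runs on a tiny made-up corpus rather than the news articles. The function name `fit_lda` and the hyperparameter values (`alpha`, `beta`, `iterations`) are illustrative assumptions.

```python
import random


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling (a sketch)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables used by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]            # words in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # times word v is assigned to topic k
    topic_total = [0] * num_topics                          # total words assigned to topic k
    assignments = []

    # Start from a random topic assignment for every word occurrence.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic for this word.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed estimates: theta = topic mix per document, phi = word mix per topic.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + vocab_size * beta) for c in topic_word[k]]
        for k in range(num_topics)
    ]
    return vocab, theta, phi


# Toy corpus (invented for illustration; not taken from the dataset).
docs = [
    "election vote senate campaign vote".split(),
    "senate campaign election policy".split(),
    "match goal team league goal".split(),
    "team league match coach".split(),
]

vocab, theta, phi = fit_lda(docs, num_topics=2)
for k, row in enumerate(phi):
    top_words = [w for _, w in sorted(zip(row, vocab), reverse=True)[:3]]
    print(f"topic {k}: {top_words}")
```

Each row of `theta` is a document's estimated mixture over topics, and each row of `phi` is a topic's estimated distribution over the vocabulary. In practice a library implementation would likely be used instead, and a real corpus needs preprocessing (tokenization, stop-word removal) that this sketch omits.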