You can preview and download the dataset from this tab. The dataset is available in multiple file formats, compatible with most common software packages. You can also view and download the Codebook, which provides information on the structure, contents, and layout of the dataset.
This dataset is designed for teaching Term Frequency–Inverse Document Frequency (TFIDF) in text analysis. The dataset is a subset of data derived from the 2016 How ISIS Uses Twitter dataset, and the example demonstrates how TFIDF scores reveal words that are representative of a document and that distinguish it from other documents. The dataset file is accompanied by a Teaching Guide, a Student Guide, and a How-to Guide for Python.
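As a rough illustration of how TFIDF scores pick out distinguishing words, the following minimal Python sketch scores a few toy sentences. The sentences, the `tfidf` helper, and the whitespace tokenization are assumptions made for illustration only. They are not taken from the dataset or from the accompanying guides. The sketch uses the plain, unsmoothed form (term frequency multiplied by the log of total documents over documents containing the word); library implementations often use smoothed or normalized variants, so their numbers may differ.

```python
import math
from collections import Counter

# Toy documents invented for illustration (not from the dataset).
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang at dawn",
]

def tfidf(documents):
    """Return one {word: score} dict per document (hypothetical helper)."""
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            # Term frequency times unsmoothed inverse document frequency.
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores

scores = tfidf(docs)
for i, doc_scores in enumerate(scores):
    # Show the three highest-scoring words for each document.
    top = sorted(doc_scores.items(), key=lambda kv: kv[1], reverse=True)[:3]
    print(i, top)
```

In this sketch, a word such as "the" appears in every toy document and so scores zero, while a word found in only one document scores higher than a word shared by two.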