
Scan statistics are used for detecting unusual clusters of points scattered randomly on a line or of events occurring randomly over time. Traditionally, the scan statistic is defined as the maximum number of points contained in a window of fixed length w as it slides along a continuous interval of real numbers, say, from 0 to 1. The points on the interval may represent events over time. An exceptionally large value of the scan statistic indicates the presence of a cluster.
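The definition above can be sketched in a few lines of Python. This is only an illustration: the function name `scan_statistic` and the sample event times are assumptions made for the example, and the window is taken to be closed, which makes no difference for continuously distributed points except on a set of probability zero.

```python
from bisect import bisect_right

def scan_statistic(points, w):
    """Maximum number of points in any closed window [t, t + w]."""
    xs = sorted(points)
    best = 0
    for i, x in enumerate(xs):
        # A window holding the maximum count can be slid right until its
        # left edge touches a point, so only windows that start at a
        # point need to be checked.
        j = bisect_right(xs, x + w)  # number of points <= x + w
        best = max(best, j - i)
    return best

# Hypothetical event times on [0, 1] with a visible cluster near 0.35.
events = [0.05, 0.31, 0.33, 0.36, 0.38, 0.72, 0.95]
print(scan_statistic(events, 0.1))  # prints 4
```

Sorting dominates the cost, so the sketch runs in O(n log n) time for n points.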

In 1965, Joseph Naus showed that the scan statistic is the test statistic in a generalized likelihood ratio test of the null hypothesis that n points are independently sampled from a uniform distribution on [0,1] against the alternative that a cluster exists. Naus proved that, among a class of tests for the presence of clusters among uniformly distributed points, the scan statistic, written Nw for a window of length w, is the most powerful test statistic. Since then, scan statistics have found numerous applications in mathematics, science, and industry, such as generalized birthday problems, clustering of diseases in time, clustering of defective items in manufacturing processes, and many more.

How large must the scan statistic be before a cluster can be declared unlikely to have occurred by chance? To answer this question, one needs a mathematical formula for the probability distribution of the scan statistic, and finding this distribution function has become an object of intense interest. In 1977, F. K. Hwang derived the general formula for the distribution of the scan statistic. However, the practical use of this elegant formula is severely restricted by the formidable computations involved in its complicated sum of matrix determinants. Considerable effort has therefore been channeled into finding accurate approximate distributions that are computationally more tractable. A number of these approximations are presented in the book Scan Statistics and Applications, edited by J. Glaz and N. Balakrishnan and published in 1999.
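When neither the exact formula nor a closed-form approximation is at hand, the question can also be explored by simulation. The following is a minimal Monte Carlo sketch under the null hypothesis of n independent uniform points on [0,1]. It is not one of the approximations referred to above, and the function names, the number of trials, and the seed are all assumptions chosen for illustration.

```python
import random
from bisect import bisect_right

def scan_statistic(points, w):
    """Maximum number of points in any closed window of length w."""
    xs = sorted(points)
    return max((bisect_right(xs, x + w) - i for i, x in enumerate(xs)), default=0)

def monte_carlo_p_value(observed, n, w, trials=2000, seed=0):
    """Estimate P(Nw >= observed) for n i.i.d. uniform points on [0, 1]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        if scan_statistic(sample, w) >= observed:
            hits += 1
    # The add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (trials + 1)

# Hypothetical case: 7 events, 4 of which fall in a window of length 0.1.
p = monte_carlo_p_value(observed=4, n=7, w=0.1)
print(f"estimated p-value: {p:.3f}")
```

A small estimated p-value suggests that a cluster of the observed size would rarely arise from uniformly scattered points. The precision of the estimate is limited by the number of trials.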

In applying scan statistics, choosing an appropriate value of the window size w is usually not a straightforward matter. In many cases, significant clusters may go undetected because the window size is either too small or too large. It is therefore generally advisable to perform the analysis with a variety of window sizes whenever scan statistics are used to detect clusters. Such repeated applications can be carried out more conveniently on the computer with an equivalent form of the scan statistic known as the r-scan, put forth by Amir Dembo and Samuel Karlin at Stanford University in 1992. The original form is called the w-scan. We present the definitions of both the w-scan and the r-scan below, with an explanation of their duality relationship.
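The duality can be checked numerically. The sketch below assumes one common convention, in which the r-scan is the distance spanned by r + 1 consecutive ordered points (the sum of r consecutive spacings); the indexing used in the definitions that follow may differ. Under this assumed convention, a window of length w contains at least r + 1 points exactly when the smallest r-scan is at most w. The function names are hypothetical.

```python
import random

def w_scan(points, w):
    """Maximum number of points in any closed window of length w."""
    xs = sorted(points)
    best, j = 0, 0
    for i in range(len(xs)):
        # Advance j to the first point farther than w from xs[i].
        while j < len(xs) and xs[j] - xs[i] <= w:
            j += 1
        best = max(best, j - i)
    return best

def min_r_scan(points, r):
    """Smallest distance spanned by r + 1 consecutive ordered points.

    Assumed convention; requires r < len(points).
    """
    xs = sorted(points)
    return min(xs[i + r] - xs[i] for i in range(len(xs) - r))

def duality_holds(points, w, r):
    """True when both forms agree on whether a cluster of r + 1 points exists."""
    return (w_scan(points, w) >= r + 1) == (min_r_scan(points, r) <= w)

rng = random.Random(1)
pts = [rng.random() for _ in range(50)]
print(all(duality_holds(pts, w, r)
          for w in (0.02, 0.05, 0.1)
          for r in range(1, 10)))
```

One practical consequence under this convention: the minimal r-scan for each r is computed once from the sorted points and can then be compared against any number of window sizes without rescanning the data.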

Let X1, X2, …, Xn be n points sampled independently from the unit interval [0,1], and let X(1), X(2), …, X(n) be their order statistics, arranged from smallest to largest. That is, X(1) is the smallest of X1, X2, …, Xn, X(2) is the second smallest, and so on. Let


...
