
The notion of centroid generalizes the notion of a mean to multivariate analysis and multidimensional spaces. It applies to vectors instead of scalars, and it is computed by associating to each vector a mass that is a positive number taking values between 0 and 1 such that the sum of all the masses is equal to 1. The centroid of a set of vectors is also called the center of gravity, the center of mass, or the barycenter of this set.

Notations and Definition

Let ν be a set of I vectors, with each vector being composed of J elements:

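\[
\mathbf{v}_i = \left[ v_{i,1}, \ldots, v_{i,j}, \ldots, v_{i,J} \right]^{\mathsf{T}}, \qquad i = 1, \ldots, I .
\]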

To each vector is associated a mass, denoted $m_i$ for vector $i$. These masses take values between 0 and 1, and the sum of these masses is equal to 1. The set of masses is a vector denoted $\mathbf{m}$. The centroid of the set of vectors is denoted $\mathbf{c}$, defined as

\[
\mathbf{c} = \sum_{i=1}^{I} m_i \mathbf{v}_i .
\]
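The definition can be sketched in a few lines of Python. This is a minimal illustration, not part of the original entry: the function name `centroid` and the three vectors and masses below are hypothetical values chosen only so that the arithmetic is easy to follow.

```python
from math import isclose


def centroid(vectors, masses):
    """Return the mass-weighted sum of the vectors, element by element."""
    if len(vectors) != len(masses):
        raise ValueError("need exactly one mass per vector")
    if any(m < 0 for m in masses) or not isclose(sum(masses), 1.0):
        raise ValueError("masses must be non-negative and sum to 1")
    n_elements = len(vectors[0])
    return [
        sum(m * v[j] for v, m in zip(vectors, masses))
        for j in range(n_elements)
    ]


# Hypothetical data (not taken from the entry): I = 3 vectors, J = 2 elements.
vectors = [[0.0, 0.0], [4.0, 0.0], [0.0, 8.0]]
masses = [0.5, 0.25, 0.25]

c = centroid(vectors, masses)
print(c)  # [1.0, 2.0]
```

The check on the masses mirrors the constraint stated above: each mass lies between 0 and 1 and the masses sum to 1.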

Examples

The mean of a set of numbers is the centroid of this set of observations. Here, the mass of each number is equal to the inverse of the number of observations: $m_i = 1/I$.
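As a quick check of this special case, the sketch below compares the equal-mass centroid of a few numbers with the arithmetic mean from Python's standard library. The numbers are hypothetical and serve only as an illustration.

```python
from math import isclose
from statistics import mean

# Hypothetical observations (scalars, i.e. vectors with J = 1 element).
numbers = [2.0, 4.0, 9.0]
n_obs = len(numbers)

# Equal masses: each observation gets mass 1 / I.
masses = [1 / n_obs] * n_obs

# Centroid of the scalars = mass-weighted sum.
weighted_centroid = sum(m * x for m, x in zip(masses, numbers))

print(isclose(weighted_centroid, mean(numbers)))  # True
```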

For multivariate data, the notion of centroid generalizes the mean. For example, with the following three vectors,

None

and the following set of masses,

None

we obtain the following centroid:

None

In this example, if we plot the vectors in a two-dimensional space, the centroid is the center of gravity of the triangle whose vertices are these three vectors, with each vertex weighted by its mass. The notion of centroid can be used with spaces of any dimensionality.

Properties of the Centroid

The properties of the centroid of a set of vectors closely parallel the more familiar properties of the mean of a set of numbers. Recall that a set of vectors defines a multidimensional space, and that to each multidimensional space is assigned a generalized Euclidean distance. The core property of the centroid is that, among all points in the space, it is the point that minimizes the weighted sum of the generalized squared Euclidean distances from the vectors to that point. This minimum, which generalizes the notion of variance, is called the inertia of the set of vectors relative to their centroid.
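The minimizing property can be checked numerically. The sketch below assumes the ordinary (unweighted) Euclidean distance rather than a generalized one, and reuses hypothetical vectors and masses; the helper names `centroid` and `inertia` are illustrative, not from the entry. It compares the inertia relative to the centroid with the inertia relative to every point on a small grid.

```python
def centroid(vectors, masses):
    """Mass-weighted sum of the vectors (masses assumed to sum to 1)."""
    n_elements = len(vectors[0])
    return [
        sum(m * v[j] for v, m in zip(vectors, masses))
        for j in range(n_elements)
    ]


def inertia(vectors, masses, point):
    """Weighted sum of squared (ordinary) Euclidean distances to `point`."""
    return sum(
        m * sum((a - b) ** 2 for a, b in zip(v, point))
        for v, m in zip(vectors, masses)
    )


# Hypothetical data, for illustration only.
vectors = [[0.0, 0.0], [4.0, 0.0], [0.0, 8.0]]
masses = [0.5, 0.25, 0.25]

c = centroid(vectors, masses)
inertia_at_centroid = inertia(vectors, masses, c)

# No candidate point on the grid should give a smaller inertia.
grid = [[float(x), float(y)] for x in range(-2, 7) for y in range(-2, 11)]
centroid_is_minimal = all(
    inertia_at_centroid <= inertia(vectors, masses, p) for p in grid
)
print(inertia_at_centroid, centroid_is_minimal)  # 15.0 True
```

A grid search is only a spot check, not a proof; the decomposition given by Huyghens's theorem is what guarantees the result for every point.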

Of additional interest for multivariate analysis, the theorem of Huyghens indicates that the weighted sum of the squared distances from a set of vectors to any point in the space can be decomposed as the weighted sum of the squared distances from the vectors to their centroid plus the (weighted) squared distance from the centroid to this point. In terms of inertia, Huyghens's theorem states that the inertia of a set of vectors relative to any point is equal to the inertia of the set of vectors relative to their centroid plus the inertia of their centroid relative to this point. Because the second term is never negative and vanishes only when the point is the centroid itself, it follows that the inertia of a set of vectors relative to their centroid is minimal.
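The decomposition can be verified on a small example. The sketch below again assumes the ordinary Euclidean distance and hypothetical data; the names `sq_dist`, `centroid`, and `inertia` are illustrative helpers. The centroid carries the total mass of the set (here 1), so its inertia relative to the point is that total mass times the squared distance.

```python
from math import isclose


def sq_dist(a, b):
    """Squared (ordinary) Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors, masses):
    """Mass-weighted sum of the vectors (masses assumed to sum to 1)."""
    n_elements = len(vectors[0])
    return [
        sum(m * v[j] for v, m in zip(vectors, masses))
        for j in range(n_elements)
    ]


def inertia(vectors, masses, point):
    """Weighted sum of squared distances from the vectors to `point`."""
    return sum(m * sq_dist(v, point) for v, m in zip(vectors, masses))


# Hypothetical data and an arbitrary reference point, for illustration only.
vectors = [[0.0, 0.0], [4.0, 0.0], [0.0, 8.0]]
masses = [0.5, 0.25, 0.25]
point = [3.0, 3.0]

c = centroid(vectors, masses)
total = inertia(vectors, masses, point)     # inertia relative to the point
within = inertia(vectors, masses, c)        # inertia relative to the centroid
between = sum(masses) * sq_dist(c, point)   # inertia of the centroid to the point

print(total, within, between)  # 20.0 15.0 5.0
print(isclose(total, within + between))  # True
```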

...
