
Given vectors from two different clusters (in my example, from two different experimental conditions, called "CS" and "US"), the Mahalanobis distance is calculated according to:

$M^2 = (x-\mu)^T \Sigma^{-1}(x-\mu)$

the explanation reads as follows: "where x and $\mu$ are individual and mean population vectors for CS and US ensemble responses [that is, n-dimensional vectors; my addition], respectively, and $x^T$ and $\mu^T$ are their transposes. We averaged the M over all points x in both sets". The source is here, in the methods section under "Population vector analysis".

And the illustration looks like this:

[illustration from the linked paper; image not reproduced here]

My questions:

  1. As I understand it, they take one cluster as the reference in turn and then average over vector-by-vector calculations of the Mahalanobis distance. Is this a common and correct approach? (See the sketch after this list for my reading of the procedure.)
  2. Does this lead to a distance measure which is no longer symmetric?
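To make question 1 concrete, here is a minimal sketch of my reading of the procedure, in Python/NumPy; the cluster sizes, dimensionality and distributions are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
cs = rng.normal(0.0, 1.0, size=(50, 5))  # hypothetical "CS" ensemble responses
us = rng.normal(1.0, 2.0, size=(50, 5))  # hypothetical "US" ensemble responses

def avg_mahalanobis(points, reference):
    """Average Mahalanobis distance of `points` from the mean and
    covariance estimated on the `reference` cluster."""
    mu = reference.mean(axis=0)
    sigma_inv = np.linalg.inv(np.cov(reference, rowvar=False))
    diffs = points - mu
    m2 = np.einsum('ij,jk,ik->i', diffs, sigma_inv, diffs)  # (x-mu)^T S^-1 (x-mu)
    return np.sqrt(m2).mean()

both = np.vstack([cs, us])
# "averaged over all points x in both sets", once per reference cluster:
print(avg_mahalanobis(both, cs))  # CS as reference
print(avg_mahalanobis(both, us))  # US as reference: in general a different number
```

The two printed values differ, which is what question 2 is getting at: with a different reference mean and covariance on each side, the averaged quantity is not symmetric in the two clusters.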
– Pugl

1 Answer


Preamble

Whenever all the variables under consideration are expressed in the same unit of measurement, and are therefore directly comparable, the Euclidean scalar product, or the associated Euclidean distance, is usually used:

$ d^2(\mu,x)\equiv \langle \mu - x,\mu - x \rangle_{eucl}:= \sum_{i=1}^n (\mu_i - x_i)^2 $
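For concreteness, here is the same computation in NumPy ($\mu$ and $x$ are arbitrary example vectors):

```python
import numpy as np

mu = np.array([1.0, 2.0, 3.0])
x = np.array([2.0, 0.0, 4.0])
d2 = np.sum((mu - x) ** 2)                     # sum_i (mu_i - x_i)^2
assert np.isclose(d2, np.dot(mu - x, mu - x))  # same thing as a scalar product
```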

Often, the collected data do not have homogeneous units of measurement: in the weight-height pair of variables, for example, the unit of the first variable is a mass, while the second variable expresses a length. Weight and height cannot be added together, so it is necessary to express the data as dimensionless numbers. If the variables refer to different units of measurement, the weighted Euclidean distance is used, which transforms the observed variables into dimensionless and therefore comparable numbers. The transformation is operated by a suitable multiplier or weight, $\lambda_i$, which makes the observed difference $(\mu_i - x_i)$ in one character directly comparable with the observed difference $(\mu_j - x_j)$ in another, where $i\neq j$ and $i,j=1,\cdots, n$.

If the measurements of the individual variables are not all identical, i.e. in probabilistic terms they have non-zero variance, it is common to choose the reciprocal of the standard deviation as the multiplier: $\lambda = 1 / \sigma$ (Mahalanobis transformation), where $\sigma$ denotes the standard deviation. The Mahalanobis transformation, which acts on vectors by multiplying them by the reciprocal of the corresponding standard deviation, $x \mapsto f(x) := \sigma^{-1}x$, is an isometry between the Mahalanobis scalar product and the usual Euclidean scalar product:

$\langle\mu - x,\mu - x\rangle_{mahal} = \langle f(\mu - x),f(\mu - x)\rangle_{eucl}$
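A quick numerical check of this isometry for the uncorrelated case, with made-up standard deviations:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = np.array([0.5, 2.0, 3.0])           # per-variable standard deviations
mu, x = rng.normal(size=3), rng.normal(size=3)

f = lambda v: v / sigma                     # Mahalanobis transformation v -> sigma^{-1} v
lhs = np.sum((mu - x) ** 2 / sigma ** 2)    # <mu-x, mu-x>_mahal (diagonal case)
rhs = np.dot(f(mu - x), f(mu - x))          # <f(mu-x), f(mu-x)>_eucl
assert np.isclose(lhs, rhs)
```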

If the $n$ variables are statistically independent, and therefore uncorrelated, the square of the Mahalanobis distance reduces to:

$d^2_{mahal}(\mu,x) := \sum_{i=1}^n \tfrac{(\mu_i - x_i)^2}{\sigma^2_{ii}}$

where $\sigma^2_{ii}$ is the variance of the $i$-th variable.

Otherwise, if the n variables are correlated, the square of the Mahalanobis distance is written as

$d^2_{mahal}(\mu,x) := \sum_{i=1}^n \sum_{j=1}^n (\mu_i - x_i) \cdot (\Sigma^{-1})_{ij} \cdot (\mu_j - x_j)$,

where $(\Sigma^{-1})_{ij}$ is the $(i,j)$ element of the inverse of the variance-covariance matrix $\Sigma$.
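As a sanity check, the double sum, the matrix form, and SciPy's built-in `mahalanobis` should all agree; the correlated sample below is randomly generated, purely for illustration:

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

rng = np.random.default_rng(2)
data = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 3))  # correlated sample
mu = data.mean(axis=0)
vi = np.linalg.inv(np.cov(data, rowvar=False))              # Sigma^{-1}

x = data[0]
diff = mu - x
d2_sum = sum(diff[i] * vi[i, j] * diff[j]                   # the double sum above
             for i in range(3) for j in range(3))
d2_mat = diff @ vi @ diff                                   # (mu-x)^T Sigma^{-1} (mu-x)
assert np.isclose(d2_sum, d2_mat)
assert np.isclose(np.sqrt(d2_mat), mahalanobis(mu, x, vi))  # SciPy returns d, not d^2
```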

In short

  • $M^2=(x - \mu)^T \Sigma^{-1} (x - \mu)$

Question 1

Once a distance $d$ capable of expressing the notion of proximity between observations has been adopted, it is possible to introduce a classification rule that assigns observations to a representative of each class. The statistical character of discriminant analysis derives from the fact that the sample mean of each population under examination is chosen as the representative of its class.

In the example of the two classes (the sets "US" and "CS"), let $\mu_1$ denote the mean of the first set (population) and $\mu_2$ the mean of the second population; then the observation $x$ belongs to class 1 if and only if:

  • $d(\mu_1,x) < d(\mu_2,x)$

that is, if $x$ is closer to $\mu_1$.
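A minimal nearest-mean classifier along these lines (the class means and the test point are invented for the example; I use the Euclidean distance here, but any $d$ fixed above would do):

```python
import numpy as np

mu1 = np.array([0.0, 0.0])   # mean of class 1
mu2 = np.array([3.0, 1.0])   # mean of class 2
x = np.array([1.0, 0.5])     # observation to classify

d1 = np.linalg.norm(x - mu1)
d2 = np.linalg.norm(x - mu2)
print("class 1" if d1 < d2 else "class 2")  # here: class 1
```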

Question 2

Just consider $d^2(\mu,x)=\|\mu-x\|^2=\langle \mu-x , \mu-x \rangle$ to express the rule of belonging to set 1 in terms of the scalar product as $\langle \mu_1-x, \mu_1-x\rangle < \langle \mu_2-x, \mu_2-x\rangle$.

The relationship just written can be rewritten, exploiting the bilinearity property of the scalar product, in a few simple steps.
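Expanding both squared norms and cancelling the common term $\langle x,x\rangle$ (steps spelled out for completeness):

$\langle \mu_1,\mu_1\rangle - 2\langle \mu_1,x\rangle < \langle \mu_2,\mu_2\rangle - 2\langle \mu_2,x\rangle$

$2\langle \mu_1-\mu_2,\, x\rangle > \langle \mu_1,\mu_1\rangle - \langle \mu_2,\mu_2\rangle = \langle \mu_1-\mu_2,\, \mu_1+\mu_2\rangle$

Dividing by $2$ and collecting terms gives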

$\langle \mu_1-\mu_2, x-1/2(\mu_1+\mu_2)\rangle > 0$.

Writing $x_0=\tfrac{1}{2}(\mu_1+\mu_2)$, the classification rule for elements of set 1 becomes:

  • $Set_1: \langle \mu_1-\mu_2, x-x_0 \rangle > 0 $
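A quick numerical check that this half-space rule agrees with the nearest-mean rule (random example data, Euclidean distance as in the derivation above):

```python
import numpy as np

rng = np.random.default_rng(3)
mu1, mu2 = rng.normal(size=4), rng.normal(size=4)
x0 = 0.5 * (mu1 + mu2)

for _ in range(1000):
    x = rng.normal(size=4)
    by_distance = np.linalg.norm(x - mu1) < np.linalg.norm(x - mu2)
    by_halfspace = np.dot(mu1 - mu2, x - x0) > 0
    assert by_distance == by_halfspace
```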
– Tognaz
  • Thanks, but my question is not about classification at all, so this doesn't answer my questions. As you can also see under the link, they use distances as a measure of similarity between the two classes - so the idea is to quantify the similarity between two separate classes (and not to classify a point into one of the two classes). So my questions stand: 1. Why average over both distances as a measure of similarity? 2. Are the distances they calculate with the different means not the same anyway? – Pugl Mar 06 '21 at 17:30
  • Dear @Pugl, the smaller the distance between vectors $x$ and $\mu$, the higher the similarity of vectors $x$ and $\mu$: similarity := $S(x,\mu) = a / d(x,\mu)$ where $a>0$. Assuming that the range of a cluster depends on a set of $n$ indicators (the components of each variable), it is possible to compute the mean vector and covariance matrix for these indicators on a training cluster (extinction training, for example). As a result, the Mahalanobis distance can be used as a similarity measure for these $n$ indicators on a larger cluster. – Tognaz Mar 07 '21 at 14:13