I want to calculate the distance from a vector $x$ (say, a point from the blue distribution) to the centroid of a different distribution, say the centroid of the red vectors:

enter image description here

I want to use the Mahalanobis distance:

$M^2 = (x-\mu)^T \Sigma^{-1}(x-\mu)$

Is the covariance matrix $\Sigma$ then estimated only from the red vectors? If so, how is the distribution of the blue data points taken into account?

1 Answer

It depends on your model for the data. Judging from the image you provided, a Gaussian mixture model is being used, that is, we have

$$x\mid A \sim \mathcal N(\mu_A, \Sigma_A) \qquad x\mid B\sim\mathcal N(\mu_B, \Sigma_B)$$

and in total $p(x) = p(x\mid A)p(A) + p(x\mid B)p(B)$. One commonly writes $\pi_A$ and $\pi_B$ for $p(A)$ and $p(B)$. You can now either

  1. Assume both populations share the same covariance, $\Sigma =\Sigma_A=\Sigma_B$. This is known as Linear Discriminant Analysis (LDA). In this case you can either
    • estimate $\Sigma$ by averaging the population covariances, weighted by frequency: $\hat\Sigma=\hat\pi_A\hat\Sigma_A + \hat\pi_B\hat\Sigma_B$, or
    • estimate $\Sigma$ using the pooled covariance estimate.
    • Either way, the (squared) Mahalanobis distance from a point $x$ to distribution $B$ is $(x-\mu_B)^T\Sigma^{-1}(x-\mu_B)$. Here, the blue distribution does affect the distance, since $\Sigma$ is estimated using data from both populations.
  2. Assume the populations have distinct covariances, $\Sigma_A\neq\Sigma_B$. This is known as Quadratic Discriminant Analysis (QDA).
    • In this case, the (squared) Mahalanobis distance from a point $x$ to distribution $B$ is $(x-\mu_B)^T\Sigma_B^{-1}(x-\mu_B)$. Here, the blue distribution has no effect on the distance, since $\Sigma_B$ is estimated using only data from the red population.
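
To make the difference between the two cases concrete, here is a minimal sketch in plain Python. It is restricted to 2-D points so that the $2\times 2$ inverse can be written out by hand. The toy data and all function names are made up for illustration, and the pooled estimate is assumed to be the usual $\hat\Sigma = \frac{(n_A-1)\hat\Sigma_A + (n_B-1)\hat\Sigma_B}{n_A+n_B-2}$, with blue as population $A$ and red as population $B$.

```python
def mean(points):
    """Centroid of a list of 2-D points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]


def scatter(points):
    """Sum of outer products of the centered points (unnormalized covariance)."""
    m = mean(points)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def covariance(points):
    """Sample covariance of one population (the QDA case)."""
    s = scatter(points)
    n = len(points)
    return [[s[i][j] / (n - 1) for j in range(2)] for i in range(2)]


def pooled_covariance(a, b):
    """Pooled covariance of two populations (the LDA case)."""
    sa, sb = scatter(a), scatter(b)
    dof = len(a) + len(b) - 2
    return [[(sa[i][j] + sb[i][j]) / dof for j in range(2)] for i in range(2)]


def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance (x - mu)^T cov^{-1} (x - mu) in 2-D."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    return sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))


# Hypothetical toy data: blue is population A, red is population B.
blue = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5), (3.0, 1.0), (1.5, 2.0)]
red = [(6.0, 5.0), (7.0, 5.5), (6.5, 7.0), (8.0, 6.0), (7.5, 6.5)]

x = blue[0]          # a blue point
mu_red = mean(red)   # centroid of the red population

# QDA: only the red covariance enters the distance.
m2_qda = mahalanobis_sq(x, mu_red, covariance(red))
# LDA: the shared covariance is estimated from both populations.
m2_lda = mahalanobis_sq(x, mu_red, pooled_covariance(blue, red))

print("QDA squared distance:", m2_qda)
print("LDA squared distance:", m2_lda)
```

Changing the spread of the blue points changes `m2_lda` but leaves `m2_qda` untouched, which is exactly the distinction between the two cases above.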
Hyperplane