My textbook says the following:
Given a vector $\mathrm{\mathbf{x}}$ of random variables $x_i$ for $i = 1, \dots, N$, with mean $\bar{\mathrm{\mathbf{x}}} = E[\mathrm{\mathbf{x}}]$, where $E[\cdot]$ denotes the expected value, and $\Delta \mathrm{\mathbf{x}} = \mathrm{\mathbf{x}} - \bar{\mathrm{\mathbf{x}}}$, the covariance matrix $\Sigma$ is the $N \times N$ matrix given by
$$\Sigma = E[\Delta \mathrm{\mathbf{x}} \Delta \mathrm{\mathbf{x}}^T]$$
so that $\Sigma_{i j} = E[ \Delta x_i \Delta x_j]$. The diagonal entries of the matrix $\Sigma$ are the variances of the individual variables $x_i$, whereas the off-diagonal entries are the cross-covariance values.
The variables $x_i$ are said to conform to a joint Gaussian distribution if the probability distribution of $\mathrm{\mathbf{x}}$ is of the form
$$P(\bar{\mathrm{\mathbf{x}}} + \Delta \mathrm{\mathbf{x}}) = (2 \pi) ^{-N/2} \det(\Sigma^{-1})^{1/2} \exp(-(\Delta \mathrm{\mathbf{x}})^T \Sigma^{-1} (\Delta \mathrm{\mathbf{x}})/2) \tag{A2.1}$$
for some positive-definite matrix $\Sigma^{-1}$.
$\vdots$
Change of coordinates. Since $\Sigma$ is symmetric and positive-definite, it may be written as $\Sigma = U^TDU$, where $U$ is an orthogonal matrix and $D = \operatorname{diag}(\sigma_1^2, \sigma_2^2, \dots, \sigma_N^2)$ is diagonal. Writing $\mathrm{\mathbf{x}}' = U \mathrm{\mathbf{x}}$ and $\bar{\mathrm{\mathbf{x}}}' = U \bar{\mathrm{\mathbf{x}}}$, and substituting in (A2.1), leads to
$$ \begin{align*}\exp(-(\mathrm{\mathbf{x}} - \bar{\mathrm{\mathbf{x}}})^T \Sigma^{-1} (\mathrm{\mathbf{x}} - \bar{\mathrm{\mathbf{x}}})/2) &= \exp(-(\mathrm{\mathbf{x}}' - \bar{\mathrm{\mathbf{x}}}')^T U \Sigma^{-1} U^T (\mathrm{\mathbf{x}}' - \bar{\mathrm{\mathbf{x}}}')/2) \\ &= \exp(-(\mathrm{\mathbf{x}}' - \bar{\mathrm{\mathbf{x}}}')^T D^{-1} (\mathrm{\mathbf{x}}' - \bar{\mathrm{\mathbf{x}}}')/2) \end{align*}$$
Thus, the orthogonal change of coordinates from $\mathrm{\mathbf{x}}$ to $\mathrm{\mathbf{x}}' = U \mathrm{\mathbf{x}}$ transforms a general Gaussian PDF into one with diagonal covariance matrix. A further scaling by $\sigma_i$ in each coordinate direction may be applied to transform it to an isotropic Gaussian distribution. Equivalently stated, a change of coordinates may be applied to transform Mahalanobis distance to ordinary Euclidean distance.
Appendix 2, Multiple View Geometry in Computer Vision by Hartley and Zisserman.
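For reference, here is a small numpy sketch I used to convince myself of the decomposition step (the example covariance matrix is my own arbitrary choice, picked to be symmetric positive-definite):

```python
import numpy as np

# Example covariance matrix (my own arbitrary choice; symmetric positive-definite).
Sigma = np.array([[4.0, 1.5],
                  [1.5, 3.0]])

# eigh returns Sigma = V @ diag(w) @ V.T with V orthogonal,
# so in the book's notation U = V.T and D = diag(w).
w, V = np.linalg.eigh(Sigma)
U, D = V.T, np.diag(w)

assert np.allclose(U.T @ D @ U, Sigma)                                # Sigma = U^T D U
assert np.allclose(U @ np.linalg.inv(Sigma) @ U.T, np.linalg.inv(D))  # U Sigma^{-1} U^T = D^{-1}
print("sigma_i^2 =", w)  # the diagonal entries of D
```

Both assertions pass, so I follow the algebra in the displayed equation itself.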
I'm having trouble understanding the following section:
Thus, the orthogonal change of coordinates from $\mathrm{\mathbf{x}}$ to $\mathrm{\mathbf{x}}' = U \mathrm{\mathbf{x}}$ transforms a general Gaussian PDF into one with diagonal covariance matrix. A further scaling by $\sigma_i$ in each coordinate direction may be applied to transform it to an isotropic Gaussian distribution. Equivalently stated, a change of coordinates may be applied to transform Mahalanobis distance to ordinary Euclidean distance.
It says that the orthogonal change of coordinates from $\mathrm{\mathbf{x}}$ to $\mathrm{\mathbf{x}}' = U \mathrm{\mathbf{x}}$ transforms a general Gaussian PDF into one with a diagonal covariance matrix. But the final expression contains $D^{-1}$, whereas, if I'm not mistaken, the diagonal covariance matrix is $D = \operatorname{diag}(\sigma_1^2, \sigma_2^2, \dots, \sigma_N^2)$; so $D^{-1}$ is not the diagonal covariance matrix but its inverse. How, then, does the orthogonal change of coordinates produce a Gaussian with covariance matrix $D$ rather than one involving the inverse of $D$?
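To make this concrete, I ran a quick sampling check (a numpy sketch; the covariance matrix, mean, and sample size are my own arbitrary choices):

```python
import numpy as np

# Sampling check: if x ~ N(xbar, Sigma), is the covariance of x' = U x diagonal?
# (Sigma, xbar, and the sample size are my own arbitrary choices.)
rng = np.random.default_rng(0)
Sigma = np.array([[4.0, 1.5],
                  [1.5, 3.0]])
xbar = np.array([1.0, -2.0])

w, V = np.linalg.eigh(Sigma)
U = V.T                                                 # so that Sigma = U^T diag(w) U

x = rng.multivariate_normal(xbar, Sigma, size=200_000)  # one sample per row
x_prime = x @ U.T                                       # applies x' = U x to each row

print(np.cov(x_prime, rowvar=False))  # close to diag(w); off-diagonals near 0
print(w)
```

Empirically the covariance of $\mathrm{\mathbf{x}}'$ does come out close to $D$, which only deepens my confusion about why the exponent shows $D^{-1}$.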
It also says that a further scaling by $\sigma_i$ in each coordinate direction may be applied to transform it to an isotropic Gaussian distribution. My search for what an isotropic Gaussian distribution is led me to this question, where it is stated that an isotropic Gaussian is one whose covariance matrix has the simplified form $\Sigma = \sigma^2 I$. Again, how does scaling $\exp(-(\mathrm{\mathbf{x}}' - \bar{\mathrm{\mathbf{x}}}')^T D^{-1} (\mathrm{\mathbf{x}}' - \bar{\mathrm{\mathbf{x}}}')/2)$ by $\sigma_i$ in each coordinate direction turn the general Gaussian PDF into an isotropic one? I don't see where $\Sigma = \sigma^2 I$ would come from.
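Here is the numerical check I tried for this part (again, the specific matrix and sample size are my own choices). The sample covariance of the rescaled coordinates does come out close to $I$, but I don't see why algebraically:

```python
import numpy as np

# Whitening check: after x' = U x, divide each coordinate by sigma_i,
# i.e. x'' = D^{-1/2} U x. (Sigma and the sample size are my own choices.)
rng = np.random.default_rng(1)
Sigma = np.array([[4.0, 1.5],
                  [1.5, 3.0]])

w, V = np.linalg.eigh(Sigma)
sigmas = np.sqrt(w)                # the sigma_i

x = rng.multivariate_normal(np.zeros(2), Sigma, size=200_000)
x_pp = (x @ V) / sigmas            # x @ V applies U = V.T per row; then scale each axis by 1/sigma_i

print(np.cov(x_pp, rowvar=False))  # close to the identity: an isotropic Gaussian
```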
I know that the Mahalanobis distance is $\| \mathrm{\mathbf{X}} - \mathrm{\mathbf{Y}}\|_{\Sigma} = ((\mathrm{\mathbf{X}} - \mathrm{\mathbf{Y}})^T \Sigma^{-1}(\mathrm{\mathbf{X}} - \mathrm{\mathbf{Y}}))^{1/2}$, but this doesn't seem to be the same as any of the expressions above, although it is obviously similar. And where is the Euclidean distance that is mentioned? My research turned up the Euclidean distance matrix, but I also do not see how that appears in any of the above expressions.
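I also tried comparing the two distances numerically (a sketch; the test points $\mathrm{\mathbf{X}}$, $\mathrm{\mathbf{Y}}$ and the matrix $\Sigma$ are arbitrary values of my own):

```python
import numpy as np

# Comparing Mahalanobis distance under Sigma with Euclidean distance after
# the whitening map T = D^{-1/2} U. (X, Y, and Sigma are arbitrary test values.)
Sigma = np.array([[4.0, 1.5],
                  [1.5, 3.0]])
X = np.array([2.0, 1.0])
Y = np.array([-1.0, 3.0])

w, V = np.linalg.eigh(Sigma)
T = np.diag(1.0 / np.sqrt(w)) @ V.T  # rotate by U = V.T, then scale by 1/sigma_i

d_mahal = np.sqrt((X - Y) @ np.linalg.inv(Sigma) @ (X - Y))
d_eucl = np.linalg.norm(T @ X - T @ Y)
print(d_mahal, d_eucl)               # the two values agree
```

The two numbers agree, but I can't see how this computation connects to the book's statement.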
I would greatly appreciate it if people could please take the time to clarify these points.