
Let $\gamma_n$ denote the standard Gaussian measure on $\mathbb{R}^n$. It is known (see for example Cor 2.3 here: http://www.math.lsa.umich.edu/~barvinok/total710.pdf) that $$\gamma_n\{x\in\mathbb{R}^n: \|x\|^2 \ge \frac{n}{1-\epsilon}\}\ge e^{-\epsilon n/4}$$and $$\gamma_n\{x\in\mathbb{R}^n: \|x\|^2 \le (1-\epsilon)n\}\le e^{-\epsilon n/4}.$$

Given a Gaussian distribution with covariance $\Sigma$, what can we say about the distribution of the norm? That is, what happens if, rather than the standard Gaussian, we consider the measure induced by $N(\vec{0},\Sigma)$, for some positive semidefinite $\Sigma$?

I think we should get similar bounds along the lines of $$\gamma_n\{x\in\mathbb{R}^n: \|x\|^2 \ge \frac{\operatorname{Trace}(\Sigma)}{1-\epsilon}\}\ge e^{-\epsilon n/4}$$and $$\gamma_n\{x\in\mathbb{R}^n: \|x\|^2 \le (1-\epsilon)\cdot\operatorname{Trace}(\Sigma)\}\le e^{-\epsilon n/4}.$$

I think it should be the trace, since for i.i.d. samples $x_1,\dots,x_m \sim N(\vec 0, \Sigma)$ we have $\operatorname{Tr}(\frac1m\sum_{i=1}^m x_i x_i^T) = \frac1m\sum_{i=1}^m \|x_i\|^2\to\mathbb{E} \|x\|^2$ as $m\to\infty$. Is this true? If so, is the derivation straightforward given the standard normal tail bound? It doesn't seem trivial, though often these things are simple modifications of the $N(0,I)$ case.
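
For what it's worth, here is a quick Monte Carlo sketch (my own sanity check, not from the linked notes; the choice of $\Sigma$ is arbitrary) suggesting that $\|x\|^2$ does concentrate around $\operatorname{Trace}(\Sigma)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# An arbitrary positive semidefinite covariance matrix.
A = rng.standard_normal((n, n))
Sigma = A @ A.T / n

# Sample X ~ N(0, Sigma) and compare E ||X||^2 with Trace(Sigma).
X = rng.multivariate_normal(np.zeros(n), Sigma, size=100_000)
sq_norms = np.sum(X**2, axis=1)

eps = 0.2
tr = np.trace(Sigma)
print("Trace(Sigma)             :", tr)
print("mean of ||X||^2          :", sq_norms.mean())
print("P(||X||^2 <= (1-eps)*Tr) :", np.mean(sq_norms <= (1 - eps) * tr))
print("P(||X||^2 >= Tr/(1-eps)) :", np.mean(sq_norms >= tr / (1 - eps)))
```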

I believe one can reduce the problem to proving $$\gamma_n\{x\in\mathbb{R}^n: x^T \Sigma^{-1} x \ge \frac{\operatorname{Trace}(\Sigma)}{1-\epsilon}\}\ge e^{-\epsilon n/4}$$and $$\gamma_n\{x\in\mathbb{R}^n: x^T \Sigma^{-1} x \le (1-\epsilon)\cdot\operatorname{Trace}(\Sigma)\}\le e^{-\epsilon n/4},$$ but since the change of variables introduces a factor of $|\det\Sigma|$, it seems like I'm on the wrong track.

Helmut

2 Answers


Note: The following is not an answer, but merely some thoughts which might or might not be helpful to you.

First note that you mixed up your inequality signs; the bounds should read $$ \gamma_{n}\left(\left\{ x\in\mathbb{R}^{n}\,\mid\,\left\Vert x\right\Vert ^{2}\geq\frac{n}{1-\varepsilon}\right\} \right){\color{red}\leq}e^{-\varepsilon n/4} $$ and $$ \gamma_{n}\left(\left\{ x\in\mathbb{R}^{n}\,\mid\,\left\Vert x\right\Vert ^{2}\geq\frac{{\rm Trace}\left(\Sigma\right)}{1-\varepsilon}\right\} \right){\color{red}\leq}e^{-\varepsilon n/4}. $$ Also note that these inequalities get stronger with larger values of $n$. But in general, the second one is not true. To see this, use e.g. $$ \Sigma=\left(\begin{matrix}1\\ & 0\\ & & \ddots\\ & & & 0 \end{matrix}\right), $$ or, if you want your $\Sigma$ to be positive definite, use $\frac{1}{L\left(n-1\right)}$ instead of the zeros on the diagonal, where $L$ is large, so that ${\rm Trace}\left(\Sigma\right)=1+\frac{1}{L}$. Your estimate would then imply (since $\left\Vert x\right\Vert ^{2}\geq\left|x_{1}\right|^{2}$) that $$ \mathbb{P}\left(\left|x_{1}\right|^{2}\geq\frac{1+\frac{1}{L}}{1-\varepsilon}\right)\leq\mathbb{P}\left(\left\Vert x\right\Vert ^{2}\geq\frac{1+\frac{1}{L}}{1-\varepsilon}\right)\leq e^{-\varepsilon n/4}\xrightarrow[n\to\infty]{}0, $$ which is absurd, since the distribution of $x_{1}\sim N\left(0,1\right)$ does not depend on $n$.
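
If it helps, here is a small simulation supporting this (my own sketch; $L$ and $\varepsilon$ are arbitrary choices). The tail probability stays bounded away from $0$ as $n$ grows, instead of decaying like $e^{-\varepsilon n/4}$:

```python
import numpy as np

rng = np.random.default_rng(1)
L, eps = 10.0, 0.2

for n in [10, 100, 1000]:
    # Sigma = diag(1, 1/(L(n-1)), ..., 1/(L(n-1))), so Trace(Sigma) = 1 + 1/L.
    diag = np.full(n, 1.0 / (L * (n - 1)))
    diag[0] = 1.0
    # Sigma is diagonal, so N(0, Sigma) has independent coordinates.
    X = rng.standard_normal((200_000, n)) * np.sqrt(diag)
    p = np.mean(np.sum(X**2, axis=1) >= (1 + 1 / L) / (1 - eps))
    print(f"n={n:4d}:  P ~ {p:.4f}   vs   exp(-eps*n/4) = {np.exp(-eps * n / 4):.2e}")
```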

Hence, the right-hand side of your estimate (more precisely, its exponent) somehow needs to involve ${\rm trace}\left(\Sigma\right)$ instead of $n$ (I think).



What follows is an adaptation of the argument you linked, but I eventually get stuck when trying to find a good (or optimal) value of $\lambda$.

First, since $\Sigma$ is symmetric positive semidefinite, there is an orthogonal matrix $O\in\mathbb{R}^{n\times n}$ with $\Sigma=O \cdot {\rm diag}\left(\lambda_{1},\dots,\lambda_{n}\right)\cdot O^{T}$, where $\lambda_{1},\dots,\lambda_{n}\geq0$ are the eigenvalues of $\Sigma$. We can now define the square root $\sqrt{\Sigma}:=O\cdot {\rm diag}\left(\sqrt{\lambda_{1}},\dots,\sqrt{\lambda_{n}}\right) \cdot O^T\in\mathbb{R}^{n\times n}$, which satisfies $\sqrt{\Sigma}^{T}=\sqrt{\Sigma}$ and $\sqrt{\Sigma}\sqrt{\Sigma}=\Sigma$. Now, by well-known properties of the normal distribution, we conclude that $X:=\sqrt{\Sigma}g\sim N\left(0,\Sigma\right)$, where $g\sim N\left(0,{\rm id}\right)$ is a standard normally distributed random vector.

We also know that the standard normal distribution is invariant under orthogonal transformations, i.e. $h:=O^{T}g\sim N\left(0,{\rm id}\right)$. Finally, $$ \left\Vert X\right\Vert ^{2}=\left\Vert O{\rm diag}\left(\sqrt{\lambda_{1}},\dots,\sqrt{\lambda_{n}}\right)O^{T}g\right\Vert ^{2}=\left\Vert {\rm diag}\left(\sqrt{\lambda_{1}},\dots,\sqrt{\lambda_{n}}\right)h\right\Vert ^{2}=\sum_{i=1}^{n}\lambda_{i}h_{i}^{2}, $$ so that $\left\Vert X\right\Vert ^{2}$ has (as you noted yourself) expectation $$ \mathbb{E}\left\Vert X\right\Vert ^{2}=\sum_{i=1}^{n}\lambda_{i}\mathbb{E}h_{i}^{2}=\sum_{i=1}^{n}\lambda_{i}={\rm trace}\left(\Sigma\right), $$ since $\mathbb{E}h_{i}^{2}={\rm Var}\left(h_{i}\right)=1$ because $h\sim N\left(0,{\rm id}\right)$.
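
As a sanity check, here is a short simulation (my own sketch; the random $\Sigma$ is an arbitrary choice) confirming that $\left\Vert X\right\Vert^2$ and $\sum_i \lambda_i h_i^2$ have the same distribution, with mean ${\rm trace}(\Sigma)$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 8, 500_000

# Random symmetric positive semidefinite Sigma and its square root.
A = rng.standard_normal((n, n))
Sigma = A @ A.T
eigvals, O = np.linalg.eigh(Sigma)
eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative rounding errors
sqrt_Sigma = O @ np.diag(np.sqrt(eigvals)) @ O.T

g = rng.standard_normal((N, n))
norm_X_sq = np.sum((g @ sqrt_Sigma) ** 2, axis=1)  # ||sqrt(Sigma) g||^2 (sqrt_Sigma is symmetric)
h = rng.standard_normal((N, n))
weighted = h**2 @ eigvals                          # sum_i lambda_i h_i^2

print("trace(Sigma)         :", np.trace(Sigma))
print("mean ||X||^2         :", norm_X_sq.mean())
print("mean sum lam_i h_i^2 :", weighted.mean())
print("95% quantiles        :", np.quantile(norm_X_sq, 0.95), np.quantile(weighted, 0.95))
```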

By reordering, we can assume $\lambda_{1}\geq\dots\geq\lambda_{j}>0=\lambda_{j+1}=\dots=\lambda_{n}$, where $j\in\left\{ 0,\dots,n\right\} $.

Now observe that Markov's inequality, applied to $e^{\lambda\left\Vert X\right\Vert ^{2}}$ (the Chernoff method), yields, for arbitrary $\lambda>0$, \begin{eqnarray*} \mathbb{P}\left(\left\Vert X\right\Vert ^{2}\geq{\rm trace}\left(\Sigma\right)+\delta\right) & = & \mathbb{P}\left(e^{\lambda\left\Vert X\right\Vert ^{2}}\geq e^{\lambda\left({\rm trace}\left(\Sigma\right)+\delta\right)}\right)\\ & \leq & e^{-\lambda\left({\rm trace}\left(\Sigma\right)+\delta\right)}\cdot\mathbb{E}\left(e^{\lambda\left\Vert X\right\Vert ^{2}}\right), \end{eqnarray*} where \begin{eqnarray*} \mathbb{E}\left(e^{\lambda\left\Vert X\right\Vert ^{2}}\right) & = & \mathbb{E}\left(e^{\sum_{i=1}^{n}\lambda\lambda_{i}h_{i}^{2}}\right)\\ & = & \prod_{i=1}^{j}\mathbb{E}\left(e^{\lambda\lambda_{i}h_{i}^{2}}\right) \end{eqnarray*} by stochastic independence of $\left(h_{1},\dots,h_{n}\right)$ (the factors with $\lambda_{i}=0$ are equal to $1$). The main point of introducing the exponential is this final identity, where independence lets us pull the product out of the expectation.

Finally, \begin{eqnarray*} \mathbb{E}\left(e^{\gamma h_{i}^{2}}\right) & = & \frac{1}{\sqrt{2\pi}}\cdot\int_{\mathbb{R}}e^{\gamma x^{2}}\cdot e^{-x^{2}/2}\,{\rm d}x\\ & = & \frac{1}{\sqrt{2\pi}}\cdot\int_{\mathbb{R}}e^{-\left(\sqrt{\frac{1}{2}-\gamma}x\right)^{2}}\,{\rm d}x\\ & \overset{\omega=\sqrt{\frac{1}{2}-\gamma}x}{=} & \frac{1}{\sqrt{2\pi}\cdot\sqrt{\frac{1}{2}-\gamma}}\cdot\int_{\mathbb{R}}e^{-\omega^{2}}\,{\rm d}\omega\\ & = & \frac{1}{\sqrt{1-2\gamma}} \end{eqnarray*} for $\gamma<\frac{1}{2}$.
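
As a side note, this formula is easy to check numerically (my own sketch; note that the Monte Carlo estimator has finite variance only for $\gamma < 1/4$, so I stay below that):

```python
import numpy as np

rng = np.random.default_rng(3)
h = rng.standard_normal(2_000_000)

for gamma in [0.05, 0.1, 0.2]:
    mc = np.mean(np.exp(gamma * h**2))        # Monte Carlo estimate of E e^{gamma h^2}
    exact = 1.0 / np.sqrt(1.0 - 2.0 * gamma)  # closed form, valid for gamma < 1/2
    print(f"gamma={gamma}:  MC ~ {mc:.4f},  1/sqrt(1-2*gamma) = {exact:.4f}")
```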

All in all, we arrive at $$ \mathbb{P}\left(\left\Vert X\right\Vert ^{2}\geq{\rm trace}\left(\Sigma\right)+\delta\right)\leq e^{-\lambda\left({\rm trace}\left(\Sigma\right)+\delta\right)}\cdot\prod_{i=1}^{j}\frac{1}{\sqrt{1-2\lambda\lambda_{i}}}. $$ The problem is now to optimize this w.r.t. $0<\lambda<\frac{1}{2\lambda_{1}}$. One way to simplify(?) this is to use $$ e^{-\lambda\left({\rm trace}\left(\Sigma\right)+\delta\right)}\cdot\prod_{i=1}^{j}\frac{1}{\sqrt{1-2\lambda\lambda_{i}}}=e^{-\left[\lambda\left({\rm trace}\left(\Sigma\right)+\delta\right)+\frac{1}{2}\sum_{i=1}^{j}\ln\left(1-2\lambda\lambda_{i}\right)\right]}, $$ so that one only has to optimize the exponent. Still, I neither see an easy way to determine the optimal value of $\lambda$, nor a really convenient choice of $\lambda$.

One choice inspired by your linked lecture notes is to use $\lambda=\frac{\delta/2}{{\rm trace}\left(\Sigma\right)+\delta}$ (because in the standard gaussian case, we have $n={\rm trace}\left(\Sigma\right)$, which is exactly the choice used in the lecture notes). This would yield \begin{eqnarray*} \mathbb{P}\left(\left\Vert X\right\Vert ^{2}\geq{\rm trace}\left(\Sigma\right)+\delta\right) & \leq & e^{-\delta/2}\cdot\prod_{i=1}^{j}\sqrt{\frac{{\rm trace}\left(\Sigma\right)+\delta}{{\rm trace}\left(\Sigma\right)+\delta-\delta\lambda_{i}}}, \end{eqnarray*} which still does not really seem that great.
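
To get a feeling for how much this choice loses, one can compare it numerically with a brute-force minimization of the exponent over $\lambda$ (my own sketch; the eigenvalues and $\delta$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
lam = np.sort(rng.uniform(0.5, 2.0, size=20))[::-1]  # eigenvalues lambda_1 >= ... >= lambda_n > 0
tr = lam.sum()
delta = 0.5 * tr

def log_bound(l):
    # log of  e^{-l(trace + delta)} * prod_i (1 - 2 l lambda_i)^{-1/2},  for 0 < l < 1/(2 lambda_1)
    return -l * (tr + delta) - 0.5 * np.sum(np.log(1.0 - 2.0 * l * lam))

l_notes = (delta / 2.0) / (tr + delta)               # lecture-notes-inspired choice
grid = np.linspace(1e-6, 0.999 / (2.0 * lam[0]), 10_000)
l_opt = grid[np.argmin([log_bound(l) for l in grid])]

print("notes-inspired lambda:", l_notes, " log-bound:", log_bound(l_notes))
print("grid-optimal   lambda:", l_opt, " log-bound:", log_bound(l_opt))
```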

I will try to find a good choice of $\lambda$ here. If I come up with something, I will edit the post.

PhoemueX

So, as @PhoemueX showed, it suffices to obtain a bound for $\sum_{i=1}^n \lambda_i \xi_i^2$, where the $\xi_i$ are independent and $\xi_i \sim \mathcal{N}(0, 1)$.

Let $\xi_1, \dots, \xi_n$ be independent standard Gaussian random variables and set $Z = \sum_{i=1}^n a_i (\xi_i^2 - 1)$ for a vector $a \in \mathbf{R}^n_+$. Then, for every $x > 0$, \begin{align} \mathbb{P}(Z \ge 2 \| a \|_2 \sqrt{x} + 2\| a \|_{\infty} x) \le e^{-x}. \end{align} (This is essentially Lemma 1 of Laurent and Massart, *Adaptive estimation of a quadratic functional by model selection*, Ann. Statist. 28 (2000).)

Proof.

The logarithm of the Laplace transform of $\xi^2 - 1$ for $\xi \sim \mathcal{N}(0,1)$ reads \begin{align*} \psi(u) = \log \left[\mathbb{E} \exp\{ u(\xi^2 - 1)\}\right] = -u - \frac 1 2 \log (1 - 2u), \qquad u < \frac 1 2. \end{align*} Notice that for $u \in \left(0, \frac 1 2 \right)$ we have $\psi(u) \le \frac{u^2}{1 - 2u}$. Hence, for $u \in \left(0, \frac{1}{2\|a\|_\infty}\right)$, \begin{align} \log \mathbb{E} e^{uZ} &= \sum_{i=1}^n \log \mathbb{E} \exp \{ a_i u (\xi_i^2 - 1) \} \le \sum_{i=1}^n \frac{a_i^2u^2}{1 - 2a_i u} \nonumber \\ &\le \sum_{i=1}^n \frac{2a_i^2 u^2}{2(1 - 2\max_i a_i \cdot u)} \le \frac{2u^2 \| a \|_2^2}{2(1 - 2 \| a \|_{\infty} u)}. \end{align}
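
The inequality $\psi(u) \le \frac{u^2}{1-2u}$ is easy to verify numerically (my own sketch; it can also be checked by comparing power-series coefficients):

```python
import numpy as np

u = np.linspace(1e-4, 0.499, 1000)
psi = -u - 0.5 * np.log(1.0 - 2.0 * u)  # log Laplace transform of xi^2 - 1
bound = u**2 / (1.0 - 2.0 * u)
print("psi(u) <= u^2/(1-2u) on the whole grid:", bool(np.all(psi <= bound)))
```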

Lemma. If, for some $v, c > 0$ and all $u \in \left(0, \frac 1 c\right)$, \begin{align*} \log \mathbb{E} e^{uZ} \le \frac{v u^2}{2(1 - cu)}, \end{align*} then for every $x > 0$ \begin{align*} \mathbb{P}(Z \ge cx + \sqrt{2vx}) \le e^{-x}. \end{align*}

Proof of lemma.

Markov's inequality yields, for any $\lambda \in \left(0, \frac 1 c\right)$, \begin{align*} \mathbb{P}(Z \ge cx + \sqrt{2vx}) &\le \mathbb{E} \exp \left\{ \lambda Z - \lambda (cx + \sqrt{2vx}) \right\} \nonumber \\ &\le \exp\left\{ \frac{v\lambda^2}{2(1 - c\lambda)} - \lambda(cx + \sqrt{2vx})\right \} \nonumber \\ &= \exp\left\{ \frac{v\lambda^2}{2(1 - c\lambda)} + x(1 - c\lambda) - \lambda \sqrt{2vx} - x \right \} \nonumber \\ &= \exp \left\{-x\right\}, \end{align*} where in the last step $\lambda$ was chosen as $$ \lambda = \frac{\sqrt{2x}}{\sqrt{v} + c\sqrt{2x}} \in \left( 0, \frac 1 c \right), $$ for which $\frac{v\lambda^2}{2(1-c\lambda)} = x(1-c\lambda)$ and $\lambda\sqrt{2vx} = 2x(1-c\lambda)$, so the exponent collapses to $-x$.

$\blacksquare$
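
For what it's worth, the final cancellation can be verified symbolically (a sketch using sympy; $v$, $c$, $x$ are the symbols from the lemma):

```python
import sympy as sp

v, c, x = sp.symbols("v c x", positive=True)
lam = sp.sqrt(2 * x) / (sp.sqrt(v) + c * sp.sqrt(2 * x))

# Exponent from Markov's inequality: v*lam^2 / (2*(1 - c*lam)) - lam*(c*x + sqrt(2*v*x))
exponent = v * lam**2 / (2 * (1 - c * lam)) - lam * (c * x + sp.sqrt(2 * v * x))
print(sp.simplify(exponent))  # expected output: -x
```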

Now applying the lemma with $c=2\| a \|_{\infty}$ and $v = 2 \| a \|_2^2$ (so that $cx + \sqrt{2vx} = 2\|a\|_\infty x + 2\|a\|_2\sqrt{x}$), we get \begin{align*} \mathbb{P}(Z \ge 2\| a \|_{\infty}x + 2 \| a \|_2 \sqrt{x}) \le e^{-x}, \end{align*} as desired. $\blacksquare$
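
Here is a quick Monte Carlo check of this bound (my own sketch; the weight vector $a$ is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 30
a = rng.uniform(0.1, 1.0, size=n)      # nonnegative weights
norm2, norm_inf = np.linalg.norm(a), np.max(a)

xi = rng.standard_normal((1_000_000, n))
Z = (xi**2 - 1.0) @ a                  # Z = sum_i a_i (xi_i^2 - 1)

for x in [0.5, 1.0, 2.0, 4.0]:
    thresh = 2.0 * norm2 * np.sqrt(x) + 2.0 * norm_inf * x
    print(f"x={x}:  P(Z >= thresh) ~ {np.mean(Z >= thresh):.5f}  <=  e^(-x) = {np.exp(-x):.5f}")
```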


Also, there is a similar lower bound, which I state but do not prove here, for the sake of not deviating from the OP's question. With the same notation, it holds that $$ \mathbb{P}(Z \le - 2 \| a \|_2 \sqrt{x}) \le e^{-x}. $$
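
A quick simulation check of this lower-tail bound (same sketch setup as above):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 30
a = rng.uniform(0.1, 1.0, size=n)
Z = (rng.standard_normal((1_000_000, n))**2 - 1.0) @ a

for x in [0.5, 1.0, 2.0, 4.0]:
    p_emp = np.mean(Z <= -2.0 * np.linalg.norm(a) * np.sqrt(x))
    print(f"x={x}:  P(Z <= -2*||a||_2*sqrt(x)) ~ {p_emp:.5f}  <=  e^(-x) = {np.exp(-x):.5f}")
```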

tortue