7

For a standard normal random variable $X \sim \mathcal{N}(0,1)$, we have the simple upper-tail bound of $$\mathbb{P} (X > x) \leq \frac{1}{x \sqrt{2\pi}} e^{-x^2 / 2}$$ and thus from this we can deduce the general upper-tail bound for $X' \sim \mathcal{N}(\mu, \sigma^2)$ to be $$\mathbb{P}(X' > x) = \mathbb{P}\left(X > \frac{x - \mu}{\sigma}\right) \leq \frac{\sigma}{(x - \mu)\sqrt{2 \pi}} e^{-(x - \mu)^2/(2\sigma^2)}$$
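As a quick numerical sanity check (my addition, not part of the original question), the one-dimensional bound can be compared against the exact tail computed with the complementary error function:

```python
import math

def upper_tail_bound(x, mu=0.0, sigma=1.0):
    # Mills-ratio bound: P(X' > x) <= sigma / ((x - mu) sqrt(2 pi)) * exp(-(x - mu)^2 / (2 sigma^2)),
    # valid for x > mu
    z = (x - mu) / sigma
    return math.exp(-z * z / 2) / (z * math.sqrt(2 * math.pi))

def upper_tail_exact(x, mu=0.0, sigma=1.0):
    # Exact tail: P(X' > x) = (1/2) erfc((x - mu) / (sigma sqrt(2)))
    z = (x - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))

for x in [0.5, 1.0, 2.0, 3.0]:
    assert upper_tail_exact(x) <= upper_tail_bound(x)
```

The bound is only tight for large $x$, but it is valid for any $x > \mu$, which the assertions confirm.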

How can this type of exponential decay bound be generalized to a $d$-dimensional multivariate normal distribution $\vec{X}$ with mean $\vec{\mu}$ and covariance matrix $\Sigma$? Specifically, can we bound the probability $$\mathbb{P}(\|\vec{X} - \vec{\mu}\| > x)?$$ My guess is that we can probably get a bound that is exponentially small in $x^2$, but how exactly does $\Sigma$ figure in? The more concise the description, the better; e.g., I would prefer a bound that depends only on certain eigenvalues of $\Sigma$ to one that depends on all the entries of $\Sigma$. Even better would be a bound depending only on $\|\Sigma\|$ (for some suitable norm).

I took a look at this post, but my question is simpler, as I don't want a bound on each component of my multivariate Gaussian. I suspect there is a simpler bound for my question (one that does not depend on individual matrix entries) than the one given in the answer to the linked question.

Thanks!

paulinho
  • 6,730
  • Hi Oliver, thanks for the response. Could you clarify your notation a bit? What are $g$ and $\phi_{0, I}$? – paulinho Apr 16 '21 at 00:57
  • This would lead to an integral of the form $$\int_{a}^\infty r^k e^{-r^2 / 2} dr$$ no? Unfortunately, I don't quite see an obvious bound on this integral - can we come up with a bound on this that is exponential in $a$ and $d$, the dimension of the vector? – paulinho Apr 16 '21 at 01:17

2 Answers

8

The keywords you are looking for are "Gaussian chaos of order two" or "Hanson-Wright inequality", see for instance Example 2.12 in Concentration Inequalities: A Nonasymptotic Theory of Independence by Gábor Lugosi, Pascal Massart, and Stéphane Boucheron, or Theorem 6.3.2 in the High Dimensional Probability book by Vershynin (the author provides a free pdf version on his website, I believe).

If $\|\cdot\|_F$ and $\|\cdot\|_{op}$ denote the Frobenius and operator norms of matrices, the result says that if $X\sim N(0, \Sigma)$, then $$ P( \|X\|^2 \ge trace[\Sigma] + 2 \sqrt{t} \|\Sigma\|_F + 2t\|\Sigma\|_{op} ) \le e^{-t}, $$ or equivalently, with $Z=\Sigma^{-1/2}X \sim N(0, I)$, $$ P( \|\Sigma^{1/2}Z\|^2 \ge trace[\Sigma] + 2 \sqrt{t} \|\Sigma\|_F + 2t\|\Sigma\|_{op} ) \le e^{-t}. $$ There is also a slightly tighter bound for the lower tail, which can be found in Lemma 1 of Laurent and Massart (2000).
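A small Monte Carlo check of this bound (my addition, using an arbitrary diagonal $\Sigma$; any positive semidefinite covariance would do):

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.diag([3.0, 2.0, 1.0])         # example covariance (assumption, not from the answer)
tr = np.trace(Sigma)
fro = np.linalg.norm(Sigma, "fro")       # ||Sigma||_F
op = np.linalg.norm(Sigma, 2)            # ||Sigma||_op (largest singular value)

X = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000)
sq_norms = np.sum(X**2, axis=1)          # ||X||^2 for each sample

for t in [0.5, 1.0, 2.0]:
    threshold = tr + 2 * np.sqrt(t) * fro + 2 * t * op
    emp = np.mean(sq_norms >= threshold)  # empirical P(||X||^2 >= threshold)
    assert emp <= np.exp(-t)              # the stated tail bound (holds with room to spare)
```

The empirical tail probabilities come out well below $e^{-t}$, which is expected: the inequality is not tight for a fixed small dimension.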

jlewk
  • 2,257
  • Thanks! This is exactly it. As for the bounds you’ve reproduced, are those the Hanson-Wright bounds? – paulinho Apr 18 '21 at 14:41
  • The Hanson-Wright inequality usually denotes the case of subgaussian entries. The inequality I wrote is the special case for Gaussian entries, for which explicit constants are easily obtainable; see Example 2.12 or Lemma 1 in the two references above. – jlewk Apr 18 '21 at 21:54
0

A bound that I was able to get that is useful to me is the following: $$\mathbb{P}\left\{\left\|\vec{X} - \vec{\mu}\right\|^2 \leq a^2 \right\} \geq \left( 1 - \sqrt{\frac{2 \Lambda}{\pi a^2}} \cdot e^{-a^2 / (2 \Lambda)} \right)^n$$ where $\vec{X} \sim \mathcal{N}\left(\vec{0}, \Sigma\right)$ and $\Lambda = \text{tr}(\Sigma)$, and $n$ is the dimension of the random vector.

Proof: First, we may assume that $\vec{\mu} = \vec{0}$ and, by an orthogonal change of coordinates (which leaves $\|\vec{X}\|$ unchanged), that $\Sigma$ is diagonal with positive entries $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$. Note that $\Lambda = \lambda_1 + \cdots + \lambda_n$. The idea is to bound the probability $$p_i = \mathbb{P}\left\{|X_i|^2 > x_i^2 \right\}, \qquad x_i^2 = \frac{\lambda_i}{\Lambda}a^2.$$ This can be done with the standard 1D normal bound, applied to both tails, once we observe that $X_i \sim \mathcal{N}(0, \lambda_i)$: $$p_i \leq \frac{2\sqrt{\lambda_i}}{x_i\sqrt{2 \pi}} \cdot e^{-x_i^2 / (2 \lambda_i)} = \sqrt{\frac{{2 \Lambda}}{{\pi a^2}}} \cdot e^{-a^2 / (2 \Lambda)},$$ and thus $$\mathbb{P}\left\{|X_i|^2 \leq x_i^2 \right\} = 1 - p_i \geq 1 - \sqrt{\frac{2 \Lambda}{\pi a^2}} \cdot e^{-a^2 / (2 \Lambda)}.$$ Since the $x_i^2$ sum to $a^2$, if $|X_i|^2 \leq \lambda_i a^2 \big/ \Lambda$ for all $i = 1, 2, \ldots, n$, then $\left\|\vec{X}\right\|^2 \leq a^2$. And since the $X_i$ are mutually independent, we have $$\mathbb{P}\left\{\left\|\vec{X}\right\|^2 \leq a^2\right\} \geq \left( 1 - \sqrt{\frac{2 \Lambda}{\pi a^2}} \cdot e^{-a^2 / (2 \Lambda)} \right)^n$$ as desired. $\square$
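The bound above can also be checked numerically; this sketch (my addition) uses a diagonal $\Sigma$ with arbitrarily chosen eigenvalues and compares the empirical probability to the claimed lower bound:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = np.array([3.0, 2.0, 1.0])    # eigenvalues of Sigma (diagonal case; arbitrary choice)
Lam = lam.sum()                    # Lambda = tr(Sigma)
n_dim = len(lam)
a2 = 25.0                          # threshold a^2 (arbitrary choice)

# Lower bound from the answer: (1 - sqrt(2 Lambda / (pi a^2)) * exp(-a^2 / (2 Lambda)))^n
p_comp = np.sqrt(2 * Lam / (np.pi * a2)) * np.exp(-a2 / (2 * Lam))
lower = (1 - p_comp) ** n_dim

# Empirical P(||X||^2 <= a^2), with X_i ~ N(0, lambda_i) independent
X = rng.normal(scale=np.sqrt(lam), size=(200_000, n_dim))
emp = np.mean(np.sum(X**2, axis=1) <= a2)
assert emp >= lower
```

The gap between `emp` and `lower` reflects the slack in the union-style argument: requiring every component to be small separately is stronger than requiring the norm to be small.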

Remark: If we choose $$a^2 = 2 \Lambda \cdot \omega(n) \ln (n) + \frac 12 \ln(\Lambda)$$ with $\omega(n) \to \infty$ as $n \to \infty$ arbitrarily slowly, then each factor in the product is $1 - o(1/n)$ (the exponential term alone is at most $n^{-\omega(n)}$), so the probability approaches $1$ as $n \to \infty$.

I'd also be interested in hearing others' approaches, especially if they give a tighter bound, or one that bounds the tail probabilities in terms of other quantities derived from $\Sigma$.

paulinho
  • 6,730