
I have been studying Vershynin's "High-Dimensional Probability," and I am confused about a step in the proof of Bernstein's inequality (Thm 2.8.2). Throughout, $(X_1, \dots, X_N)$ is a finite sequence of independent, mean-zero subexponential random variables.

By a property of subexponential random variables, their MGF can be bounded as follows for $|\lambda| \leq \frac{c}{\max_i \|X_i\|_{\psi_1}}$:

$$\mathbb{E}[\exp (\lambda X_i)] \leq \exp (C \lambda^2 \|X_i\|_{\psi_1}^2).$$
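
For readers without the book at hand, the step this is combined with is the standard Chernoff (exponential Markov) argument: writing $S := \sum_{i=1}^N X_i$ and using independence,

$$\mathbb{P}(S \geq t) = \mathbb{P}\big(e^{\lambda S} \geq e^{\lambda t}\big) \leq e^{-\lambda t}\, \mathbb{E}[e^{\lambda S}] = e^{-\lambda t} \prod_{i=1}^N \mathbb{E}[e^{\lambda X_i}] \leq \exp \Big( -\lambda t + C \lambda^2 \sum_{i=1}^N \|X_i\|_{\psi_1}^2 \Big),$$

valid for every $\lambda$ in the stated range.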

Thus we get (from a previous step; note $S := \sum_{i=1}^N X_i$) $$ \mathbb{P}(S \geq t) \leq \exp (-\lambda t + C \lambda^2 \sigma^2) \quad (*)$$ where $\sigma^2 = \sum_{i=1}^N \|X_i\|_{\psi_1}^2$. Minimizing the exponent over $\lambda$ subject to the constraint gives the optimal choice

$$\lambda = \min \left(\frac{t}{2C\sigma^2}, \frac{c}{\max_i \|X_i\|_{\psi_1}}\right).$$

With this choice we obtain $$ \mathbb{P}(S \geq t) \leq \exp \left(-\min \left(\frac{t^2}{4C\sigma^2}, \frac{ct}{2 \max_i \|X_i\|_{\psi_1}}\right)\right).$$
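
As a quick numerical sanity check, the final bound can be verified to dominate the exponent at the optimal $\lambda$. The constants $C = c = 1$ below are illustrative only (the book's absolute constants are unspecified):

```python
import random

# Sanity check with illustrative constants C = c = 1 (NOT the book's
# unspecified absolute constants): verify that the optimal
#   lam = min(t / (2*C*sigma2), c / K)
# makes the exponent f(lam) = -lam*t + C*lam**2*sigma2 at most
#   -min(t**2 / (4*C*sigma2), c*t / (2*K)).
random.seed(0)
C, c = 1.0, 1.0
for _ in range(1000):
    sigma2 = random.uniform(0.1, 10.0)  # sum of squared psi_1 norms
    K = random.uniform(0.1, 10.0)       # max_i ||X_i||_{psi_1}
    t = random.uniform(0.1, 10.0)
    lam = min(t / (2 * C * sigma2), c / K)
    f = -lam * t + C * lam ** 2 * sigma2
    bound = -min(t ** 2 / (4 * C * sigma2), c * t / (2 * K))
    assert f <= bound + 1e-12, (sigma2, K, t)
print("exponent bound verified on 1000 random instances")
```

(The inequality is exact in the first case and strict in the second whenever $\sigma^2$ is strictly below its threshold.)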

Now, my issue is with this last step. I see how the first term in the minimum arises: simply plug $\lambda = \frac{t}{2C\sigma^2}$ into $(*)$. I would expect the second term to come from plugging in $\lambda = \frac{c}{\max_i \|X_i\|_{\psi_1}}$, but when I do that I do not arrive at $-\frac{ct}{2 \max_i \|X_i\|_{\psi_1}}$... am I missing something?


If you want to check the source material, the book is freely available at

https://www.math.uci.edu/~rvershyn/papers/HDP-book/HDP-book.pdf

LSK21
  • Where does the lowercase "$c$" come from? Is this a typo, with $C = c$ intended? – FShrike Jul 28 '23 at 10:29
  • @FShrike it's not a typo, $c$ and $C$ are two different positive absolute constants. – LSK21 Jul 28 '23 at 11:24
  • @FShrike If I am not mistaken, it should be the case that $C = \frac{1}{c^2}$, but I am not completely sure. This comes from the fact that the mentioned property of subexponential variables is $\mathbb{E}[\exp (\lambda X)] \leq \exp (K^2 \lambda^2)$ for $|\lambda| \leq \frac{1}{K}$, and it holds for any $K$ that is within an absolute constant factor of $\|X\|_{\psi_1}$. – LSK21 Jul 28 '23 at 11:38

2 Answers


After talking this through with a friend, I think I have an answer. Note that we can rewrite $$ -\lambda t + C \lambda^2 \sigma^2 = -\lambda(t - C \lambda \sigma^2),$$ and plugging the optimal value of $\lambda$ into the second factor gives $$ t - C \lambda \sigma^2 = t - \min \left(\frac{t}{2}, \frac{cC\sigma^2}{\max_i \|X_i\|_{\psi_1}} \right). $$ Since the minimum above is at most $\frac{t}{2}$, we have $t - C \lambda \sigma^2 \geq \frac{t}{2}$, and so $$ \exp (-\lambda t + C \lambda^2 \sigma^2) \leq \exp \left(-\lambda \frac{t}{2}\right). \quad (**)$$ Plugging the optimal $\lambda$ into $(**)$ now yields the claimed bound.
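
Spelling out that last plug-in: since $\lambda$ is itself a minimum of two candidates,

$$-\lambda \frac{t}{2} = -\frac{t}{2} \min \left(\frac{t}{2C\sigma^2}, \frac{c}{\max_i \|X_i\|_{\psi_1}}\right) = -\min \left(\frac{t^2}{4C\sigma^2}, \frac{ct}{2 \max_i \|X_i\|_{\psi_1}}\right),$$

which is exactly the exponent in the claimed bound.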

LSK21

Note that $$\frac{t}{2 C \sigma^2} \geq \frac{c}{\max_i \Vert X_i \Vert_{\psi_1}} \iff \sigma^2 \leq \frac{t \max_i \Vert X_i \Vert_{\psi_1}}{2 cC}.$$ So if the optimal choice is $\lambda = c / \max_i \Vert X_i \Vert_{\psi_1}$, we can use the above inequality for $\sigma^2$ in your equation $(*)$; simplifying yields the second case of the claimed probability bound.
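
Explicitly, writing $K := \max_i \Vert X_i \Vert_{\psi_1}$ for brevity: if $\lambda = \frac{c}{K}$, then $\sigma^2 \leq \frac{tK}{2cC}$, so

$$-\lambda t + C \lambda^2 \sigma^2 \leq -\frac{ct}{K} + C \cdot \frac{c^2}{K^2} \cdot \frac{tK}{2cC} = -\frac{ct}{K} + \frac{ct}{2K} = -\frac{ct}{2K},$$

which is the second term in the minimum.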

JKL