
I have been studying Vershynin's "High-Dimensional Probability," and I am confused about a step in the proof of Bernstein's inequality (Thm 2.8.2). Throughout, $(X_1, \dots, X_N)$ is a finite sequence of independent, mean-zero subexponential random variables.

By a property of subexponential random variables, their MGF can be bounded as follows for $|\lambda| \leq \frac{c}{\max_i \|X_i\|_{\psi_1}}$:

$$\mathbb{E}[\exp (\lambda X_i)] \leq \exp (C \lambda^2 \|X_i\|_{\psi_1}^2).$$
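
For readers without the book at hand, the step this is combined with is the standard Chernoff (exponential Markov) argument: writing $S := \sum_{i=1}^N X_i$ and using independence,

$$\mathbb{P}(S \geq t) = \mathbb{P}\big(e^{\lambda S} \geq e^{\lambda t}\big) \leq e^{-\lambda t}\, \mathbb{E}[e^{\lambda S}] = e^{-\lambda t} \prod_{i=1}^N \mathbb{E}[e^{\lambda X_i}] \leq \exp \Big( -\lambda t + C \lambda^2 \sum_{i=1}^N \|X_i\|_{\psi_1}^2 \Big),$$

valid for every $\lambda$ in the stated range.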

Thus we get (from a previous step; note $S := \sum_{i=1}^N X_i$) $$ \mathbb{P}(S \geq t) \leq \exp (-\lambda t + C \lambda^2 \sigma^2) \quad (*)$$ where $\sigma^2 = \sum_{i=1}^N \|X_i\|_{\psi_1}^2$. Minimizing the exponent over $\lambda$ subject to the constraint gives the optimal choice

$$\lambda = \min \left(\frac{t}{2C\sigma^2}, \frac{c}{\max_i \|X_i\|_{\psi_1}}\right).$$

With this choice we obtain $$ \mathbb{P}(S \geq t) \leq \exp \left(-\min \left(\frac{t^2}{4C\sigma^2}, \frac{ct}{2 \max_i \|X_i\|_{\psi_1}}\right)\right).$$
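
As a quick numerical sanity check, the final bound can be verified to dominate the exponent at the optimal $\lambda$. The constants $C = c = 1$ below are illustrative only (the book's absolute constants are unspecified):

```python
import random

# Sanity check with illustrative constants C = c = 1 (NOT the book's
# unspecified absolute constants): verify that the optimal
#   lam = min(t / (2*C*sigma2), c / K)
# makes the exponent f(lam) = -lam*t + C*lam**2*sigma2 at most
#   -min(t**2 / (4*C*sigma2), c*t / (2*K)).
random.seed(0)
C, c = 1.0, 1.0
for _ in range(1000):
    sigma2 = random.uniform(0.1, 10.0)  # sum of squared psi_1 norms
    K = random.uniform(0.1, 10.0)       # max_i ||X_i||_{psi_1}
    t = random.uniform(0.1, 10.0)
    lam = min(t / (2 * C * sigma2), c / K)
    f = -lam * t + C * lam ** 2 * sigma2
    bound = -min(t ** 2 / (4 * C * sigma2), c * t / (2 * K))
    assert f <= bound + 1e-12, (sigma2, K, t)
print("exponent bound verified on 1000 random instances")
```

(The inequality is exact in the first case and strict in the second whenever $\sigma^2$ is strictly below its threshold.)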

Now, my issue is with this last step. I see how the first term in the minimum arises: simply plug $\lambda = \frac{t}{2C\sigma^2}$ into $(*)$. I would expect the second term to come from plugging in $\lambda = \frac{c}{\max_i \|X_i\|_{\psi_1}}$, but when I do that I do not arrive at $-\frac{ct}{2 \max_i \|X_i\|_{\psi_1}}$... am I missing something?


If you want to check the source material, the book is freely available at

https://www.math.uci.edu/~rvershyn/papers/HDP-book/HDP-book.pdf

LSK21
  • Where does the lowercase "$c$" come from? Is this a typo, with $C = c$ intended? – FShrike Jul 28 '23 at 10:29
  • @FShrike it's not a typo, $c$ and $C$ are two different positive absolute constants. – LSK21 Jul 28 '23 at 11:24
  • @FShrike If I am not mistaken, it should be the case that $C = \frac{1}{c^2}$, but I am not completely sure. This comes from the fact that the mentioned property of subexponential variables is $\mathbb{E}[\exp (\lambda X)] \leq \exp (K^2 \lambda^2)$ for $|\lambda| \leq \frac{1}{K}$, and it holds for any $K$ that is within an absolute constant factor of $\|X\|_{\psi_1}$. – LSK21 Jul 28 '23 at 11:38

2 Answers


After talking this through with a friend, I think I have an answer. Note that we can rewrite $$ -\lambda t + C \lambda^2 \sigma^2 = -\lambda(t - C \lambda \sigma^2),$$ and plugging the optimal value of $\lambda$ into the second factor gives $$ t - C \lambda \sigma^2 = t - \min \left(\frac{t}{2}, \frac{cC\sigma^2}{\max_i \|X_i\|_{\psi_1}} \right). $$ Since the minimum above is at most $\frac{t}{2}$, we have $t - C \lambda \sigma^2 \geq \frac{t}{2}$, and so $$ \exp (-\lambda t + C \lambda^2 \sigma^2) \leq \exp \left(-\lambda \frac{t}{2}\right). \quad (**)$$ Plugging the optimal $\lambda$ into $(**)$ now yields the claimed bound.
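
Spelling out that last plug-in: since $\lambda$ is itself a minimum of two candidates,

$$-\lambda \frac{t}{2} = -\frac{t}{2} \min \left(\frac{t}{2C\sigma^2}, \frac{c}{\max_i \|X_i\|_{\psi_1}}\right) = -\min \left(\frac{t^2}{4C\sigma^2}, \frac{ct}{2 \max_i \|X_i\|_{\psi_1}}\right),$$

which is exactly the exponent in the claimed bound.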

LSK21

Note that $$\frac{t}{2 C \sigma^2} \geq \frac{c}{\max_i \Vert X_i \Vert_{\psi_1}} \iff \sigma^2 \leq \frac{t \max_i \Vert X_i \Vert_{\psi_1}}{2 cC}.$$ So if the optimal choice is $\lambda = c / \max_i \Vert X_i \Vert_{\psi_1}$, we can use the above inequality for $\sigma^2$ in your equation $(*)$; simplifying yields the second case of the claimed probability bound.
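
Explicitly, writing $K := \max_i \Vert X_i \Vert_{\psi_1}$ for brevity: if $\lambda = \frac{c}{K}$, then $\sigma^2 \leq \frac{tK}{2cC}$, so

$$-\lambda t + C \lambda^2 \sigma^2 \leq -\frac{ct}{K} + C \cdot \frac{c^2}{K^2} \cdot \frac{tK}{2cC} = -\frac{ct}{K} + \frac{ct}{2K} = -\frac{ct}{2K},$$

which is the second term in the minimum.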

JKL