11

I am stuck on the first part of Problem 2 of Chapter 8 (error estimation) of the book "A Probabilistic Theory of Pattern Recognition" by Devroye:

Show that for any $s>0$, and any random variable $X$ with $\mathbf EX=0,\mathbf EX^2=\sigma^2, X\le c$, $$\mathbf E\left\{e^{sX}\right\}\le e^{f(\sigma^2/c^2)}\,,$$ where $$f(u)=\log\left(\frac1{1+u}e^{-csu}+\frac{u}{1+u}e^{cs}\right).$$

The purpose of the problem is to prove Bennett's inequality. I've searched for how Bennett's inequality is usually proved, and the usual trick seems to be to expand $\mathbb{E}[e^{sX}]$ as a Taylor series and then apply an inequality to the terms $\mathbb{E}[X^k]$, $k \geq 3$. However, this does not seem to be what the author has in mind here, and I cannot see any way to make the term $e^{-csu}$ appear in such an inequality.
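For what it's worth, a quick numerical sanity check (using an assumed example distribution, $X$ uniform on $\{-1, 0, 1\}$, which is not from the problem) suggests the stated bound does hold:

```python
import numpy as np

# Numerical sanity check of the claimed bound (not a proof):
# X uniform on {-1, 0, 1}, so EX = 0, EX^2 = 2/3, and X <= c = 1.
c, sigma2 = 1.0, 2.0 / 3.0
u = sigma2 / c**2
support = np.array([-1.0, 0.0, 1.0])

for s in np.linspace(0.01, 5.0, 100):
    lhs = np.mean(np.exp(s * support))  # E e^{sX}
    # e^{f(u)} = (1/(1+u)) e^{-csu} + (u/(1+u)) e^{cs}
    rhs = np.exp(-c * s * u) / (1 + u) + u / (1 + u) * np.exp(c * s)
    assert lhs <= rhs + 1e-12
```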

StubbornAtom
  • 17,932
    I'll leave a couple of partial observations here: (1) It looks like this should be an application of Jensen's inequality to the exponential function since the right hand side of the inequality is a convex combination of exponentials. This works if you additionally know that $X \ge - \sigma^2/c$ by writing $sX = tcs + (1-t)(- \sigma^2 s /c)$ for $t = \frac{cX + \sigma^2}{\sigma^2 + c^2}$. (2) Following this train of thought, one might hope to split $e^{sX}$ into parts depending on the sign of $X + \sigma^2/c$. – Rhys Steele Mar 14 '23 at 13:33
    When I tried this, I got $\frac{c^2}{\sigma^2 + c^2} e^{-\sigma^2 s/ c}$ as a bound for $\mathbb{E}[e^{sX} 1_{X \le - \sigma^2/c}]$ by applying the trivial upper bound for $e^{sX}$ in this region and then applying the one-sided Chebyshev inequality. This is promising since this is the first term on the right hand side. However, the part where $X \ge - \sigma^2/c$ doesn't seem to have the right type of bound anymore. – Rhys Steele Mar 14 '23 at 13:34
    The Jensen argument seems to fail since $X 1_{X \ge - \sigma^2/c}$ is no longer centred and inserting the trivial bound $X \le c$ on this regime would require one to show that $P(X \ge -\sigma^2/c) \le \frac{\sigma^2}{\sigma^2 + c^2 }$ to obtain the solution. However by one-sided Chebyshev that inequality would be true if we put a $\ge$ sign in place of $\le$. – Rhys Steele Mar 14 '23 at 13:34
  • @RhysSteele So do you imply the inequality could be wrong? – Zhanxiong Mar 14 '23 at 21:49
  • @Zhanxiong I don't have a counterexample in mind so I wouldn't claim that based on my reasoning so far. Notice that the upper bounds in (2) could be loose so the last sentence in that point does not necessarily suggest that the inequality is wrong. – Rhys Steele Mar 14 '23 at 22:37

1 Answer

8

The idea is to dominate $e^{sx}$ by a quadratic function. To achieve this, we exploit the fact that all higher derivatives of $e^{sx}$ are positive. [A function with all derivatives nonnegative is called absolutely monotonic.]

Lemma: Let $\Phi(x) = e^{sx}$ and let $\Psi(x) = a_2 x^2 + a_1 x + a_0$ be a quadratic function. If, for some $a < b$, we fit $\Psi$ so that $\Phi(a) = \Psi(a)$, $\Phi(b) = \Psi(b)$, and $\Phi'(a) = \Psi'(a)$, then $\Phi(x) \leq \Psi(x)$ on $(-\infty, b]$.
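A quick numerical illustration of the lemma (with assumed example values of $s$, $a$, $b$, not taken from the problem): fit the quadratic from the three matching conditions and check domination on a grid.

```python
import numpy as np

s, a, b = 1.3, -0.5, 2.0  # assumed example values
phi = lambda x: np.exp(s * x)

# Solve for Psi(x) = a2*x^2 + a1*x + a0 from the three matching conditions.
A = np.array([
    [a**2, a,   1.0],  # Psi(a)  = Phi(a)
    [b**2, b,   1.0],  # Psi(b)  = Phi(b)
    [2*a,  1.0, 0.0],  # Psi'(a) = Phi'(a)
])
rhs = np.array([phi(a), phi(b), s * phi(a)])
a2, a1, a0 = np.linalg.solve(A, rhs)
psi = lambda x: a2 * x**2 + a1 * x + a0

# Psi dominates Phi on (-inf, b], up to floating-point tolerance.
xs = np.linspace(-10.0, b, 2001)
assert np.all(psi(xs) >= phi(xs) - 1e-9)
```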

Proof: Consider the Taylor expansions of $\Phi$ and $\Psi$ at $a$. We have $$\Phi(x) = \Phi(a) + \Phi'(a)\cdot (x - a) + \frac{1}{2}\Phi''(a)\cdot (x - a)^2 + \cdots$$ while, since $\Psi$ is quadratic, $$\Psi(x) = \Psi(a) + \Psi'(a)\cdot (x - a) + \frac{1}{2}\Psi''(a)\cdot (x - a)^2.$$ Taking $x = b$ and using $\Phi(a) = \Psi(a)$, $\Phi(b) = \Psi(b)$, $\Phi'(a) = \Psi'(a)$ gives $$\frac{1}{2}\Psi''(a)\cdot (b - a)^2 = \frac{1}{2}\Phi''(a)\cdot (b - a)^2 + \sum_{n \geq 3} \frac{1}{n!}\Phi^{(n)}(a)\cdot (b - a)^n.$$ Since the higher derivatives $\Phi^{(n)}(a)$ are positive and $b > a$, we conclude $\Psi''(a) > \Phi''(a)$.

This immediately implies the inequality for $x < a$: in this case $$\Phi(x) = \Phi(a) + \Phi'(a)\cdot (x - a) + \frac{1}{2}\Phi''(a)\cdot (x - a)^2 + \frac{1}{6}\Phi'''(\xi)\cdot (x - a)^3$$ for some $\xi \in (x, a)$, and the remainder term is negative since $\Phi'''(\xi) > 0$ and $(x - a)^3 < 0$. Hence $$\Phi(x) \leq \Phi(a) + \Phi'(a)\cdot (x - a) + \frac{1}{2}\Phi''(a)\cdot (x - a)^2,$$ which, as $\Phi(a) = \Psi(a)$, $\Phi'(a) = \Psi'(a)$, and $\Phi''(a) < \Psi''(a)$, is at most $$\Psi(x) = \Psi(a) + \Psi'(a)\cdot (x - a) + \frac{1}{2}\Psi''(a)\cdot (x - a)^2.$$ For $x \in [a, b]$, we have $$\frac{1}{2}\Psi''(a)\cdot (b - a)^2 = \frac{1}{2}\Phi''(a)\cdot (b - a)^2 + \sum_{n \geq 3} \frac{1}{n!}\Phi^{(n)}(a)\cdot (b - a)^n,$$ thus $$\frac{1}{2}\Psi''(a) = \frac{1}{2}\Phi''(a) + \sum_{n \geq 3} \frac{1}{n!}\Phi^{(n)}(a)\cdot (b - a)^{n - 2}.$$ Replacing $b$ with $x$ decreases the right-hand side, since every term is positive and $0 \leq x - a \leq b - a$, so $$\frac{1}{2}\Psi''(a) \geq \frac{1}{2}\Phi''(a) + \sum_{n \geq 3} \frac{1}{n!}\Phi^{(n)}(a)\cdot (x - a)^{n - 2},$$ which, after multiplying by $(x - a)^2$ and adding the matching lower-order terms, translates back to $\Psi(x) \geq \Phi(x)$.

This establishes the lemma. The rest is straightforward. Write $u = \sigma^2 / c^2$ and take $a = -cu$, $b = c$ (note $a < b$, and $X \leq c = b$). The quadratic $\Psi$ exists since its coefficients are determined by three linear equations whose system is nonsingular for $a \neq b$. Using $\mathbb{E}X = 0$ and $\mathbb{E}X^2 = \sigma^2$, the lemma gives $$\mathbb{E} \Phi(X) \leq \mathbb{E} \Psi(X) = a_2 \sigma^2 + a_0.$$ We have $$a_2 c^2 u^2 - a_1 cu + a_0 = \Phi(-cu),$$ $$a_2 c^2 + a_1 c + a_0 = \Phi(c),$$ so taking the linear combination with weights $\frac{1}{u+1}$ and $\frac{u}{u+1}$ (which cancels the $a_1$ terms) yields $$a_2 \sigma^2 + a_0 = a_2 c^2 u + a_0 = \frac{u}{u + 1} \Phi(c) + \frac{1}{u + 1} \Phi(-cu).$$ Thus we conclude the desired result $$\mathbb{E} \Phi(X) \leq \frac{u}{u + 1} \Phi(c) + \frac{1}{u + 1} \Phi(-cu),$$ with equality when $X$ is supported on $\{c, -cu\}$.

abacaba
  • 11,210
  • I took $u = \sigma^2 / c^2$. – abacaba Mar 15 '23 at 00:57
  • Thank you for the clarification. That's a great answer! upvoted! – Zhanxiong Mar 15 '23 at 01:01
  • It is a nice idea. (+1) – River Li Mar 15 '23 at 01:16
  • It is well known that the exponential function majorizes a linear function, while it is much less known that it can be majorized by a quadratic function (on some subset of $\mathbb{R}$) and has such a nice application. Eye opening! – Zhanxiong Mar 15 '23 at 01:26
  • I was following the stat question @Zhanxiong. This is ingenious (+1) and above all, I learnt a new thing: majorization. Now I am going to spend some time reading the initial chapters of Ingram Olkin's treatise on the same. – User1865345 Mar 15 '23 at 04:45
  • TBH I am not using majorization in the standard sense; I mean "dominate". – abacaba Mar 15 '23 at 04:46
  • Nevertheless, it was a noble approach @abacaba. – User1865345 Mar 15 '23 at 07:06
  • Btw @abacaba, how did you approach this problem? Meant to say how did you come up with the idea of dominating the exponential function with the quadratic function? Was it a result of trial and error or there is something more implicit in the problem? I would love to know. – User1865345 Mar 15 '23 at 07:24
  • This was Bennett's idea originally and can be found in his paper as well. – Sarvesh Ravichandran Iyer Mar 15 '23 at 13:07