
I would like an intuition for the fact that a positive-definite Hessian is equivalent to the function being convex.

Indeed, given a function $f : \mathbb{R}^2 \to \mathbb{R}$, if we set $\boldsymbol{u} = \begin{bmatrix}x \\ y\end{bmatrix}$ and $\mathbf{H}_f= \begin{bmatrix} f_{xx} &f_{xy}\\f_{yx} &f_{yy}\end{bmatrix}$, then expanding $\boldsymbol{u}^\intercal \mathbf{H}_f \boldsymbol{u}$ gives a quadratic form.

I don't understand how this quadratic form gives the value of the second derivative of $f$ in the direction of $\boldsymbol{u}$.

Thank you for your attention!

Gesser
    https://math.stackexchange.com/questions/2573376/second-directional-derivative-and-hessian-matrix – angryavian Jan 07 '23 at 22:15
  • But what have you thought about the problem? You can directly compute the value of $u^THu$ in closed form – FShrike Jan 07 '23 at 22:16
  • I looked at this link; it explains how to derive this formula from the mathematical definition. I am looking for an intuitive idea that would explain why this quadratic form gives the value of the second derivative of $f$ in the direction of $\boldsymbol{u}$. – Gesser Jan 07 '23 at 23:37
  • I don't think I understand your question @FShrike – Gesser Jan 07 '23 at 23:37

1 Answer


I don't know that there's an easier way to see it than by direct computation:

\begin{align*} \lim_{t\to 0} \frac{d^2}{dt^2} f(\mathbf{x}+t\mathbf{u}) &= \lim_{t\to 0} \frac{d}{dt}\mathbf{u}^T\nabla f(\mathbf{x} + t\mathbf{u})\\ & = \lim_{t\to 0} \mathbf{u}^T Hf(\mathbf{x}+t\mathbf{u})\mathbf{u}\\ & = \mathbf{u}^T Hf(\mathbf{x})\mathbf{u}. \end{align*}
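To see the computation above concretely, here is a small numerical sanity check (a sketch not in the original answer; the sample function $f(x,y) = x^2 y + \sin y$, the point, and the direction are arbitrary choices). The centered second difference of $g(t) = f(\mathbf{x} + t\mathbf{u})$ at $t = 0$ should agree with the quadratic form $\mathbf{u}^T Hf(\mathbf{x})\,\mathbf{u}$:

```python
import numpy as np

# Illustrative example function (not from the post): f(x, y) = x^2 * y + sin(y).
def f(p):
    x, y = p
    return x**2 * y + np.sin(y)

def hessian(p):
    # Analytic second partials: f_xx = 2y, f_xy = f_yx = 2x, f_yy = -sin(y).
    x, y = p
    return np.array([[2*y,         2*x],
                     [2*x, -np.sin(y)]])

x = np.array([1.0, 0.5])   # base point (arbitrary)
u = np.array([0.6, 0.8])   # direction vector (arbitrary)

# Second derivative of g(t) = f(x + t*u) at t = 0, via central differences.
h = 1e-5
g = lambda t: f(x + t*u)
second_directional = (g(h) - 2*g(0.0) + g(-h)) / h**2

# The quadratic form from the answer.
quadratic_form = u @ hessian(x) @ u

print(second_directional, quadratic_form)  # the two values agree closely
```

The two printed numbers differ only by the finite-difference truncation error, which is the computation in the answer carried out numerically.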

What's really going on here is that the Hessian is the second-order part of the Taylor expansion of a multivariable function; it plays the multivariable role of the second derivative.

$$f(\mathbf{x}+t\mathbf{u}) = f(\mathbf{x}) + t \mathbf{u}^T\nabla f(\mathbf{x}) + \frac{t^2}{2} \mathbf{u}^T Hf(\mathbf{x}) \mathbf{u} + \frac{t^3}{3!} \sum_{ijk}\frac{\partial^3 f}{\partial x_i\partial x_j\partial x_k}(\mathbf{x})u_iu_ju_k + \cdots$$
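One can also check this expansion numerically (a sketch, not from the original answer; the function $f(x,y) = e^x \cos y$ and the points are assumptions for illustration). Because the Hessian supplies the exact second-order term, the error of the degree-2 Taylor polynomial should shrink like $t^3$:

```python
import numpy as np

# Illustrative function (not from the post): f(x, y) = exp(x) * cos(y).
def f(p):
    x, y = p
    return np.exp(x) * np.cos(y)

def grad(p):
    x, y = p
    return np.array([np.exp(x) * np.cos(y), -np.exp(x) * np.sin(y)])

def hess(p):
    x, y = p
    return np.array([[ np.exp(x) * np.cos(y), -np.exp(x) * np.sin(y)],
                     [-np.exp(x) * np.sin(y), -np.exp(x) * np.cos(y)]])

x = np.array([0.3, -0.2])  # base point (arbitrary)
u = np.array([1.0, 2.0])   # direction (arbitrary)

for t in [0.1, 0.05, 0.025]:
    exact = f(x + t*u)
    # Degree-2 Taylor polynomial from the displayed expansion.
    taylor2 = f(x) + t * (u @ grad(x)) + 0.5 * t**2 * (u @ hess(x) @ u)
    # The residual should drop by roughly a factor of 8 each time t halves.
    print(t, abs(exact - taylor2))
```

Halving $t$ cutting the error by about a factor of $8$ is exactly the $O(t^3)$ remainder left after the Hessian term.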

though I was flabbergasted a few years ago to learn that Taylor's theorem isn't even mentioned once in the entire 500-page multivariable calculus textbook popular at my university, so it's perhaps no surprise if you haven't seen it...

user7530
  • Thank you for your answer, indeed I had seen this approach. But I can't make sense of the expression $\mathbf{u}^T Hf\,\mathbf{u}$. It's more what it actually means, rather than where it comes from, that I am looking for. – Gesser Jan 08 '23 at 12:33
  • @Gesser Oh, by this I mean the matrix $Hf$ multiplied from both sides by the vector $\mathbf{u}$. – user7530 Jan 08 '23 at 16:37
  • Yes, sorry, I misspoke. I know what this expression corresponds to, but when we expand it, it gives us a quadratic form, and I don't see how this quadratic form corresponds to the value of the second derivative of $f$ in the direction of $\mathbf{u}$. – Gesser Jan 09 '23 at 16:59
  • @Gesser I'd say the way to understand this is via Taylor's theorem. The matrix of second derivatives gives you the "best possible" quadratic approximation to $f$, in the same way as the gradient (the matrix of first derivatives) gives you the best possible linear approximation to $f$. It should also not be a surprise that the second partial derivatives of $f$ appear in the expression for $f$'s best second-order approximation. – user7530 Jan 09 '23 at 18:05
  • @Gesser As for why it's the Hessian (rather than some other matrix whose entries involve second derivatives of $f$)... this can be proven using the approaches above. To build intuition it might help to start by looking at what happens when $f$ is already exactly a quadratic. – user7530 Jan 09 '23 at 18:07
  • ok thanks I'll think about it ! – Gesser Jan 09 '23 at 21:10
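Following the suggestion in the comments to look at the case where $f$ is exactly a quadratic, here is a quick illustration (a sketch not from the thread; the matrix $A$, point, and direction are arbitrary choices). For $f(\mathbf{x}) = \tfrac12 \mathbf{x}^T A \mathbf{x}$ with $A$ symmetric, the Hessian is exactly $A$, so $\mathbf{u}^T A \mathbf{u}$ is the *exact* second derivative of $t \mapsto f(\mathbf{x} + t\mathbf{u})$, not just an approximation:

```python
import numpy as np

# Arbitrary symmetric matrix defining the quadratic f(x) = 0.5 * x^T A x.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

def f(p):
    return 0.5 * p @ A @ p

x = np.array([1.0, -1.0])  # base point (arbitrary)
u = np.array([0.5,  2.0])  # direction (arbitrary)

# g(t) = f(x + t*u) = f(x) + t * x^T A u + 0.5 * t^2 * u^T A u is a parabola
# in t, so its second derivative is u^T A u for every t -- and a centered
# second difference recovers it exactly, even with a large step.
h = 0.1
g = lambda t: f(x + t*u)
second_diff = (g(h) - 2*g(0.0) + g(-h)) / h**2

print(second_diff, u @ A @ u)  # both equal u^T A u (up to float rounding)
```

Since the restriction of a quadratic to any line is a parabola, the quadratic form $\mathbf{u}^T A \mathbf{u}$ *is* that parabola's curvature, which is the intuition the comments point toward.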