
In univariate calculus, if we know that $f'(c)=0$, we can determine whether the function $f$ has a minimum at $c$ by checking that $f''(c) > 0$. The multivariate analogue of the second derivative is the Hessian matrix. I have now learned that to distinguish between extrema and saddle points in this case, one has to check whether the Hessian is positive definite, negative definite, or indefinite, which can be done by examining its eigenvalues.

I have several questions regarding this:

  1. Why is it not sufficient to check the signs of the individual entries of the Hessian? Why do we need to check for definiteness?

  2. Does definiteness just ensure that certain convexity or concavity properties hold, or does it have a more meaningful interpretation?

  3. How do the eigenvalues of a matrix tell us its definiteness?

  4. Addendum: What do the off-diagonal entries in the Hessian even mean? Do they describe how the slope in one dimension changes as you move along a different dimension?

Chris

4 Answers


The proof of the second derivative test at a critical point ($Df_a = 0$) runs as follows: for a given sufficiently smooth map $f: \Bbb{R}^n \to \Bbb{R}$ and a point $a \in \Bbb{R}^n$, we write a second order Taylor expansion at the point $a$: \begin{align} f(a+h) - f(a) &= \dfrac{1}{2}(D^2f_a)(h,h) + o(\lVert h\rVert^2). \end{align} In other words, there is a "remainder term", which is a function $\rho$ such that $\lim_{h \to 0} \rho(h) = 0$, and \begin{align} f(a+h) - f(a) &= \dfrac{1}{2}(D^2f_a)(h,h) + \rho(h) \lVert h\rVert^2. \end{align} If the Hessian $D^2f_a$ is positive definite, say, then there is a positive constant $\lambda$ (for instance, its smallest eigenvalue) such that for all $h \in \Bbb{R}^n$, $D^2f_a(h,h) \geq \lambda \lVert h\rVert^2$. Hence, \begin{align} f(a+h) - f(a) &\geq \dfrac{\lambda}{2} \lVert h\rVert^2 + \rho(h) \lVert h\rVert^2 \\ &= \left( \dfrac{\lambda}{2} + \rho(h)\right) \lVert h\rVert^2. \end{align} Since $\rho(h) \to 0$ as $h \to 0$ and $\lambda > 0$, the term in brackets is strictly positive whenever $h$ is sufficiently small in norm. Hence, for all sufficiently small $h \neq 0$, $f(a+h) - f(a) > 0$. This is the proof of why a positive-definite Hessian implies a strict local minimum at a critical point $a$.

Of course, a similar proof holds for a negative-definite Hessian implying a strict local maximum.
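
For a concrete feel for this bound, here is a minimal NumPy sketch using an illustrative function of my own choosing (with the Hessian at the critical point computed by hand): the ratio $(f(a+h)-f(a))/\lVert h\rVert^2$ stays strictly positive for small $h$, as the term $\frac{\lambda}{2}+\rho(h)$ predicts.

```python
import numpy as np

# Illustrative function with a critical point at the origin and a
# positive-definite Hessian there; the cubic term plays the role of the
# "small" remainder rho(h) * ||h||^2.
def f(x, y):
    return x**2 + x*y + y**2 + x**3     # gradient vanishes at (0, 0)

H = np.array([[2.0, 1.0],               # Hessian of f at the origin,
              [1.0, 2.0]])              # computed by hand
lam = np.linalg.eigvalsh(H).min()       # smallest eigenvalue, here 1 > 0

rng = np.random.default_rng(0)
hs = 0.01 * rng.standard_normal((10_000, 2))      # small displacements h
ratios = np.array([(f(x, y) - f(0.0, 0.0)) / (x*x + y*y) for x, y in hs])

print("smallest eigenvalue lambda     :", lam)
print("min of (f(a+h)-f(a)) / ||h||^2 :", ratios.min())   # stays near lam/2 = 0.5
```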


Roughly speaking, the idea of the proof is that the local behaviour of $f(a+h) - f(a)$ is entirely determined by the behaviour of the Hessian, in the term $D^2f_a(h,h)$ (because the error term is "small"). So, to answer your questions,

  1. The proof of the theorem above shows that we need to ensure that the entire term $D^2f_a(h,h)$ is positive (in fact, bounded below by a positive multiple of $\lVert h \rVert^2$), so that we can conclude that $f(a+h) - f(a) \geq 0$. But just because an $n \times n$ matrix has all positive entries, it doesn't mean it is positive definite (Robert's answer gives an explicit counterexample).

  2. Hopefully the proof I gave above justifies why definiteness comes into play (it's to ensure you have a good lower/upper bound on the $D^2f_a(h,h)$ term).

  3. A matrix is positive (negative) definite if and only if all its eigenvalues are strictly positive (strictly negative). If some eigenvalues are positive and some are negative, the matrix is indefinite. If this is the case for your Hessian, you have a saddle point (because the function increases along some directions while decreasing along others); a small numerical sketch of this check follows below.
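
Here is that sketch, a minimal classifier based purely on the eigenvalue signs; the helper `classify_critical_point` and the example Hessians are illustrative choices, not standard API.

```python
import numpy as np

def classify_critical_point(hessian, tol=1e-12):
    """Classify a critical point from the eigenvalues of its symmetric Hessian."""
    eig = np.linalg.eigvalsh(hessian)
    if np.any(np.abs(eig) <= tol):
        return "inconclusive (some eigenvalue is zero; higher-order terms decide)"
    if np.all(eig > 0):
        return "strict local minimum (positive definite)"
    if np.all(eig < 0):
        return "strict local maximum (negative definite)"
    return "saddle point (indefinite)"

# Hessians of x^2 + y^2, -(x^2 + y^2) and x^2 - y^2 at the origin:
print(classify_critical_point(np.diag([2.0, 2.0])))    # strict local minimum
print(classify_critical_point(np.diag([-2.0, -2.0])))  # strict local maximum
print(classify_critical_point(np.diag([2.0, -2.0])))   # saddle point
```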

peek-a-boo
  • Great answer! Can you explain what the notation $D^2 f_a(h,h)$ means? It's the Hessian evaluated at the expansion point $a$, and the $(h,h)$? And I assume it's possible to show that the whole remainder of the Taylor expansion beyond the second derivative is $o(\lVert h\rVert^2)$? – Chris May 19 '20 at 10:29
  • @Chris If $f:V \to W$ is a map between two vector spaces, and $a \in V$ is a point, then the first derivative $Df_a: V \to W$ is a linear map. The second derivative is a bilinear map $D^2f_a: V \times V \to W$. The third is a trilinear map $D^3f_a: V \times V \times V \to W$, and in general the $k^{\text{th}}$ derivative of a function is a $k$-multilinear map $D^kf_a: V^k \to W$. In linear algebra you learn that if $V = \Bbb{R}^n$ and $W = \Bbb{R}$, then $Df_a$ can be represented as a "row vector", and $D^2f_a$ can be represented as an $n \times n$ matrix (the Hessian matrix). – peek-a-boo May 19 '20 at 16:29
  • so, in this context, $D^2f_a$ is a bilinear map $\Bbb{R}^n \times \Bbb{R}^n \to \Bbb{R}$, and $D^2f_a(\xi,\eta) \in \Bbb{R}$ denotes the value of this bilinear map on the element $(\xi,\eta) \in \Bbb{R}^n \times \Bbb{R}^n$ of its domain. So, in this case, $D^2f_a(h,h)$ simply represents the value of the bilinear map $D^2f_a$ on the element $(h,h)$ in its domain. If you insist on thinking of matrices (which I don't like, because it requires you to choose a basis), then you can write it as $h^T \cdot H_f(a) \cdot h$ (multiplying the Hessian matrix by the row and column vector representations of $h$) – peek-a-boo May 19 '20 at 16:33
  • Finally, yes, the fact that the remainder terms beyond the second order are $o(\lVert h \rVert^2)$ is an application of Taylor's theorem. Take a look at https://math.stackexchange.com/questions/3271948/proving-limit-of-fx-tnfx-taylor-is-zero-in-multivariable-calculus/3272515#3272515 for a statement of the theorem in the general case (along with an outline of the proof). – peek-a-boo May 19 '20 at 16:36
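
As an illustrative check of the last two comments (the function $f(x,y)=e^x\sin y$, the point $a$, and the direction $h$ are my own choices): the second directional derivative of $f$ along $h$ at $a$ agrees with the bilinear form $h^T H_f(a)\, h$.

```python
import numpy as np

# Illustrative check: for f(x, y) = exp(x) * sin(y), the second directional
# derivative d^2/dt^2 f(a + t h) at t = 0 equals the bilinear form
# D^2 f_a(h, h) = h^T H_f(a) h, where H_f(a) is the Hessian at a.
f = lambda p: np.exp(p[0]) * np.sin(p[1])
a = np.array([0.2, 0.5])
h = np.array([0.3, -1.1])

# Hessian of f at a, computed by hand:
H = np.array([[np.exp(a[0]) * np.sin(a[1]),  np.exp(a[0]) * np.cos(a[1])],
              [np.exp(a[0]) * np.cos(a[1]), -np.exp(a[0]) * np.sin(a[1])]])

t = 1e-4
second_diff = (f(a + t*h) - 2*f(a) + f(a - t*h)) / t**2   # finite-difference estimate
print(second_diff, h @ H @ h)                              # these agree closely
```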

1) For example, the function $f(x,y) = x^2 + 4 x y + y^2$ has all entries of the Hessian matrix $> 0$, but the critical point $(0,0)$ is a saddle (e.g. $f(t,-t) < 0$ for $t \ne 0$).

2) A smooth function of $n$ variables is convex in an open set $R$ iff its Hessian is positive semidefinite there.

3) A real symmetric matrix is positive definite, positive semidefinite, negative semidefinite, or negative definite iff its eigenvalues are all $> 0$, $\ge 0$, $\le 0$, $< 0$ respectively.
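
A minimal NumPy sketch tying 1) and 3) together (the code is purely illustrative):

```python
import numpy as np

# Hessian of f(x, y) = x^2 + 4xy + y^2 (constant, since f is quadratic):
H = np.array([[2.0, 4.0],
              [4.0, 2.0]])
eig = np.linalg.eigvalsh(H)
print(np.all(H > 0), eig)             # True [-2.  6.]: positive entries, yet indefinite

f = lambda x, y: x**2 + 4*x*y + y**2
print(f(1.0, 1.0), f(1.0, -1.0))      # 6.0 -2.0: up along (1, 1), down along (1, -1)

# The eigenvalue tests from 3), applied to H:
print("positive definite    :", np.all(eig > 0))    # False
print("positive semidefinite:", np.all(eig >= 0))   # False
print("negative semidefinite:", np.all(eig <= 0))   # False
print("negative definite    :", np.all(eig < 0))    # False -> indefinite, hence a saddle
```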

Robert Israel

When you Taylor expand a multivariable function, it looks like:

$$f(x+h)=f(x)+(Df)\cdot h+\tfrac{1}{2}\,h^T(D^2f)\,h+o(\lVert h\rVert^2),$$

so locally around a critical point (where $Df = 0$), it looks like $f(x)+\tfrac{1}{2}h^T(D^2f)h.$

It's clear now that if $(D^2f)$ is positive definite, then locally $f$ increases in every direction away from $x$. The opposite occurs when it is negative definite. If it is indefinite, you get a saddle point; if it is merely semidefinite, you need to look at higher-order derivatives to conclude.
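
As a rough numerical illustration of this local picture (the function $f(x,y)=\sin x\sin y$ and the displacements are illustrative choices; its Hessian at $(\pi/2,\pi/2)$ is $-I$, computed by hand): the exact increment and the quadratic model agree to leading order.

```python
import numpy as np

# f(x, y) = sin(x) * sin(y) has a critical point at a = (pi/2, pi/2) with
# Hessian -I there, so near a the increment f(a+h) - f(a) should be well
# approximated by (1/2) h^T H h.
f = lambda p: np.sin(p[0]) * np.sin(p[1])
a = np.array([np.pi / 2, np.pi / 2])
H = np.array([[-1.0, 0.0],
              [0.0, -1.0]])            # Hessian of f at a, computed by hand

for scale in (1e-1, 1e-2, 1e-3):
    h = scale * np.array([0.7, -0.4])
    exact = f(a + h) - f(a)
    quad = 0.5 * h @ H @ h
    print(f"|h| ~ {scale:g}: exact = {exact:.3e}, quadratic model = {quad:.3e}")
```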

Alex R.

You can think of a multivariate function around a critical point as a quadratic form (as long as the higher order Taylor terms are negligible). So the whole discussion amounts to analyzing the behavior of a (hyper-)quadric, defined by the Hessian matrix.

The study is made easy by diagonalizing this matrix, so that by a change of coordinates,

$$\frac1{2!}p^THp$$ reduces to (dropping the constant factor) $$q^T\Lambda q$$

or

$$\lambda_1u^2+\lambda_2v^2+\cdots+\lambda_dw^2.$$

  1. For a critical point to be a maximum or a minimum, all terms must have the same sign, hence the need for definiteness. The signs of the individual Hessian entries do not, on their own, allow you to conclude anything about definiteness.

  2. Indeed, just concavity or convexity.

  3. Should be obvious from the diagonalized form.

  4. Nothing by themselves, but they indirectly contribute to the eigenvalues and hence to the definiteness.
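
A short numerical sketch of the diagonalization step above (the Hessian and the test vector are illustrative choices): after the change of coordinates $q = Q^T p$, the quadratic form becomes a weighted sum of squares.

```python
import numpy as np

# Diagonalize H = Q diag(lambda) Q^T and check that the quadratic form
# p^T H p equals sum_i lambda_i * q_i^2 in the rotated coordinates q = Q^T p.
H = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam, Q = np.linalg.eigh(H)          # eigenvalues and orthonormal eigenvectors

p = np.array([0.3, -0.8])
q = Q.T @ p                         # change of coordinates

print(p @ H @ p)                    # quadratic form in the original coordinates
print(np.sum(lam * q**2))           # same value: lambda_1 q_1^2 + lambda_2 q_2^2
```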

  • @Rodrigo de Azevedo: I don't use the notation with repeated operator. –  Apr 29 '20 at 20:55