
"Because the Hessian matrix is real and symmetric, we can decompose it into a set of real eigenvalues and an orthogonal basis of eigenvectors. The second derivative in a specific direction represented by a unit vector d is given by $d^T Hd$. When d is an eigenvector of H, the second derivative in that direction is given by the corresponding eigenvalue."

I didn't understand why "The second derivative in a specific direction represented by a unit vector d is given by $d^T Hd$".

Abhishek Bhatia

3 Answers


As I think you are asking for intuition regarding "The second derivative in a specific direction represented by a unit vector d is given by $d^THd$", let me relate it in two ways to the usual way we think about derivatives. I'll use two dimensions to illustrate in both cases. Let the unit vector $\bar{d}$ be $(n_1,n_2)$ in the standard basis and let $\bar{x}$ represent the point $(x,y)$.

For the shorter explanation, expand the function value $f(\bar{x}+ds\,\bar{d})$ at a small distance $ds$ from $\bar{x}$ along $\bar{d}$ as a Taylor series. Let $h=n_1\,ds$ and $k=n_2\,ds$ denote the corresponding increments along the $x$ and $y$ directions.

$$f(\bar{x}+ds \bar{d})=f(x,y) + hf_x+kf_y + \frac{1}{2}(h^2f_{xx}+ 2hkf_{xy}+ k^2f_{yy}) + \mbox{h.o.t.}$$

$$=f(x,y) + ds(n_1f_x+n_2f_y) + \frac{1}{2}ds^2(n_1^2f_{xx}+ 2n_1n_2f_{xy}+ n_2^2f_{yy}) + \mbox{h.o.t.}$$

$$=f(x,y) + ds (\nabla f \cdot \bar{d} )+ \frac{1}{2}ds^2 (\bar{d}^T H \bar{d} )+ \mbox{h.o.t.}$$

That is, $\nabla f \cdot \bar{d}$ plays the role of the first derivative and $\bar{d}^T H \bar{d}$ plays the role of the second derivative along the direction $\bar{d}$.
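
If a quick numerical check helps, here is a minimal sketch of that correspondence. The test function, the step size, and the use of NumPy are my own illustrative choices, not part of the original argument: it compares central-difference estimates of the first and second derivatives of $g(s)=f(\bar{x}+s\bar{d})$ against $\nabla f \cdot \bar{d}$ and $\bar{d}^T H \bar{d}$.

```python
import numpy as np

# Illustrative test function (my choice): f(x, y) = x^2*y + sin(x)*y^2.
def f(v):
    x, y = v
    return x**2 * y + np.sin(x) * y**2

def grad(v):
    x, y = v
    return np.array([2*x*y + np.cos(x)*y**2,
                     x**2 + 2*np.sin(x)*y])

def hessian(v):
    x, y = v
    return np.array([[2*y - np.sin(x)*y**2, 2*x + 2*np.cos(x)*y],
                     [2*x + 2*np.cos(x)*y,  2*np.sin(x)]])

x0 = np.array([0.7, -1.3])
d = np.array([3.0, 4.0])
d /= np.linalg.norm(d)          # unit direction vector

# Central differences along g(s) = f(x0 + s*d); ds is kept moderate
# to avoid cancellation error in the second difference.
ds = 1e-4
gp, g0, gm = f(x0 + ds*d), f(x0), f(x0 - ds*d)
print((gp - gm) / (2*ds),       grad(x0) @ d)         # first derivative
print((gp - 2*g0 + gm) / ds**2, d @ hessian(x0) @ d)  # second derivative
```

Both printed pairs should agree to several decimal places.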

The second explanation uses the same idea but, depending on your bent of mind, might be more intuitive. Proceed as in finite differences: $f_x$ is approximated by $\frac{f(x+\Delta x)-f(x)}{\Delta x}$, with the approximation becoming exact as $\Delta x \rightarrow 0$. The second derivative $f_{xx}$ is likewise approximated by $$\frac{ f_x(x+\frac{\Delta x}{2}) - f_x(x -\frac{\Delta x}{2}) }{\Delta x}$$

$$\approx \frac{ f( x + \Delta x) -2f(x) + f( x - \Delta x) }{\Delta x^2}.$$ Now apply that one-dimensional second derivative idea along the direction $\bar{d}$ to see that, ignoring higher-order terms for now, the second derivative is

$$ \frac{ f( x + h,\, y+ k) -2f(x,y) + f( x - h,\, y-k) }{ h^2 + k^2 }. $$

Using two-dimensional Taylor expansions for $f( x + h, y+ k)$ and $f( x - h, y-k )$ (write them out), and using $h=n_1\,ds$ and $k=n_2\,ds$ together with $h^2+k^2=ds^2$ (since $\bar{d}$ is a unit vector), we see that the second derivative approximation is given by

$$\frac{ ds^2\left( n_1^2f_{xx}+ 2n_1n_2f_{xy}+ n_2^2f_{yy} \right) }{ ds^2 } = \frac{ ds^2\,( \bar{d}^T H \bar{d} ) }{ ds^2 } = \bar{d}^T H \bar{d}, $$ with the higher-order terms vanishing as you take $ds$ to zero.
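
Here is a small sketch of that limit, again with an illustrative function and its hand-computed Hessian (my own choices, not from the question): the two-dimensional second difference approaches $\bar{d}^T H \bar{d}$ as $ds$ shrinks, with the error falling off roughly like $ds^2$.

```python
import numpy as np

# Illustrative function: f(x, y) = exp(x)*cos(y); its Hessian at (x0, y0)
# is written out by hand below.
f = lambda x, y: np.exp(x) * np.cos(y)
x0, y0 = 0.5, 1.0
H = np.array([[ np.exp(x0)*np.cos(y0), -np.exp(x0)*np.sin(y0)],
              [-np.exp(x0)*np.sin(y0), -np.exp(x0)*np.cos(y0)]])

n1, n2 = 0.6, 0.8                       # unit direction: n1^2 + n2^2 = 1
exact = np.array([n1, n2]) @ H @ np.array([n1, n2])

for ds in (1e-1, 1e-2, 1e-3):
    h, k = n1*ds, n2*ds                 # increments along x and y
    approx = (f(x0+h, y0+k) - 2*f(x0, y0) + f(x0-h, y0-k)) / ds**2
    print(ds, approx, abs(approx - exact))   # error shrinks ~ ds^2
```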

I would have liked to expand some of the steps more, but MathJax on a phone is rather painful. I hope one of these explanations felt intuitive to you. Please leave a comment if more clarification is needed.

Mathemagical

I'll use the 2D case just to illustrate the concept. Here $d^T = \begin{pmatrix} d_1 & d_2 \end{pmatrix}$ and $f_{ij}$ represents the second partial derivative with respect to the variables $i$ and $j$.

$$ d^THd = \begin{pmatrix} d_1 & d_2 \end{pmatrix} \begin{pmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{pmatrix} \begin{pmatrix} d_1 \\ d_2 \end{pmatrix} $$

Observe that if $d^T = \begin{pmatrix} 1 & 0 \end{pmatrix}$ one recovers $f_{xx}$, and $f_{yy}$ if $d^T = \begin{pmatrix} 0 & 1 \end{pmatrix}$. If $d$ happens to be an eigenvector of $H$, its corresponding eigenvalue will be the second derivative in that direction. Written in the eigenbasis of $H$, the matrix is diagonal:

$$ d^THd = \begin{pmatrix} d_1 & d_2 \end{pmatrix} \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} \begin{pmatrix} d_1 \\ d_2 \end{pmatrix} $$

Now remember that this is the representation of the matrix in its eigenbasis, in which the unit eigenvectors are $d^T = \begin{pmatrix} 1 & 0 \end{pmatrix}$ and $d^T = \begin{pmatrix} 0 & 1 \end{pmatrix}$. In either case you get:

$$ d_i^THd_i = \lambda_i $$
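
A short numerical check of this, if it helps (the matrix entries below are arbitrary, chosen only to be symmetric); `numpy.linalg.eigh` returns the unit eigenvectors as the columns of `U`:

```python
import numpy as np

H = np.array([[2.0, 1.0],
              [1.0, 3.0]])     # an arbitrary symmetric "Hessian"

lam, U = np.linalg.eigh(H)     # eigenvalues and orthonormal eigenvectors

for i in range(2):
    d = U[:, i]                # i-th unit eigenvector
    print(d @ H @ d, lam[i])   # d_i^T H d_i equals lambda_i
```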

  • Can you prove the last statement? And why is this statement true: "If $d$ happens to be an eigenvector, its corresponding eigenvalue will be the derivative in that direction"? – Abhishek Bhatia Jun 06 '17 at 07:39
  • Observe that if $\vec x^T = (x, y)$ is expressed in the eigenbasis then $\frac{1}{2}\vec x^T H \vec x = \frac{1}{2}(\lambda_1 x^2+ \lambda_2 y^2)$ –  Jun 06 '17 at 13:05
  • I am afraid the last statement is not quite complete. $H$ has to be eigendecomposed into $ULU^T$ first to obtain a diagonal matrix $L$ of eigenvalues. Then $U^Td$ yields a unit vector with a 1 in the position corresponding to the eigenvalue's position on the diagonal, and 0 elsewhere. – huajun Sep 19 '24 at 05:59

If we write the eigendecomposition of $H$ as $H = ULU^T$, where $U$ is the orthogonal matrix whose columns are the eigenvectors of $H$ and $L$ is the diagonal matrix of eigenvalues, we can rewrite the quadratic form as $d^THd = d^TULU^Td$. By orthonormality of the eigenvectors, if $d$ is the $j$-th eigenvector, $U^Td$ is the vector $e_j$, a zero vector with the $j$-th element equal to 1, and thus $d^THd = e_j^TLe_j = L_{jj}$, the corresponding eigenvalue.
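
A brief numerical illustration of this argument (the symmetric matrix below is an arbitrary stand-in for a Hessian): $U^Td$ picks out a standard basis vector, and the quadratic form collapses to the corresponding diagonal entry of $L$.

```python
import numpy as np

H = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])   # arbitrary symmetric matrix

lam, U = np.linalg.eigh(H)        # H = U L U^T with L = diag(lam)
j = 1
d = U[:, j]                       # the j-th unit eigenvector

print(U.T @ d)                    # ~ e_j: a 1 in position j, zeros elsewhere
print(d @ H @ d, lam[j])          # e_j^T L e_j = L_jj = lambda_j
```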