
I was reading over the following post: The "second derivative test" for $f(x,y)$.

I understood everything in Alex's response and followed along until the last part. So, at a critical point, the higher-dimensional Taylor series simplifies to:

$$f(x,y) = f(x_{0}, y_{0}) + \frac{1}{2} \begin{bmatrix} x-x_{0} & y-y_{0} \end{bmatrix} \mathbf{H} (x_{0},y_{0})\begin{bmatrix} x-x_{0}\\y-y_{0} \end{bmatrix} + \cdots$$

Here I'm identifying the resulting $1\times 1$ matrix with a scalar. The post states that if we change coordinates to variables $s$ and $t$ that run in the directions of $\mathbf{H}$'s eigenspaces, with the origin at the critical point, we get:

$$f(s,t) = f(0,0) + \frac{1}{2} \lambda_{1} s^{2} + \frac{1}{2} \lambda_{2} t^{2} + \cdots$$

Intuitively it makes sense that this is the answer, but I am not sure where to start mathematically to derive it. Would I start by applying the change-of-coordinates matrix to the Hessian?

Eliot

1 Answer


When we speak of "changing coordinates" here, it means we are considering a certain (in this case affine) mapping $\phi:\Bbb{R}^2\to\Bbb{R}^2$ (which encodes the effect of "changing coordinates") such that the composed mapping $F=f\circ \phi$ has the Taylor expansion \begin{align} F(s,t)= F(0,0) + \frac{1}{2}(\lambda_1s^2+\lambda_2t^2)+\dots \tag{$*$} \end{align} for some $\lambda_1,\lambda_2\in\Bbb{R}$.

In this case, what the mapping $\phi$ does is the following: suppose $\{u,v\}\subset \Bbb{R}^2$ are the two "eigenvectors" of the Hessian $D^2f_{(x_0,y_0)}$. Then, we define $\phi:\Bbb{R}^2\to\Bbb{R}^2$ as \begin{align} \phi(s,t):= (x_0,y_0)+su+tv. \end{align} If you really insist on matrix notation, then it reads \begin{align} \begin{pmatrix} s\\ t \end{pmatrix} \mapsto \begin{pmatrix} x\\y \end{pmatrix}= \begin{pmatrix} x_0\\y_0 \end{pmatrix}+ \begin{pmatrix} u_1&v_1\\u_2&v_2 \end{pmatrix}\cdot \begin{pmatrix} s\\t \end{pmatrix}. \end{align}

So, the mapping $\phi$ tells you how to go from the $(s,t)$ coordinates to the $(x,y)$ coordinates, and hence when we consider the composed mapping $F=f\circ \phi$, we're looking at the function $f$ "expressed in $(s,t)$ coordinates". To visualize this geometrically and keep track of all this information, I suggest you draw two copies of the plane $\Bbb{R}^2$. On one of them, label the axes as $(s,t)$; on the other copy, label the axes as $(x,y)$ and draw a dot to denote the point $(x_0,y_0)$.
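To make the mapping concrete, here is a minimal numerical sketch of $\phi$ in Python (using NumPy); the Hessian and critical point below are my own illustrative choices, not anything from your linked post:

```python
import numpy as np

# Hypothetical Hessian H at a hypothetical critical point (x0, y0);
# both are made-up examples purely to illustrate the construction.
x0, y0 = 1.0, -1.0
H = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Columns of P are orthonormal eigenvectors u, v of the symmetric matrix H.
eigvals, P = np.linalg.eigh(H)

def phi(s, t):
    """phi(s, t) = (x0, y0) + s*u + t*v; in matrix form, x = x0 + P @ (s, t)."""
    return np.array([x0, y0]) + P @ np.array([s, t])

print(phi(0.0, 0.0))  # the origin of the (s,t)-plane maps to (x0, y0)
```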

Then what is going on is: we're mapping the origin of the $(s,t)$ plane to the point $(x_0,y_0)$ on the $(x,y)$ plane, we're mapping the vector $e_1=(1,0)$ onto $(x_0,y_0)+u$ (think of this as mapping $e_1$ onto the eigenvector $u$ of the Hessian, emanating from the point $(x_0,y_0)$), and similarly we map $e_2$ onto $(x_0,y_0)+v$. So this change of coordinates (as we'll see below) allows us to work at the origin and assume that the "eigenvectors" of the Hessian lie along the coordinate axes of our $(s,t)$ plane.

Now, to prove that $(*)$ really is the Taylor expansion of $F$, we have to use the chain rule. For ease of typing, let $\sigma=(s,t)$. Then, by Taylor's theorem, \begin{align} F(\sigma)&= F(0,0)+ DF_{(0,0)}[\sigma] + \frac{1}{2!}D^2F_{(0,0)}[\sigma,\sigma] + o(|\sigma|^2).\tag{i} \end{align} The first-order term is \begin{align} DF_{(0,0)}[\sigma]&=Df_{\phi(0,0)}(D\phi_{(0,0)}[\sigma])\\ &=Df_{(x_0,y_0)}(D\phi_{(0,0)}[\sigma])\\ &= 0,\tag{ii} \end{align} since by assumption $(x_0,y_0)$ is a critical point of $f$, so $Df_{(x_0,y_0)}=0$.

For the second-order term we just repeat this: if you carry out the chain rule directly, then you'll get \begin{align} D^2F_{(0,0)}[\sigma,\sigma] &= D^2f_{\phi(0,0)}[D\phi_{(0,0)}[\sigma],D\phi_{(0,0)}[\sigma]] + Df_{\phi(0,0)}[D^2\phi_{(0,0)}[\sigma,\sigma]]. \end{align} In this special case, we have a lot of simplifications: first of all, as above, $Df_{\phi(0,0)}=Df_{(x_0,y_0)}=0$ by assumption. Next, since $\phi$ is an affine function, $D\phi_{(0,0)}[\sigma]= su+tv$ (and $D^2\phi_{(0,0)}=0$). Plugging all of this in, we see that \begin{align} D^2F_{(0,0)}[\sigma,\sigma] &= D^2f_{(x_0,y_0)}[su+tv, su+tv]\\ &= s^2 D^2f_{(x_0,y_0)}[u,u]+ 2st D^2f_{(x_0,y_0)}[u,v]+ t^2D^2f_{(x_0,y_0)}[v,v],\tag{iii} \end{align} where I used the fact that the second derivative is symmetric in its two arguments (i.e. the fact that the Hessian matrix is symmetric).

Now, when I said above that $u,v$ are "eigenvectors" of $D^2f_{(x_0,y_0)}$, what this means is that $D^2f_{(x_0,y_0)}[u,u]=\lambda_1$, $D^2f_{(x_0,y_0)}[v,v]=\lambda_2$, and $D^2f_{(x_0,y_0)}[u,v]=0$. Expressed more succinctly, the matrix representation of $D^2f_{(x_0,y_0)}$ relative to the "eigenbasis" $\{u,v\}$ is diagonal: \begin{align} [D^2f_{(x_0,y_0)}]&= \begin{pmatrix} D^2f_{(x_0,y_0)}[u,u]&D^2f_{(x_0,y_0)}[u,v]\\ D^2f_{(x_0,y_0)}[v,u]&D^2f_{(x_0,y_0)}[v,v] \end{pmatrix}= \begin{pmatrix} \lambda_1&0\\ 0&\lambda_2 \end{pmatrix}.\tag{iv} \end{align} So, if we put together (i), (ii), (iii), (iv), then we get exactly $(*)$: \begin{align} F(s,t)&= F(0,0)+\frac{1}{2}(\lambda_1 s^2 + \lambda_2 t^2) + \dots \end{align}
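If it helps, here is a quick numerical check of $(*)$, assuming the concrete example $f(x,y)=x^2+xy+y^2$ (my own choice), which has a critical point at the origin; since this $f$ is exactly quadratic, the expansion holds with no remainder:

```python
import numpy as np

# Check (*) for the made-up example f(x, y) = x^2 + x*y + y^2,
# with critical point (0, 0) and Hessian [[2, 1], [1, 2]] there.
def f(p):
    x, y = p
    return x**2 + x*y + y**2

x0 = np.array([0.0, 0.0])               # critical point
H = np.array([[2.0, 1.0],
              [1.0, 2.0]])              # Hessian of f at x0
lam, P = np.linalg.eigh(H)              # lam = [1, 3]; columns of P are u, v

def F(s, t):
    """F = f o phi, i.e. f expressed in (s, t)-coordinates."""
    return f(x0 + s * P[:, 0] + t * P[:, 1])

s, t = 0.3, -0.2
print(F(s, t))                                      # value of F at (s, t)
print(F(0, 0) + 0.5 * (lam[0]*s**2 + lam[1]*t**2))  # matches exactly here
```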


Extra FYI:

Of course, all of this generalizes verbatim to higher dimensions: the calculus part stays almost the same; the only thing you need to generalize is the linear algebra. You just need to prove that any symmetric bilinear form $\zeta:V\times V\to\Bbb{R}$ on a finite-dimensional real vector space $V$ can be "orthogonally diagonalized", in the sense that there exists a basis $\{v_1,\dots, v_n\}$ of $V$ such that $\zeta(v_i,v_j)=0$ if $i\neq j$. In other words, relative to this basis, the symmetric bilinear form $\zeta$ has matrix representation \begin{align} [\zeta]&= \begin{pmatrix} \lambda_1 & \cdots & 0\\ \vdots &\ddots & \vdots\\ 0& \cdots & \lambda_n \end{pmatrix}, \end{align} where $\zeta(v_i,v_i)=\lambda_i$. The proof of this is essentially the same as the Gram-Schmidt orthonormalization procedure for inner products, if you've seen that.
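As a sanity check (my own addition, not part of the lemma), for $V=\Bbb{R}^n$ with $\zeta(x,y)=x^{T}Ay$ for a symmetric matrix $A$, the eigenbasis from `numpy.linalg.eigh` is exactly such a basis:

```python
import numpy as np

# zeta(x, y) = x^T A y for a randomly chosen symmetric matrix A (n = 4).
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                       # symmetric matrix representing zeta

lam, V = np.linalg.eigh(A)              # columns of V: eigenbasis v_1, ..., v_n

# Matrix of zeta relative to {v_1, ..., v_n}: entries are zeta(v_i, v_j).
M = V.T @ A @ V
print(np.allclose(M, np.diag(lam)))     # True: zeta(v_i, v_j) = 0 for i != j
```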

Proof:

We proceed inductively: if $\dim V=1$, then any non-zero vector is a basis, and we're done. Otherwise, suppose we can do so for $n$-dimensional vector spaces. Then, supposing $\dim V=n+1$, we distinguish two cases: if $\zeta=0$ identically, then any basis will work. Otherwise, choose a $v\in V$ such that $\zeta(v,v)\neq 0$. Then, $\ker \zeta(\cdot, v)$ will be an $n$-dimensional vector space (the linear functional $\zeta(\cdot,v)$ is non-zero since $\zeta(v,v)\neq 0$), and the restriction of $\zeta$ to this space will (by the induction hypothesis) have an "orthogonal" basis $\{v_1,\dots, v_n\}$. It is then easy to see that $\{v_1,\dots, v_n,v\}$ is an "orthogonal" basis for $V$ (note $v\notin\ker\zeta(\cdot,v)$, again because $\zeta(v,v)\neq 0$). This completes the induction.
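For what it's worth, this proof is constructive enough to run. Below is a rough Python transcription (my own sketch, with no attention to numerical robustness): given the symmetric matrix $A$ of $\zeta$, it picks a $v$ with $\zeta(v,v)\neq 0$, computes a basis of $\ker\zeta(\cdot,v)$, and recurses on the restriction.

```python
import numpy as np

def zeta_orthogonal_basis(A, tol=1e-12):
    """Columns of the returned matrix form a basis {v_1,...,v_n} of R^n with
    v_i^T A v_j = 0 for i != j, following the inductive proof above."""
    n = A.shape[0]
    if np.allclose(A, 0, atol=tol):        # zeta = 0 identically: any basis works
        return np.eye(n)
    # Choose v with zeta(v, v) != 0: try the e_i, then fall back to e_i + e_j,
    # for which zeta(e_i + e_j, e_i + e_j) = 2*A[i, j] when the diagonal vanishes.
    diag = np.abs(np.diag(A))
    if diag.max() > tol:
        v = np.eye(n)[np.argmax(diag)]
    else:
        i, j = np.unravel_index(np.argmax(np.abs(A)), A.shape)
        v = np.eye(n)[i] + np.eye(n)[j]
    if n == 1:
        return v.reshape(1, 1)
    # ker zeta(., v) = {x : x . (A v) = 0} is (n-1)-dimensional; the rows of Vt
    # below, other than the first, are an orthonormal basis of that kernel.
    w = A @ v
    _, _, Vt = np.linalg.svd(w.reshape(1, -1))
    K = Vt[1:].T                           # n x (n-1), columns span the kernel
    # Induction hypothesis: diagonalize the restriction of zeta to the kernel.
    sub = zeta_orthogonal_basis(K.T @ A @ K, tol)
    return np.column_stack([K @ sub, v.reshape(-1, 1)])

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])                 # zeta with zero diagonal
V = zeta_orthogonal_basis(A)
print(V.T @ A @ V)                         # diagonal (off-diagonal ~ 0)
```

Unlike `eigh`, this produces a basis that is orthogonal with respect to $\zeta$ but not necessarily with respect to the Euclidean inner product, which is all the lemma requires.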

Once you do this, we can play the same game: given $f:\Bbb{R}^n\to\Bbb{R}$ and a critical point $x_0\in\Bbb{R}^n$, and letting $\{v_1,\dots, v_n\}$ be an "orthogonal" basis for the second derivative $D^2f_{x_0}:\Bbb{R}^n\times\Bbb{R}^n\to\Bbb{R}$, we simply define $\phi:\Bbb{R}^n\to\Bbb{R}^n$ by \begin{align} \phi(s)&=x_0+\sum_{i=1}^ns_iv_i \end{align} (this tells us how to go from $s$-coordinates to $x$-coordinates). Then, we consider the mapping $F=f\circ \phi$ exactly as above (i.e. the function $f$ expressed in terms of $s$-coordinates).
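And again purely as an illustration (the $f$, $x_0$, and Hessian here are hypothetical choices of mine), the same check in $\Bbb{R}^3$:

```python
import numpy as np

# Made-up f: R^3 -> R with a critical point at the origin; since f is
# exactly quadratic, f(phi(s)) = f(x0) + (1/2) * sum_i lam_i * s_i**2 exactly.
def f(x):
    return x[0]**2 + 2*x[1]**2 - x[2]**2 + x[0]*x[1]

x0 = np.zeros(3)                          # critical point of f
H = np.array([[2.0, 1.0, 0.0],
              [1.0, 4.0, 0.0],
              [0.0, 0.0, -2.0]])          # Hessian D^2 f at x0
lam, V = np.linalg.eigh(H)                # columns v_1, ..., v_n of V

def phi(s):
    """s-coordinates -> x-coordinates: phi(s) = x0 + sum_i s_i * v_i."""
    return x0 + V @ s

s = np.array([0.1, -0.2, 0.3])
print(f(phi(s)))                          # F(s)
print(f(x0) + 0.5 * np.sum(lam * s**2))   # identical value
```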

peek-a-boo