When we speak of "changing coordinates" here, what we mean is that we consider a certain (in this case affine) mapping $\phi:\Bbb{R}^2\to\Bbb{R}^2$ (which encodes the effect of "changing coordinates") such that the composed mapping $F=f\circ \phi$ has the Taylor expansion
\begin{align}
F(s,t)= F(0,0) + \frac{1}{2}(\lambda_1s^2+\lambda_2t^2)+\dots \tag{$*$}
\end{align}
for some $\lambda_1,\lambda_2\in\Bbb{R}$. In this case, what the mapping $\phi$ does is the following: suppose $\{u,v\}\subset \Bbb{R}^2$ are the two "eigenvectors" of the Hessian $D^2f_{(x_0,y_0)}$. Then, we define $\phi:\Bbb{R}^2\to\Bbb{R}^2$ as
\begin{align}
\phi(s,t):= (x_0,y_0)+su+tv.
\end{align}
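If it helps to see this in code, here is a minimal numerical sketch (assuming NumPy, and using an illustrative function $f(x,y)=x^2+3xy+2y^2$ of my own choosing, which has a critical point at the origin):

```python
import numpy as np

# Illustrative choice (not from the question): f(x, y) = x^2 + 3xy + 2y^2,
# which has a critical point at (x0, y0) = (0, 0).
def f(x, y):
    return x**2 + 3*x*y + 2*y**2

x0, y0 = 0.0, 0.0
H = np.array([[2.0, 3.0],
              [3.0, 4.0]])          # Hessian D^2 f at (x0, y0), computed by hand

# eigh returns the eigenvalues and an orthonormal set of eigenvectors (columns of V)
lam, V = np.linalg.eigh(H)
u, v = V[:, 0], V[:, 1]

def phi(s, t):
    """Change of coordinates: (s, t) -> (x0, y0) + s*u + t*v."""
    return np.array([x0, y0]) + s*u + t*v
```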
If you really insist on matrix notation then it reads
\begin{align}
\begin{pmatrix}
s\\ t
\end{pmatrix}
\mapsto
\begin{pmatrix}
x\\y
\end{pmatrix}=
\begin{pmatrix}
x_0\\y_0
\end{pmatrix}+
\begin{pmatrix}
u_1&v_1\\u_2&v_2
\end{pmatrix}\cdot
\begin{pmatrix}
s\\t
\end{pmatrix}
\end{align}
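Continuing the little sketch above, this matrix form just packs $u$ and $v$ into the columns of a matrix:

```python
# Same map phi, written with the matrix whose columns are the eigenvectors u and v
P = np.column_stack([u, v])

def phi_matrix(s, t):
    return np.array([x0, y0]) + P @ np.array([s, t])

assert np.allclose(phi_matrix(0.3, -0.7), phi(0.3, -0.7))
```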
So, the mapping $\phi$ tells you how to go from the $(s,t)$ coordinates to the $(x,y)$ coordinates, and hence when we consider the composed mapping $F=f\circ \phi$, we're looking at the function $f$ "expressed in $(s,t)$ coordinates". To visualize this geometrically and keep track of all this information, I suggest you draw two copies of the plane $\Bbb{R}^2$. On one of them label the axes as $(s,t)$, and on the other copy label the axes as $(x,y)$ and draw a dot to denote the point $(x_0,y_0)$.
Then what is going on is that we're mapping the origin of the $(s,t)$ plane to the point $(x_0,y_0)$ in the $(x,y)$ plane, and we're mapping the vector $e_1=(1,0)$ onto $(x_0,y_0)+u$ (think of this as mapping $e_1$ onto the eigenvector $u$ of the Hessian, emanating from the point $(x_0,y_0)$); similarly we map $e_2$ onto $(x_0,y_0)+v$. So this change of coordinates (as we'll see below) allows us to work at the origin and assume that the "eigenvectors" of the Hessian lie along the coordinate axes of our $(s,t)$ plane.
Now, to prove that $(*)$ really is the Taylor expansion of $F$, we have to use the chain rule. For ease of typing, let $\sigma=(s,t)$. Then, by Taylor's theorem,
\begin{align}
F(\sigma)&= F(0,0)+ DF_{(0,0)}[\sigma] + \frac{1}{2!}D^2F_{(0,0)}[\sigma,\sigma] + o(|\sigma|^2)\tag{i}
\end{align}
The first-order term is
\begin{align}
DF_{(0,0)}[\sigma]&=Df_{\phi(0,0)}(D\phi_{(0,0)}[\sigma])\\
&=Df_{(x_0,y_0)}(D\phi_{(0,0)}[\sigma])\\
&= 0,\tag{ii}
\end{align}
since by assumption $(x_0,y_0)$ is a critical point of $f$, so $Df_{(x_0,y_0)}=0$. For the second-order term, we repeat this idea: if you carry out the chain rule directly then you'll get
\begin{align}
D^2F_{(0,0)}[\sigma,\sigma] &= D^2f_{\phi(0,0)}[D\phi_{(0,0)}[\sigma],D\phi_{(0,0)}[\sigma]] +
Df_{\phi(0,0)}[D^2\phi_{(0,0)}[\sigma,\sigma]]
\end{align}
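If you want, you can verify this chain-rule identity symbolically; here is a small standalone sketch (assuming SymPy; the function $g$ and the affine map are arbitrary choices of mine, and since the map is affine the $D^2\phi$ term drops out):

```python
import sympy as sp

s, t, x, y = sp.symbols('s t x y')

# A non-quadratic test function (my own choice), so the identity is not trivial
g = x**2 * y + sp.sin(x*y)
Hg = sp.hessian(g, [x, y])                    # D^2 g as a function of (x, y)

# An arbitrary affine map (a0, b0) + Q*(s, t); the numbers are placeholders
a0, b0 = 1, 2
Q = sp.Matrix([[1, 3],
               [0, 2]])
sub = {x: a0 + Q[0, 0]*s + Q[0, 1]*t,
       y: b0 + Q[1, 0]*s + Q[1, 1]*t}

G = g.subs(sub)                               # g expressed in the (s, t) coordinates
lhs = sp.hessian(G, [s, t])                   # D^2 G computed directly
rhs = Q.T * Hg.subs(sub) * Q                  # chain rule; D^2 phi = 0 since phi is affine,
                                              # so only the first term of the formula survives
print((lhs - rhs).applyfunc(sp.simplify))     # prints the zero matrix
```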
In this special case, of course, we have a lot of simplifications: first of all, as above, $Df_{\phi(0,0)}=Df_{(x_0,y_0)}=0$ by assumption. Next, since $\phi$ is an affine function, $D^2\phi_{(0,0)}=0$ and $D\phi_{(0,0)}[\sigma]= su+tv$. Plugging all of this in, we see that
\begin{align}
D^2F_{(0,0)}[\sigma,\sigma] &= D^2f_{(x_0,y_0)}[su+tv, su+tv]\\
&= s^2 D^2f_{(x_0,y_0)}[u,u]+ 2st D^2f_{(x_0,y_0)}[u,v]+ t^2D^2f_{(x_0,y_0)}[v,v]\tag{iii}
\end{align}
where I used the fact that the second derivative is symmetric in its two arguments (i.e. the fact that the Hessian matrix is symmetric). Now, when I said above that $u,v$ are "eigenvectors" of $D^2f_{(x_0,y_0)}$, what this means is that $D^2f_{(x_0,y_0)}[u,u]=\lambda_1$, $D^2f_{(x_0,y_0)}[v,v]=\lambda_2$, and $D^2f_{(x_0,y_0)}[u,v]=0$ (which is exactly what you get by taking $u,v$ to be orthonormal eigenvectors of the Hessian matrix). Or, expressed more succinctly, the matrix representation of $D^2f_{(x_0,y_0)}$ relative to the "eigenbasis" $\{u,v\}$ is diagonal:
\begin{align}
[D^2f_{(x_0,y_0)}]&=
\begin{pmatrix}
D^2f_{(x_0,y_0)}[u,u]&D^2f_{(x_0,y_0)}[u,v]\\
D^2f_{(x_0,y_0)}[v,u]&D^2f_{(x_0,y_0)}[v,v]
\end{pmatrix}=
\begin{pmatrix}
\lambda_1&0\\
0&\lambda_2
\end{pmatrix}\tag{iv}
\end{align}
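In the numerical sketch from earlier, this is just the statement that conjugating the Hessian by the matrix of orthonormal eigenvectors gives a diagonal matrix:

```python
# With u, v, H, lam from the numerical sketch above (eigh gives orthonormal eigenvectors):
assert np.isclose(u @ H @ u, lam[0])            # D^2 f[u, u] = lambda_1
assert np.isclose(v @ H @ v, lam[1])            # D^2 f[v, v] = lambda_2
assert np.isclose(u @ H @ v, 0.0)               # D^2 f[u, v] = 0
assert np.allclose(V.T @ H @ V, np.diag(lam))   # all of (iv) at once
```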
So, if we put together (i),(ii),(iii),(iv) then we get exactly $(*)$:
\begin{align}
F(s,t)&= F(0,0)+\frac{1}{2}(\lambda_1 s^2 + \lambda_2 t^2) + \dots
\end{align}
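Continuing that sketch, you can check $(*)$ numerically; for the quadratic $f$ chosen there the agreement is exact (up to floating-point error):

```python
# F = f expressed in (s, t) coordinates, using phi and lam from the earlier sketch
def F(s, t):
    return f(*phi(s, t))

s, t = 0.3, -0.7
print(F(s, t), F(0, 0) + 0.5*(lam[0]*s**2 + lam[1]*t**2))
# the two numbers agree, exactly here, since this particular f is itself quadratic
```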
Extra FYI:
Of course, all of this generalizes verbatim to higher dimensions: the calculus part stays almost the same; the only thing you need to generalize is the linear algebra. You just need to prove that any symmetric bilinear form $\zeta:V\times V\to\Bbb{R}$ on a finite-dimensional real vector space $V$ can be "orthogonally diagonalized", in the sense that there exists a basis $\{v_1,\dots, v_n\}$ of $V$ such that $\zeta(v_i,v_j)=0$ if $i\neq j$. In other words, relative to this basis, the symmetric bilinear form $\zeta$ has matrix representation
\begin{align}
[\zeta]&=
\begin{pmatrix}
\lambda_1 & \cdots & 0\\
\vdots &\ddots & \vdots\\
0& \cdots & \lambda_n
\end{pmatrix}
\end{align}
where $\zeta(v_i,v_i)=\lambda_i$. The proof of this is essentially the same as the Gram-Schmidt orthonormalization procedure for inner products, if you've seen that.
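Numerically, an orthonormal eigenbasis of the matrix of $\zeta$ (i.e. the spectral theorem) produces such a basis; here is a minimal NumPy illustration, with a random symmetric matrix standing in for $[\zeta]$:

```python
import numpy as np

# Random symmetric matrix as a stand-in for the matrix of zeta in the standard basis
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
A = (A + A.T) / 2

mu, W = np.linalg.eigh(A)                       # columns of W: the basis {v_1, ..., v_n}
assert np.allclose(W.T @ A @ W, np.diag(mu))    # zeta(v_i, v_j) = 0 for i != j
```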
Proof:
We proceed inductively: if $\dim V=1$, then any non-zero vector is a basis, and we're done. Otherwise, suppose we can do this for $n$-dimensional vector spaces. Then, supposing $\dim V=n+1$, we distinguish two cases: if $\zeta=0$ identically, then any basis will work. Otherwise, choose a $v\in V$ such that $\zeta(v,v)\neq 0$ (such a $v$ exists: if $\zeta(w,w)=0$ for all $w$, then the polarization identity $\zeta(a,b)=\frac{1}{2}[\zeta(a+b,a+b)-\zeta(a,a)-\zeta(b,b)]$ would force $\zeta=0$). Then, $\ker \zeta(\cdot, v)$, being the kernel of a non-zero linear functional, will be an $n$-dimensional vector space, and the restriction of $\zeta$ to this space will (by the induction hypothesis) have an "orthogonal" basis $\{v_1,\dots, v_n\}$. Since $\zeta(v,v)\neq 0$, we have $v\notin\ker\zeta(\cdot,v)$, so $\{v_1,\dots, v_n,v\}$ is an "orthogonal" basis for $V$. This completes the induction.
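If it helps, here is the induction written out as an algorithm (assuming NumPy; the helper name and the use of an SVD to compute $\ker\zeta(\cdot,v)$ are my own choices, not part of the argument above):

```python
import numpy as np

def zeta_orthogonal_basis(A, tol=1e-12):
    """Given the symmetric matrix A of a bilinear form zeta, return a matrix B whose
    columns b_1, ..., b_n satisfy b_i^T A b_j = 0 for i != j, following the proof above.
    (Hypothetical helper; the numerical tolerance handling is mine.)"""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    # Base cases: dim 1, or zeta identically zero -- any basis works.
    if n == 1 or np.max(np.abs(A)) < tol:
        return np.eye(n)
    # Choose v with zeta(v, v) != 0.  If every diagonal entry (essentially) vanishes,
    # pick i, j with A[i, j] != 0 and take v = e_i + e_j, so zeta(v, v) ~ 2*A[i, j] != 0.
    i = int(np.argmax(np.abs(np.diag(A))))
    if abs(A[i, i]) > tol:
        v = np.eye(n)[i]
    else:
        i, j = np.unravel_index(np.argmax(np.abs(A)), A.shape)
        v = np.eye(n)[i] + np.eye(n)[j]
    # ker zeta(., v) is the null space of the row vector (A v)^T; the columns of N
    # form an orthonormal basis of it, obtained from the SVD.
    w = (A @ v).reshape(1, -1)
    N = np.linalg.svd(w)[2][1:].T                 # shape (n, n-1)
    # Restrict zeta to that (n-1)-dimensional subspace and recurse.
    B_sub = zeta_orthogonal_basis(N.T @ A @ N, tol)
    return np.column_stack([N @ B_sub, v])

# Quick check on a random symmetric matrix: off-diagonal entries of B^T A B vanish.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)); A = (A + A.T) / 2
B = zeta_orthogonal_basis(A)
D = B.T @ A @ B
assert np.allclose(D - np.diag(np.diag(D)), 0)
```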
Once you do this, we can play the same game: given $f:\Bbb{R}^n\to\Bbb{R}$ and a point $x_0\in\Bbb{R}^n$, and letting $\{v_1,\dots, v_n\}$ be an "orthogonal" basis for the second derivative $D^2f_{x_0}:\Bbb{R}^n\times\Bbb{R}^n\to\Bbb{R}$, we simply define $\phi:\Bbb{R}^n\to\Bbb{R}^n$ by
\begin{align}
\phi(s)&=x_0+\sum_{i=1}^ns_iv_i
\end{align}
(this tells us how to go from $s$-coordinates to $x$-coordinates)
Then, we consider the mapping $F=f\circ \phi$ exactly as above (i.e., the function $f$ expressed in terms of $s$-coordinates).
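Here is a final sketch of this general recipe in code (assuming NumPy; `f`, `x0`, and `hess` are placeholders for whatever function, critical point, and Hessian routine you have at hand):

```python
import numpy as np

# Sketch of the general recipe; f, x0, and hess are placeholders supplied by the user.
def change_of_coordinates(f, x0, hess):
    lam, V = np.linalg.eigh(hess(x0))      # columns of V: an "orthogonal" basis {v_1, ..., v_n}
    def phi(s):
        return x0 + V @ np.asarray(s)      # s-coordinates -> x-coordinates
    def F(s):
        return f(phi(s))                   # f expressed in s-coordinates
    return F, lam                          # near s = 0: F(s) ~ F(0) + (1/2) * sum_i lam_i * s_i^2
```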