For a function $f\colon \mathbb{R}^n \to \mathbb{R}$, the derivative is a map $\mathrm{D}f\colon \mathbb{R}^n \to L(\mathbb{R}^n, \mathbb{R})$, where $L(A, B)$ denotes the space of all linear mappings from $A$ to $B$. We can represent this derivative at each point by a $1\times n$ matrix. But what does it now mean for $f$ to be twice differentiable? If we are constrained to the definition of the derivative on $\mathbb{R}^n$, we can't really proceed, since the object under consideration is now a matrix (whose entries depend on the point), and we want to best approximate its change by a linear function. What am I missing here?
-
Here's one viewpoint. Let $f:\mathbb R^n \to \mathbb R$ be differentiable at $x_0$. Then $f'(x_0)$ is a $1 \times n$ matrix, and we can define the gradient of $f$ at $x_0$ to be the column vector $\nabla f(x_0) = f'(x_0)^T$. The Hessian of $f$ at $x_0$ is the matrix $Hf(x_0) = g'(x_0)$, where $g$ is the function defined by $g(x) = \nabla f(x)$. – littleO Mar 24 '22 at 23:52
1 Answer
$Df:U\to L(\Bbb{R}^n,\Bbb{R})$ is a function. The domain is a vector space, the target space is a vector space, and both are finite-dimensional, so they can easily be considered normed vector spaces. So, you can define derivatives; see my answer to Differentiation definition for spaces other than $\Bbb{R}^n$ for more details. $f$ is twice differentiable if $Df$ is differentiable. If this is the case, you have $D^2f:\Bbb{R}^n\to L\bigg(\Bbb{R}^n,L(\Bbb{R}^n,\Bbb{R})\bigg)\cong L^2(\Bbb{R}^n;\Bbb{R})$, where the latter is the space of all bilinear maps $\Bbb{R}^n\times\Bbb{R}^n\to\Bbb{R}$. So, at any point $p$, you can consider $D^2f_p$ as a bilinear map $\Bbb{R}^n\times\Bbb{R}^n\to\Bbb{R}$ (or equivalently, as a linear map $\Bbb{R}^n\otimes\Bbb{R}^n\to\Bbb{R}$). Once you choose a basis for the vector space $\Bbb{R}^n$, every bilinear form can be assigned a matrix in the obvious way: look at $H_{ij}=D^2f_p(e_i,e_j)$, where $\{e_1,\dots, e_n\}$ is a basis of $\Bbb{R}^n$.
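To make this concrete, here is a small worked example (my own choice of function, not taken from the discussion above): take $f(x,y)=x^2y$ on $\Bbb{R}^2$. Then
\begin{align}
Df_{(x,y)} = \begin{pmatrix} 2xy & x^2 \end{pmatrix}, \qquad
D^2f_{(x,y)}(u,v) = u^T \begin{pmatrix} 2y & 2x \\ 2x & 0 \end{pmatrix} v,
\end{align}
so in the standard basis the matrix $H_{ij}=D^2f_{(x,y)}(e_i,e_j)$ is exactly the familiar Hessian of second partial derivatives, $H_{ij}=\frac{\partial^2 f}{\partial x_i \, \partial x_j}$.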
Here is some basic review of linear algebra: suppose $U,V,W$ are vector spaces over the same field $\Bbb{F}$. Then, we have the following isomorphisms: \begin{align} L(U, L(V,W))\cong L^2(U\times V; W)\cong L(U\otimes V, W), \end{align} i.e., the space of linear maps $U\to L(V,W)$ is isomorphic to the space of bilinear maps $U\times V\to W$, which is isomorphic to the space of linear maps $U\otimes V\to W$ (tensor product over the field $\Bbb{F}$). If you choose bases for $U,V,W$, you can then express elements of these spaces as $\dim W\times (\dim U\cdot \dim V)$ matrices.
In the above discussion, we have $U=V=\Bbb{R}^n$ and $W=\Bbb{R}$ and $\Bbb{F}=\Bbb{R}$.
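As an illustration of the first isomorphism (standard currying, not specific to this answer), the correspondence sends a bilinear map to its partial application:
\begin{align}
\Phi\colon L^2(U\times V; W)\to L(U, L(V,W)), \qquad \Phi(B)(u) = B(u,\cdot).
\end{align}
In the case at hand, the bilinear form $D^2f_p\in L^2(\Bbb{R}^n\times\Bbb{R}^n;\Bbb{R})$ corresponds to the linear map sending a direction $u$ to the linear functional $v\mapsto D^2f_p(u,v)$, which is precisely the derivative of $Df$ at $p$ evaluated in the direction $u$.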
-
Thanks, that's exactly my point: in my course they did not introduce the "Fréchet derivative" but only the one defined using the norm on $\mathbb{R}^n$, yet they still talk about second derivatives of a function. This confused me a bit. – Richard Mar 25 '22 at 10:39