I was reading on Wikipedia about total derivatives of functions and they stated the following about the chain rule for total derivatives:
Let $f:\mathbb R^m\to \mathbb R^k$ and $g:\mathbb R^n \to \mathbb R^m$ be two differentiable functions and let $a \in \mathbb R^n$. Let $D_{g(a)}f$ denote the total derivative of $f$ at $g(a)$ and $D_a g$ denote the total derivative of $g$ at $a$. Then: $$D_a(f\circ g)=D_{g(a)}f\circ D_a g$$ or, for short: $$D(f\circ g)=Df\circ Dg$$
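For what it's worth, the pointwise version does make sense to me: at a fixed $a$, both $D_{g(a)}f$ and $D_a g$ are linear maps, so their composition is just a product of Jacobian matrices. Below is a minimal numerical sketch of that reading. The particular functions `f` and `g`, the point `a`, and the finite-difference helper `jacobian` are my own illustrative choices, not anything from the Wikipedia article.

```python
def jacobian(func, point, h=1e-6):
    """Approximate the Jacobian of func at point by central differences.

    Returns a list of rows: entry [i][j] approximates d(func_i)/d(x_j).
    """
    n = len(point)
    m = len(func(point))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus = list(point)
        minus = list(point)
        plus[j] += h
        minus[j] -= h
        fp, fm = func(plus), func(minus)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


# Hypothetical example maps: g: R^2 -> R^3 and f: R^3 -> R^2.
def g(p):
    x, y = p
    return [x * y, x + y, x ** 2]


def f(q):
    u, v, w = q
    return [u + v * w, u * w]


def compose(p):
    return f(g(p))


a = [1.0, 2.0]

# Left-hand side: D_a(f o g), a 2x2 matrix.
lhs = jacobian(compose, a)

# Right-hand side: D_{g(a)} f composed with D_a g, i.e. (2x3) times (3x2).
rhs = matmul(jacobian(f, g(a)), jacobian(g, a))

max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(lhs)
print(rhs)
print(max_diff)
```

Here $D_{g(a)}f$ is evaluated at $g(a)$ while $D_a g$ is evaluated at $a$, and the two matrices agree up to finite-difference error. This only illustrates the pointwise formula, not the short form.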
The thing I'm not understanding is the following: What does $Df\circ Dg$ mean?
As I understand it, those two total derivatives are defined as functions $Df: \mathbb R^m\to \mathcal L(\mathbb R^m,\mathbb R^k)$ and $Dg: \mathbb R^n\to \mathcal L(\mathbb R^n,\mathbb R^m)$, so the codomain of $Dg$ does not match the domain of $Df$.
So how is the composition in $D(f\circ g)=Df\circ Dg$ defined? Am I missing something, or is this a typo?