
Matrix calculus uses function composition of, at least phenomenologically, several different types. While I've encountered each of them before in different subjects, bringing them all together in one subject produces some friction, and I am not sure how much of the asymmetry in notation is only surface-deep and how much reflects genuinely different mathematical concepts of function composition. Some (apparent) truths I've observed are:

The composition (matrix multiplication) of $1$ by $1$ matrices does not appear to be the same as the composition of their scalar elements, despite $1$ by $1$ matrices ordinarily being viewed as isomorphic to scalars, i.e., it is not correct to write $$d(x^3) = 3x^2 dx = 3x^2 \circ dx = 3(dx)^2$$ where $x$ is a scalar.
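One way to probe this numerically is sketched below. It is only a minimal check, and the names `f` and `df`, the point $x = 2$, and the step size are illustrative assumptions rather than anything from the course. If the differential at $x$ is read as the linear map $h \mapsto 3x^2 h$ applied to $dx$, it tracks the actual change in $x^3$, whereas $3(dx)^2$ does not.

```python
def f(x):
    """The scalar function x -> x^3."""
    return x ** 3


def df(x, dx):
    """Differential of f at x, read as the linear map h -> 3 x^2 h applied to dx."""
    return 3 * x ** 2 * dx


x, dx = 2.0, 1e-6

# Actual first-order change in f when x is perturbed by dx.
actual_change = f(x + dx) - f(x)

# Reading 1: apply the 1x1 linear map [3 x^2] to dx.
linear_map_applied = df(x, dx)

# Reading 2: compose the scalar function t -> 3 t^2 with dx (the incorrect reading).
composed_as_functions = 3 * dx ** 2

print("actual change:       ", actual_change)
print("3 x^2 dx:            ", linear_map_applied)
print("3 (dx)^2:            ", composed_as_functions)
```

The distinction the check suggests is that $3x^2$ plays two roles: a function of $x$ that selects the linear map, and the coefficient of that map acting on $dx$. Only the second role is "composed" with $dx$.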

"Concrete" linear operators such as the trace do appear to compose in the way scalars don't, i.e., it is correct to write (up to the fact that I can't actually find the correct notation to write it)

$$d(\operatorname{tr} (X)) = \operatorname{tr}(\cdot) dX = \operatorname{tr} \circ\ dX = \operatorname{tr}(dX) $$

where $X$ is a matrix of variables and $\cdot$ means "eat the thing to the right" (of course, I have used linearity of the trace to reverse engineer that the Jacobian is $\operatorname{tr}(\cdot)$, but the point is just that $dX$ does get eaten unlike the scalar case).
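The trace identity can be checked the same way. This is a minimal sketch assuming NumPy; the matrix size, seed, and perturbation scale are arbitrary choices for illustration. Because the trace is linear, the change in $\operatorname{tr}(X)$ under a perturbation $dX$ equals $\operatorname{tr}(dX)$ up to floating-point rounding.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random 3x3 matrix of "variables" and a small perturbation of it.
X = rng.standard_normal((3, 3))
dX = 1e-6 * rng.standard_normal((3, 3))

# Actual change in the trace when X is perturbed by dX.
trace_change = np.trace(X + dX) - np.trace(X)

# The linear operator tr(.) applied to ("eating") dX.
trace_of_dX = np.trace(dX)

print("tr(X + dX) - tr(X):", trace_change)
print("tr(dX):            ", trace_of_dX)
```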

There appears to be ambiguity in phrases such as "the differential of the composition of $G(X)$ and $H(X)$" and perhaps even $d(G(X) \circ H(X))$, where $X$ is a matrix and $G$ and $H$ are "abstract" linear operators. Such a phrase could mean $d(G(X)H(X))$, which calls for the product rule, or $d(G(H(X)))$, which calls for the chain rule. Both uses appear in the chain rule itself.
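To make the two readings concrete, here is a minimal NumPy sketch. The particular operators $G(M) = M^T$ and $H(M) = AM$, the fixed matrix `A`, and the seed are hypothetical choices made only for illustration. Since both operators are linear, $dG = G(dX)$ and $dH = H(dX)$, so the product reading gives $G(dX)H(X) + G(X)H(dX)$ while the composite reading gives $G(H(dX))$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))          # fixed matrix defining H (illustrative)
X = rng.standard_normal((3, 3))
dX = 1e-6 * rng.standard_normal((3, 3))  # small perturbation of X


def G(M):
    """Hypothetical linear operator: transpose."""
    return M.T


def H(M):
    """Hypothetical linear operator: left-multiplication by A."""
    return A @ M


def product(M):
    """Reading 1: the matrix product G(M) H(M)."""
    return G(M) @ H(M)


def composite(M):
    """Reading 2: the function composition G(H(M))."""
    return G(H(M))


# Reading 1: the product rule, d(G H) = dG H + G dH.
product_change = product(X + dX) - product(X)
product_rule = G(dX) @ H(X) + G(X) @ H(dX)

# Reading 2: the chain rule, d(G o H) = G(H(dX)) for linear G and H.
composite_change = composite(X + dX) - composite(X)
chain_rule = G(H(dX))

print("product rule error:", np.abs(product_change - product_rule).max())
print("chain rule error:  ", np.abs(composite_change - chain_rule).max())
```

The two differentials are different matrices, which is why the phrase needs disambiguating before either rule is applied.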

Is there a holistic concept of function composition that unifies some or all of these composition notations, or are they fundamentally different mathematical objects? Is there any way to convert between them, or to simplify expressions so that they use a single type of composition, such as via a generalization of the tree diagram from Multivariable Calculus that incorporates the other types?

user10478
  • Who is writing $G(X) \circ H(X)$? I don't see anyone who isn't writing it correctly as $(G \circ H)(X)$. No ambiguity, and well, it actually makes sense! The timestamped lecture you link is perhaps a bit sloppy, but I don't see them writing $g(x) \circ h(x)$. – Mikael Öhman Apr 28 '25 at 23:44
  • Have you tried focusing only on functions of the type ${\Bbb R}^{m \times n} \to {\Bbb R}$? – Rodrigo de Azevedo Apr 28 '25 at 23:45
  • @MikaelÖhman I'm not sure where I initially saw the notation or if I made it up. I can see that the notation you suggest is more common and could avoid an orthogonal ambiguity over order of operations. – user10478 Apr 29 '25 at 16:51
  • @RodrigodeAzevedo I don't have a sense for why that's important here. I mean, some parts of the course so far, especially those regarding gradients, have restricted themselves that way, but other parts related to the matrix squaring function and so forth have not. – user10478 Apr 29 '25 at 16:55
