6

My question comes from this article in Wikipedia. I noticed that there is a chain rule defined for the composition of $f:\mathbb{R}\to\mathbb{R}$ and $g: \mathbb{R}^n \to \mathbb{R}$, given by $$ \nabla (f \circ g) = (f' \circ g) \nabla g \tag{1} $$ My question is: if we instead have functions $f: \mathbb{R}^m \to \mathbb{R}$ and $g: \mathbb{R}^n \to \mathbb{R}^m$, so that $(f \circ g): \mathbb{R}^n \to \mathbb{R}$, is there an expression for $\nabla (f \circ g)$ similar to equation $(1)$?

I tried looking for a resource that answers this but had no luck. If someone could point me in the right direction, I would greatly appreciate it. Thank you!

RobPratt
  • 50,938
Robert Lee
  • 7,654

1 Answer

12

Background info: If $g:\mathbb R^n \to \mathbb R^m$ is differentiable at $x$, then $g'(x)$ is an $m \times n$ matrix. If $f:\mathbb R^m \to \mathbb R$ is differentiable at $u$, then $f'(u)$ is a $1 \times m$ matrix (row vector). If we use the convention that the gradient of $f$ at $u$ is a column vector, then $\nabla f(u) = f'(u)^T$.


The multivariable chain rule is actually easy. Let $h(x) = f(g(x))$. If $g$ is differentiable at $x$ and $f$ is differentiable at $g(x)$, the chain rule tells us that $$ h'(x) = f'(g(x)) g'(x). $$ Here $f'(g(x))$ is $1 \times m$ and $g'(x)$ is $m \times n$, so $h'(x)$ is a $1 \times n$ row vector. This formula is wonderful because it looks exactly like the formula from single variable calculus. This is a great example of the power of matrix notation. If we use the convention that the gradient is a column vector, then $$ \nabla h(x) = h'(x)^T = g'(x)^T \nabla f(g(x)). $$
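
In case it helps to see the matrix product unpacked, here is the same formula written in components. The notation $g_i$ for the $i$-th component of $g$ and $u_i$ for the $i$-th argument of $f$ is my own assumption, not something fixed in the question: $$ \frac{\partial h}{\partial x_j}(x) = \sum_{i=1}^m \frac{\partial f}{\partial u_i}(g(x)) \, \frac{\partial g_i}{\partial x_j}(x), \qquad j = 1, \dots, n. $$ As a small made-up example with $m = n = 2$, take $g(x_1, x_2) = (x_1^2, \, x_1 x_2)$ and $f(u_1, u_2) = u_1 + u_2^2$, so that $h(x) = x_1^2 + x_1^2 x_2^2$. Then $$ g'(x) = \begin{pmatrix} 2x_1 & 0 \\ x_2 & x_1 \end{pmatrix}, \qquad \nabla f(g(x)) = \begin{pmatrix} 1 \\ 2 x_1 x_2 \end{pmatrix}, $$ and $$ g'(x)^T \nabla f(g(x)) = \begin{pmatrix} 2x_1 & x_2 \\ 0 & x_1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 x_1 x_2 \end{pmatrix} = \begin{pmatrix} 2x_1 + 2 x_1 x_2^2 \\ 2 x_1^2 x_2 \end{pmatrix}, $$ which agrees with differentiating $h$ directly.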


By the way, if $f:\mathbb R \to \mathbb R$ and $g:\mathbb R^n \to \mathbb R$, then the chain rule tells us that the derivative of $h(x) = f(g(x))$ is $h'(x) = f'(g(x)) g'(x)$. If we use the convention that the gradient is a column vector, then $$ \nabla h(x) = h'(x)^T = \underbrace{g'(x)^T}_{\text{column vector}} \underbrace{f'(g(x))}_{\text{scalar}} = f'(g(x)) \nabla g(x). $$ So the version of the chain rule you mentioned in your post is just a special case of the standard chain rule.
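
If you want to convince yourself numerically, the formula $\nabla h(x) = g'(x)^T \nabla f(g(x))$ can be checked against a finite-difference gradient. The following is only a minimal sketch in plain Python: the particular maps `g` ($\mathbb R^2 \to \mathbb R^3$) and `f` ($\mathbb R^3 \to \mathbb R$), the test point `x0`, and the step size `eps` are arbitrary choices of mine for illustration, not anything from the question.

```python
import math

def g(x):
    """Hypothetical example map g: R^2 -> R^3."""
    x1, x2 = x
    return [x1 * x2, math.sin(x1), x2 ** 2]

def g_jacobian(x):
    """g'(x): the 3 x 2 Jacobian matrix of g, computed by hand."""
    x1, x2 = x
    return [[x2, x1],
            [math.cos(x1), 0.0],
            [0.0, 2.0 * x2]]

def f(u):
    """Hypothetical example map f: R^3 -> R."""
    u1, u2, u3 = u
    return u1 ** 2 + u2 * u3

def f_gradient(u):
    """Gradient of f at u (column-vector convention), computed by hand."""
    u1, u2, u3 = u
    return [2.0 * u1, u3, u2]

def chain_rule_gradient(x):
    """Gradient of h = f o g via grad h(x) = g'(x)^T grad f(g(x))."""
    J = g_jacobian(x)          # m x n
    grad_f = f_gradient(g(x))  # length m
    m, n = len(J), len(x)
    # Entry j of J^T grad_f is the sum over i of J[i][j] * grad_f[i].
    return [sum(J[i][j] * grad_f[i] for i in range(m)) for j in range(n)]

def finite_difference_gradient(x, eps=1e-6):
    """Central-difference approximation of the gradient of h = f o g."""
    grad = []
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        grad.append((f(g(x_plus)) - f(g(x_minus))) / (2.0 * eps))
    return grad

x0 = [0.7, -1.3]  # arbitrary test point
analytic = chain_rule_gradient(x0)
numeric = finite_difference_gradient(x0)
print("chain rule:        ", analytic)
print("finite differences:", numeric)
```

The two printed vectors should agree to several decimal places. The Jacobian and gradient are entered by hand here on purpose, so that the check exercises the chain rule formula itself rather than an automatic differentiation library.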

littleO
  • 54,048