Every definition of the total differential I have seen gives it as the sum of the partial derivatives multiplied by the corresponding differentials, but nowhere is there a clear explanation of why this is so.

  • Start by proving the multi-variate chain rule. – Simply Beautiful Art Feb 14 '17 at 01:21
  • If a variable is dependent on a number of other variables, then the total change in the former (given by the differential element of the dependent variable) must be computed by adding up the changes in all of the latter (given by the individual differential elements of the independent variables) weighted (by multiplication) by the respective influence of each variable (given by the partial derivatives). That's the intuitive explanation (of course not a formal rigorous one). – Deepak Feb 14 '17 at 01:21
  • @Deepak worthy of being an answer – Brevan Ellefsen Feb 14 '17 at 01:31
  • @Deepak "must be computed by adding up the changes in all of the latter": why must it? – Dmytro Nalyvaiko Feb 14 '17 at 01:33
  • When I imagine a 3D plot, I don't clearly understand why the sum of the changes along the x and y axes gives the real change along the z axis. – Dmytro Nalyvaiko Feb 14 '17 at 01:38
  • This question seems like it has a lot of relevant information. – amd Feb 14 '17 at 01:47

1 Answer

The differential of a function $f:\mathbb R^m\to\mathbb R^n$ is a linear map that is the “best” approximation to the change of $f$ near some point $\mathbf p=(p^1,\dots,p^m)$, i.e., $f(\mathbf p+\mathbf h)=f(\mathbf p)+\operatorname{d}f_{\mathbf p}[\mathbf h]+o(\|\mathbf h\|)$. Restricting ourselves to a scalar-valued function $f:\mathbb R^n\to\mathbb R$, it’s fairly straightforward to show that ${\partial f\over\partial x_k}(\mathbf p)=\operatorname{d}f_{\mathbf p}[\mathbf e^k]$, where $\mathbf e^k$ is the basis vector corresponding to the $x^k$ coordinate (set $\mathbf h=t\mathbf e^k$ in the definition, divide by $t$, and let $t\to0$). Since a linear map is determined by its action on the basis vectors, in this coordinate system we can write $\operatorname{d}f_{\mathbf p}$ as the row vector $\left({\partial f\over\partial x_1}(\mathbf p),\dots,{\partial f\over\partial x_n}(\mathbf p)\right)$ so that $\operatorname{d}f_{\mathbf p}[\mathbf h]$ becomes simple matrix multiplication (or, if you prefer, a dot product).
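
To make the $o(\|\mathbf h\|)$ condition concrete, here is a minimal numerical sketch. The function `f`, its hand-computed partial derivatives, the point `p`, and the direction of `h` are all hypothetical choices made for illustration. The script prints the remainder divided by $\|\mathbf h\|$ for shrinking displacements; if the row vector of partials really is the differential, that ratio should tend to zero.

```python
import math

def f(x, y):
    # Hypothetical example function, chosen only for illustration.
    return x**2 * y + math.sin(y)

def grad_f(x, y):
    # Partial derivatives of f, computed by hand: (2xy, x^2 + cos y).
    return (2 * x * y, x**2 + math.cos(y))

def differential(p, h):
    # df_p[h]: the row vector of partials at p applied to the displacement h.
    gx, gy = grad_f(*p)
    return gx * h[0] + gy * h[1]

def remainder_ratio(p, h):
    # |f(p + h) - f(p) - df_p[h]| / ||h||, which should shrink as h -> 0.
    actual_change = f(p[0] + h[0], p[1] + h[1]) - f(*p)
    return abs(actual_change - differential(p, h)) / math.hypot(*h)

p = (1.0, 2.0)
# Shrink h along one fixed (arbitrary) direction.
ratios = [remainder_ratio(p, (t, -2 * t)) for t in (1e-1, 1e-2, 1e-3)]
print(ratios)
```

The printed ratios should decrease roughly in proportion to the step size, which is what "the error is $o(\|\mathbf h\|)$" means in practice.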

Now, the affine coordinate function $x^i$ is just the function that assigns to a point $\mathbf p$ its $i$th coordinate. Because $x^i$ is already linear, its differential $dx^i$ is the same map applied to a displacement: $dx^i[\mathbf h]=h^i$. Using the above matrix formulation, this means that $dx^1=(1,0,\dots,0)$, $dx^2=(0,1,0,\dots,0)$, and so on. So we can write $\operatorname{d}f_{\mathbf p}$ as $${\partial f\over\partial x_1}(\mathbf p)(1,0,\dots,0)+\cdots+{\partial f\over\partial x_n}(\mathbf p)(0,0,\dots,1)$$ or $${\partial f\over\partial x_1}dx^1+\cdots+{\partial f\over\partial x_n}dx^n$$ (with the partial derivatives evaluated at $\mathbf p$).
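
As a small worked example (the function here is a hypothetical choice for illustration, not taken from the question), let $f(x^1,x^2)=(x^1)^2x^2$ and $\mathbf p=(1,2)$. Then ${\partial f\over\partial x_1}(\mathbf p)=2p^1p^2=4$ and ${\partial f\over\partial x_2}(\mathbf p)=(p^1)^2=1$, so $$\operatorname{d}f_{\mathbf p}=4\,dx^1+1\,dx^2,\qquad \operatorname{d}f_{\mathbf p}[\mathbf h]=4h^1+h^2.$$ The sum appears because of linearity: writing $\mathbf h=h^1\mathbf e^1+h^2\mathbf e^2$ gives $\operatorname{d}f_{\mathbf p}[\mathbf h]=h^1\operatorname{d}f_{\mathbf p}[\mathbf e^1]+h^2\operatorname{d}f_{\mathbf p}[\mathbf e^2]$. For $\mathbf h=(0.1,-0.2)$ this predicts a change of $0.2$, while the actual change is $f(1.1,1.8)-f(1,2)=2.178-2=0.178$; the gap shrinks faster than $\|\mathbf h\|$ as $\mathbf h\to0$.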

It might help to look at this geometrically. For a scalar-valued function $f$, this linear approximation amounts to approximating the $n$-dimensional hypersurface (in $\mathbb R^{n+1}$) $y=f(\mathbf x)$ at the point $\mathbf p$ by its tangent hyperplane at that point. Just as the derivative of $f$ gives the slope of the tangent line to the curve $y=f(x)$ in the one-dimensional case $f:\mathbb R\to\mathbb R$, in the multidimensional case each partial derivative ${\partial f\over\partial x_i}$ gives the slope of the tangent hyperplane in the $x^i$ direction. The equation of the tangent hyperplane at $\mathbf p$ is thus $$y-f(\mathbf p)={\partial f\over\partial x_1}(x^1-p^1)+\cdots+{\partial f\over\partial x_n}(x^n-p^n)=\left({\partial f\over\partial x_1},\dots,{\partial f\over\partial x_n}\right)(\mathbf x-\mathbf p),$$ with the partial derivatives evaluated at $\mathbf p$. Comparing this to the definition of $\operatorname{d}f_{\mathbf p}$ at the top, with $\mathbf h=\mathbf x-\mathbf p$, we again find that it can be represented as a row vector of partial derivatives, and proceed as before.
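
The geometric picture can also be checked numerically. The sketch below is only an illustration under assumed choices: the surface `f`, the helper names `slope` and `tangent_plane`, and the sample points are hypothetical. It estimates the slope in each coordinate direction by central differences, builds the hyperplane through $(\mathbf p, f(\mathbf p))$ with those slopes, and compares it with the surface at a nearby point.

```python
def f(x, y):
    # Hypothetical surface z = f(x, y), chosen only for illustration.
    return x**2 + 3 * x * y

def slope(g, p, i, eps=1e-6):
    # Central-difference estimate of the slope of g at p along coordinate i.
    lo, hi = list(p), list(p)
    lo[i] -= eps
    hi[i] += eps
    return (g(*hi) - g(*lo)) / (2 * eps)

def tangent_plane(g, p):
    # Build x -> g(p) + sum_i slope_i * (x_i - p_i).
    slopes = [slope(g, p, i) for i in range(len(p))]
    base = g(*p)
    return lambda x: base + sum(s * (xi - pi) for s, xi, pi in zip(slopes, x, p))

p = (1.0, 2.0)
plane = tangent_plane(f, p)
x = (1.01, 2.02)
# Vertical distance between the surface and the tangent plane near p.
gap = abs(f(*x) - plane(x))
print(plane(x), f(*x), gap)
```

Each slope is measured by moving along one axis only, yet the plane built from their sum tracks the surface when both coordinates change at once; the remaining gap is second order in the displacement.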

amd
  • It is too difficult. I don't understand your answer starting from $f(\mathbf p+\mathbf h)=f(\mathbf p)+\operatorname{d}f_{\mathbf p}[\mathbf h]+o(\|\mathbf h\|)$. – Dmytro Nalyvaiko Feb 14 '17 at 09:10
  • @DmitryNalyvaiko Do you at least understand the connection between the tangent (hyper)plane and partial derivatives described in the last paragraph? If you get that and how the tangent hyperplane relates to the differential, the rest is just a matter of definitions and algebra. – amd Feb 14 '17 at 19:22
  • "Do you at least understand the connection between the tangent (hyper)plane and partial derivatives described in the last paragraph?" No. Can you explain the answer to my question more simply? – Dmytro Nalyvaiko Feb 14 '17 at 20:17
  • @DmitryNalyvaiko Have a look at this question and this one that I referenced earlier. Beyond that, I don’t think I’m going to be able to help you. It’s starting to look like it’ll take more space and time than I have to do so. – amd Feb 15 '17 at 03:27