I have a matrix expression that yields a scalar:

$$f(M) = MAM^T - 2 \sum_{i=1}^{N} \log(M_i)$$

where $M$ is a $1 \times N$ row vector and $A$ is an $N \times N$ matrix, so the result $f$ is a scalar.

How does one take the derivative of $f$ with respect to $M$? I have seen the Matrix Cookbook define derivatives of matrices with respect to individual entries, but I couldn't find a definition of differentiation with respect to a vector.

My intuition is something like

$$\frac{\partial f}{\partial M} = 2(AM - M^{\circ -1})$$

My reasoning is that the two factors of $M$ in the first term yield $2AM$ once differentiated, and the log term yields $M$ with each element raised to the power $-1$ (since $\frac{d \log x}{dx} = x^{-1}$).

Again, I'm not sure if I've done this correctly, and the fact that $f$ is a scalar makes it a bit more confusing.

2 Answers

We use the linearity of differentiation and first consider \begin{align*} g(M)=MAM^T\tag{1} \end{align*} with $M=(M_i)_{1\leq i\leq N}$ a $(1\times N)$ matrix.

We obtain \begin{align*} dg(M)&=dMAM^T+MAdM^T\tag{2}\\ \mathrm{vec}(dg(M))&=\mathrm{vec}(dMAM^T)+\mathrm{vec}(MAdM^T)\tag{3}\\ &=\left(MA^T\otimes I_1\right)\mathrm{vec}(dM)+\left(I_1\otimes MA\right)\mathrm{vec}\left(dM^T\right)\tag{4}\\ &=MA^T\mathrm{vec}(dM)+MA I_N\mathrm{vec}(dM)\tag{5}\\ &=\left(MA^T+MA\right)\mathrm{vec}(dM)\\ \color{blue}{\frac{\partial g(M)}{\partial M}}&=\frac{\partial\, \mathrm{vec}(g(M))}{\partial\, \mathrm{vec}(M)}=\color{blue}{M\left(A^T+A\right)}\tag{6} \end{align*}

Comments:

  • In (2) we start by calculating the differential.

  • In (3) we vectorize the equation.

  • In (4) we use the relationship between vectorization and Kronecker products to factor out $\mathrm{vec}(dM)$ and $\mathrm{vec}(dM^T)$, respectively.

  • In (5) we simplify and use $\mathrm{vec}(dM^T)=C\,\mathrm{vec}(dM)$, noting that the commutation matrix is $C=I_N$ because $M$ is a row vector.

  • In (6) we take the gradient.
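
For reference, the Kronecker-product step in (4) can be read off from the standard vectorization identity. As a brief sketch, with generic conformable matrices $X$, $Y$, $Z$ introduced here purely for illustration: \begin{align*} \mathrm{vec}(XYZ)=\left(Z^T\otimes X\right)\mathrm{vec}(Y). \end{align*} Taking $X=I_1$, $Y=dM$, $Z=AM^T$ gives the first term of (4), since $\left(AM^T\right)^T=MA^T$; taking $X=MA$, $Y=dM^T$, $Z=I_1$ gives the second.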

We can check the result (6) by writing $g$ in components:

\begin{align*} g(M)&=MAM^T\\ &=\left(M_i\right)_{1\leq i\leq N}\left(A_{ij}\right)_{1\leq i,j\leq N}\left(M_i\right)^T_{1\leq i\leq N}\\ &=\left(\sum_{j=1}^N M_jA_{ji}\right)_{1\leq i\leq N}\left(M_i\right)^T_{1\leq i\leq N}\\ &=\sum_{i=1}^N\sum_{j=1}^N M_iM_jA_{ji}=\sum_{i=1}^N\sum_{j=1}^N M_iM_jA_{ij} \end{align*}

where the last equality swaps the names of the summation indices $i$ and $j$.

We obtain

\begin{align*} \color{blue}{\frac{\partial g(M)}{\partial M}}&=\frac{\partial}{\partial\left(M_1,\ldots,M_N\right)}\left(\sum_{i=1}^N\sum_{j=1}^NM_iM_jA_{ij}\right)\\ &=\left(\frac{\partial}{\partial M_k}\sum_{i=1}^N\sum_{j=1}^N M_iM_jA_{ij}\right)_{1\leq k\leq N}\\ &=\left(\sum_{{j=1}\atop{j\ne k}}^N M_jA_{kj}+\sum_{{i=1}\atop{i\ne k}}^NM_iA_{ik}+2M_kA_{kk}\right)_{1\leq k\leq N}\\ &\,\,\color{blue}{=\left(\sum_{j=1}^NM_j\left(A_{kj}+A_{jk}\right)\right)_{1\leq k\leq N}} \end{align*}

in accordance with (6).
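
The same check can also be done numerically. Below is a minimal sketch in plain Python that compares the closed form (6) against central finite differences. The matrix `A`, the vector `M`, the step size `h` and all function names are arbitrary illustrative choices, not part of the question.

```python
# Finite-difference check of d(M A M^T)/dM = M (A^T + A), standard library only.
# A, M and h are arbitrary illustrative values (assumptions, not from the question).

def g(M, A):
    """Quadratic form M A M^T for a row vector M (list) and square A (list of rows)."""
    n = len(M)
    return sum(M[i] * A[i][j] * M[j] for i in range(n) for j in range(n))

def grad_g(M, A):
    """Closed form from (6): the k-th entry is sum_j M_j (A_kj + A_jk)."""
    n = len(M)
    return [sum(M[j] * (A[k][j] + A[j][k]) for j in range(n)) for k in range(n)]

def numerical_grad(func, M, h=1e-6):
    """Central differences, perturbing one coordinate at a time."""
    grad = []
    for k in range(len(M)):
        up = M[:k] + [M[k] + h] + M[k + 1:]
        down = M[:k] + [M[k] - h] + M[k + 1:]
        grad.append((func(up) - func(down)) / (2 * h))
    return grad

A = [[2.0, -1.0, 0.5],
     [3.0, 1.0, 4.0],
     [0.0, -2.0, 1.5]]  # deliberately non-symmetric
M = [0.7, 1.3, 2.1]

analytic = grad_g(M, A)
numeric = numerical_grad(lambda v: g(v, A), M)
max_err = max(abs(a - b) for a, b in zip(analytic, numeric))
print(analytic)
print(numeric)
print(max_err)
```

Since $g$ is quadratic, the central difference has no truncation error, so any remaining discrepancy is floating-point rounding.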

Finally, applying (6) to $f$, we obtain \begin{align*} \frac{\partial f(M)}{\partial M}&=M\left(A^T+A\right)-2\frac{\partial}{\partial (M_1,\ldots,M_N)}\sum_{i=1}^N\log(M_i)\\ &=M\left(A^T+A\right)-2\left(\frac{\partial}{\partial M_k}\sum_{i=1}^N\log(M_i)\right)_{1\leq k\leq N}\\ &\,\,\color{blue}{=M\left(A^T+A\right)-2\left(\frac{1}{M_k}\right)_{1\leq k\leq N}} \end{align*}
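
As a sanity check on the full gradient, here is a small plain-Python sketch under the same kind of assumptions (arbitrary test values, illustrative function names). It compares the formula above with finite differences, and it also compares it with the form $2(MA-M^{\circ-1})$, which should only agree when $A$ is symmetric.

```python
import math

# Illustrative check of the gradient of f(M) = M A M^T - 2 sum_i log(M_i).
# All names and test values here are assumptions chosen for the example.

def f(M, A):
    n = len(M)
    quad = sum(M[i] * A[i][j] * M[j] for i in range(n) for j in range(n))
    return quad - 2.0 * sum(math.log(m) for m in M)

def grad_f(M, A):
    """General formula: M (A^T + A) - 2 M^{o-1}, entry by entry."""
    n = len(M)
    return [sum(M[j] * (A[k][j] + A[j][k]) for j in range(n)) - 2.0 / M[k]
            for k in range(n)]

def grad_f_symmetric_form(M, A):
    """2 (M A - M^{o-1}); expected to match grad_f only for symmetric A."""
    n = len(M)
    return [2.0 * (sum(M[j] * A[j][k] for j in range(n)) - 1.0 / M[k])
            for k in range(n)]

def numerical_grad(func, M, h=1e-6):
    """Central differences, perturbing one coordinate at a time."""
    grad = []
    for k in range(len(M)):
        up = M[:k] + [M[k] + h] + M[k + 1:]
        down = M[:k] + [M[k] - h] + M[k + 1:]
        grad.append((func(up) - func(down)) / (2 * h))
    return grad

def max_abs_diff(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

A = [[2.0, -1.0, 0.5],
     [3.0, 1.0, 4.0],
     [0.0, -2.0, 1.5]]  # non-symmetric
A_sym = [[0.5 * (A[i][j] + A[j][i]) for j in range(3)] for i in range(3)]
M = [0.7, 1.3, 2.1]     # entries must be positive for the logarithm

err_general = max_abs_diff(grad_f(M, A), numerical_grad(lambda v: f(v, A), M))
gap_nonsym = max_abs_diff(grad_f(M, A), grad_f_symmetric_form(M, A))
err_sym = max_abs_diff(grad_f(M, A_sym), grad_f_symmetric_form(M, A_sym))
print(err_general, gap_nonsym, err_sym)
```

Here `err_general` and `err_sym` should be at rounding level, while `gap_nonsym` is visibly nonzero. That is the numerical counterpart of the factor $A^T+A$ collapsing to $2A$ only in the symmetric case.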

Markus Scheuer

You can think of $f$ as a function $f:\mathbb R^n\to\mathbb R$. Then if $u\in\mathbb R^n$,$$\frac{\partial f}{\partial u}=\nabla f\cdot u=\begin{pmatrix}\frac{\partial f}{\partial x_1}&\cdots&\frac{\partial f}{\partial x_n}\end{pmatrix}\begin{pmatrix}u_1\\\vdots\\u_n\end{pmatrix}=\sum_{k}\frac{\partial f}{\partial x_k}u_k.$$In particular, define $f:\mathbb R^{1\times n}\to\mathbb R$ by$$M\mapsto MAM^T-2\sum_k\log(M_k),$$where $A\in\mathbb R^{n\times n}$. Then, if $N\in\mathbb R^{1\times n}$,$$\frac{\partial f}{\partial N}(M)=(M(A+A^T)-2M^{\circ-1})\cdot N.$$

Note that if $A$ is symmetric,$$\frac{\partial f}{\partial N}(M)=2(MA-M^{\circ-1})\cdot N.$$
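
To illustrate the directional-derivative reading, here is a hedged plain-Python sketch that compares $(M(A+A^T)-2M^{\circ-1})\cdot N$ with a central difference of $f$ along the direction. The variable `direction` plays the role of $N$; it, the test values and the step `t` are arbitrary choices made for this example.

```python
import math

# Directional derivative of f(M) = M A M^T - 2 sum_k log(M_k) along `direction`.
# Test values and names are illustrative assumptions only.

def f(M, A):
    n = len(M)
    quad = sum(M[i] * A[i][j] * M[j] for i in range(n) for j in range(n))
    return quad - 2.0 * sum(math.log(m) for m in M)

def grad_f(M, A):
    """M (A + A^T) - 2 M^{o-1}, entry by entry."""
    n = len(M)
    return [sum(M[j] * (A[k][j] + A[j][k]) for j in range(n)) - 2.0 / M[k]
            for k in range(n)]

def directional_derivative(M, A, direction):
    """Dot product of the gradient with the direction vector."""
    return sum(gk * dk for gk, dk in zip(grad_f(M, A), direction))

A = [[2.0, -1.0, 0.5],
     [3.0, 1.0, 4.0],
     [0.0, -2.0, 1.5]]
M = [0.7, 1.3, 2.1]            # positive entries so that log is defined
direction = [1.0, -0.5, 2.0]   # plays the role of N in the formula above

t = 1e-6
plus = [m + t * d for m, d in zip(M, direction)]
minus = [m - t * d for m, d in zip(M, direction)]
numeric = (f(plus, A) - f(minus, A)) / (2 * t)
analytic = directional_derivative(M, A, direction)
dir_err = abs(analytic - numeric)
print(analytic, numeric, dir_err)
```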

wjmolina
  • What is $N$? It doesn't appear in the question and it isn't defined in your answer. – greg Apr 04 '19 at 19:30
  • $N$ is a direction in which to compute the derivative of $f$, so that $\frac{\partial f}{\partial N}(M)$ is the derivative of $f$ evaluated at $M$ in the direction of $N$. – wjmolina Apr 04 '19 at 19:32