Differentiating a column with respect to a matrix

Question

Let $\mathbf{X} = [\mathbf{x}_1 | ... | \mathbf{x}_n]$ be an $m \times n$ matrix. I would like to differentiate $\mathbf{x}_i = \mathbf{X} \mathbf{e}_i$ (where $\mathbf{e}_i \in \mathbb{R}^{n \times 1}$ is the unit vector with $1$ in the $i$th place and $0$ elsewhere) with respect to $\mathbf{X}$. Then $$ d\mathbf{x}_i = d(\mathbf{X}\mathbf{e}_i) = (\mathbf{X} + d\mathbf{X})\mathbf{e}_i - \mathbf{X}\mathbf{e}_i = (d\mathbf{X})\mathbf{e}_i $$ and therefore $$ \frac{d\mathbf{x}_i}{d\mathbf{X}} = \mathbf{e}_i \in \mathbb{R}^{n \times 1} $$ However, I suspect that this is not consistent dimension-wise. For example, if $f(\mathbf{X}) = \mathbf{a} \mathbf{x}_i$ where $\mathbf{a} \in \mathbb{R}^{1 \times m}$, then simply using the result above gives $$ \frac{d f(\mathbf{X})}{d\mathbf{X}} = \frac{d(\mathbf{a}\mathbf{x}_i)}{d\mathbf{X}} = \mathbf{a} \mathbf{e}_i \implies \mbox{Dimensions mismatch!} $$ since $\mathbf{a} \in \mathbb{R}^{1 \times m}$ and $\mathbf{e}_i \in \mathbb{R}^{n \times 1}$.

How can this be fixed? One idea is to insert a pseudo-identity matrix $$ \frac{d\mathbf{x}_i}{d\mathbf{X}} = \mathbf{I}_{m \times n} \mathbf{e}_i \in \mathbb{R}^{m \times 1} $$ such that $\mathbf{X} = \mathbf{X} \circ \mathbf{I}_{m \times n}$ under the Hadamard product. But is this the right way to go?

The input is an $m \times n$ matrix. The output is an $m$-vector. There are $m$ derivatives — the derivative of each entry of the output with respect to the $m \times n$ matrix input. An $m \times n \times m$ "matrix" is needed. It would be easier if you asked for each of these $m$ derivatives. — Rodrigo de Azevedo, Mar 25 '21 at 10:40
If you consider the function that extracts a single entry (rather than a single column), then take a look at this. — Rodrigo de Azevedo, Mar 25 '21 at 10:43

greg · Accepted Answer · 2021-04-23T02:51:57.743

$\def\p#1#2{\frac{\partial #1}{\partial #2}}\def\E{{\cal E}}$Use $(\star)$ to denote the dyadic product and a colon to denote the double-dot product, i.e. $$\eqalign{ \Gamma &= A\star B \quad&\implies\quad \Gamma_{ijk\ell} = A_{ij}B_{k\ell} \\ Y &= \Gamma:X \quad&\implies\quad Y_{ij}= \sum_{k,\ell}\;\Gamma_{ijk\ell}X_{k\ell} \\ }$$ First, rewrite the linear equation $b=Xa\,$ using index notation (with summation over repeated indices) $$\eqalign{ b_i &= X_{ik}\,a_k \\ &= \delta_{ij} X_{jk}\,a_k \\ &= \delta_{ij} a_k\,X_{jk} \\ }$$ where $\delta_{ij}$ is a Kronecker delta; these are simply the components of the identity matrix $I$.

Rewrite the linear equation using the dyadic and double-dot products, and then calculate its differential and gradient. $$\eqalign{ b &= (I\star a):X \\ db &= (I\star a):dX \\ \p{b}{X} &= (I\star a) \\ }$$ Finally, substitute $(a=e_i,\;b=Xa=x_i)\;$ to obtain $$\eqalign{ \p{x_i}{X} &= I\star e_i \\ }$$
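As a numerical sanity check of the result above, here is a minimal NumPy sketch. It assumes the component convention $(I\star e_i)_{jk\ell} = \delta_{jk}(e_i)_\ell$, which follows from the index form $b_i = \delta_{ij}a_k X_{jk}$; the sizes `m`, `n`, the 0-based column index `i`, and the random test data are illustrative choices, not part of the original answer. It also checks the question's example $f(X) = a\,x_i$, where contracting $a$ with the first index of the tensor gives an $m \times n$ gradient.

```python
import numpy as np

m, n, i = 3, 4, 2  # hypothetical sizes; i is a 0-based column index

rng = np.random.default_rng(0)
dX = rng.standard_normal((m, n))  # an arbitrary perturbation of X

I = np.eye(m)
e_i = np.zeros(n)
e_i[i] = 1.0

# Dyadic product of the identity with e_i: G[j, k, l] = I[j, k] * e_i[l]
G = np.einsum("jk,l->jkl", I, e_i)  # shape (m, m, n)

# Double-dot product G : dX contracts the last two indices of G with dX
dx_i = np.einsum("jkl,kl->j", G, dX)

# It reproduces the differential (dX) e_i, i.e. column i of dX
assert np.allclose(dx_i, dX @ e_i)
assert np.allclose(dx_i, dX[:, i])

# Example from the question: f(X) = a x_i with a of length m.
# Contracting a with the first index of G gives an m-by-n gradient.
a = rng.standard_normal(m)
grad_f = np.einsum("j,jkl->kl", a, G)
assert np.allclose(grad_f, np.outer(a, e_i))
```

Under these assumptions the gradient of $f$ is the outer product of $a$ with $e_i$, an $m \times n$ array whose only nonzero column is column $i$, so it has the same shape as $X$ and the dimensions are consistent.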

edited Apr 23 '21 at 02:51

answered Mar 25 '21 at 11:39


So the dimensions of this tensor are $m \times n \times m \times n$? How can matrix products involving it be computed explicitly, e.g. $A E B^T$? – Triceratops Mar 31 '21 at 13:21
$\def\p#1#2{\frac{\partial #1}{\partial #2}}$Would it help to show the result in component form? $$ \left(\p{x_i}{X}\right)_{jk\ell} = \left({\cal E}e_i^T\right)_{jk\ell} = {\cal E}_{jk\ell i} = \delta_{j\ell} \delta_{ki} $$ – greg Mar 31 '21 at 16:14
Yes. But can you define the products $X \mathcal{E} Y$ and $\mathcal{E} : X$ in component form? – Triceratops Apr 01 '21 at 07:03
Also, does $\mathcal{E} X = \sum_l \mathcal{E}_{i,j,k,l} X_{l,n} = \sum_l \delta_{i,k}\delta_{j,l} X_{l,n} = \delta_{i,k} X_{j,n}$? – Triceratops Apr 01 '21 at 08:49
Yes, that's right: juxtaposition implies a single-dot product in matrix notation, and the colon notation is an explicit double-dot product $$\eqalign{ {\cal E}X &= &\sum_\ell {\cal E}_{ijk\ell}X_{\ell p} \\ {\cal E}:X &= \sum_k &\sum_\ell {\cal E}_{ijk\ell}X_{k\ell} \\ }$$ – greg Apr 01 '21 at 15:38
You might also want to respond here: https://math.stackexchange.com/questions/4085534/operations-between-rank-4-tensor-and-matrices-definitions-and-computations – Triceratops Apr 01 '21 at 16:11
I updated the answer to eliminate the need for the $\cal E$ tensor. Hopefully, it's easier to understand. – greg Apr 21 '21 at 20:03
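For readers who want to compute the single-dot and double-dot products from the comments explicitly, here is a small NumPy sketch. It assumes the components ${\cal E}_{ijk\ell} = \delta_{ik}\delta_{j\ell}$ used in the comment thread; the sizes `m`, `n`, `p` and the random matrices are hypothetical test values.

```python
import numpy as np

m, n, p = 3, 4, 5  # hypothetical sizes

# E[i, j, k, l] = delta_{ik} * delta_{jl}, shape (m, n, m, n)
E = np.einsum("ik,jl->ijkl", np.eye(m), np.eye(n))

rng = np.random.default_rng(0)

# Single-dot product: contract the last index of E with the first index of X1
X1 = rng.standard_normal((n, p))
single = np.einsum("ijkl,lp->ijkp", E, X1)  # shape (m, n, m, p)

# Component form from the comments: (E X)_{ijkp} = delta_{ik} X_{jp}
assert np.allclose(single, np.einsum("ik,jp->ijkp", np.eye(m), X1))

# Double-dot product: contract the last two indices of E with both indices of X2
X2 = rng.standard_normal((m, n))
double = np.einsum("ijkl,kl->ij", E, X2)  # shape (m, n)

# For this particular E, the double-dot product returns X2 unchanged
assert np.allclose(double, X2)
```

Writing each contraction as an `einsum` subscript string keeps the code in one-to-one correspondence with the index sums, which is easier to verify than reshaping the tensor into a matrix.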
