Differentiating the matrix square root

Question

I start with the relation $$A = (I+C)^{1/2},$$ where $A$, $I$ and $C$ are all matrices (say $2\times 2$) and $I$ is the identity matrix.

One can consider $A$ as a mapping $$A : U \to S_2,\ \ \ C\mapsto (I+C)^{1/2}$$

where $S_n$ is the space of $n\times n$ symmetric matrices (which is just $\mathbb R^N$ with $N = n(n+1)/2$) and $U$ is the open subset of $S_2$ on which $I+C$ is positive definite.

I wish to take the derivative of $A$ with respect to $C$. How can this derivative be expressed in index form?

This is what I wish to do with this relation: [image not available]
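A numerical reference is handy for checking any closed form for $\partial A_{ij}/\partial C_{kl}$. The sketch below assumes NumPy, computes $A=(I+C)^{1/2}$ with `numpy.linalg.eigh` (valid because $I+C$ is symmetric positive definite), and approximates the directional derivative $\sum_{k,l}\frac{\partial A_{ij}}{\partial C_{kl}}H_{kl}$ along a symmetric direction $H$ by central differences. The helper names `sqrt_spd` and `dA_fd` and the test matrices are hypothetical choices for illustration.

```python
import numpy as np

def sqrt_spd(M):
    """Principal square root of a symmetric positive definite matrix M."""
    w, V = np.linalg.eigh(M)          # M = V diag(w) V^T with w > 0
    return (V * np.sqrt(w)) @ V.T     # V diag(sqrt(w)) V^T

def dA_fd(C, H, h=1e-5):
    """Central-difference estimate of the derivative of (I + C)^{1/2} along H.

    Assumes C and H are symmetric and I + C is positive definite.
    """
    I = np.eye(C.shape[0])
    return (sqrt_spd(I + C + h * H) - sqrt_spd(I + C - h * H)) / (2 * h)

C = np.array([[1.0, 0.5], [0.5, 0.0]])   # symmetric, I + C positive definite
H = np.array([[0.0, 1.0], [1.0, 0.0]])   # move C_12 and C_21 together
A = sqrt_spd(np.eye(2) + C)
dA = dA_fd(C, H)
print(np.allclose(A @ A, np.eye(2) + C))            # True
print(np.allclose(dA @ A + A @ dA, H, atol=1e-8))   # True: differentiated A^2 = I + C
```

The second check uses only the product rule applied to $A^2 = I + C$, so it is a convenient target for any proposed formula.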

Hi, I've made some edits and hope to make this clearer. Please check if it is okay. — , Oct 15 '17 at 03:17
@JohnMa Thanks. $A$ and $I + C$ are positive definite. I thought this had nothing to do with the derivative, but thanks for the clarification. — Utpal Kumar, Oct 15 '17 at 04:13
Is this all encased in some function $f : \mathbb S\to \mathbb R$? Otherwise it's a 4-index object, $dA_{ij}/dC_{kl}$, which is doable but kind of confusing. — Y. S., Oct 15 '17 at 04:27
@JohnMa This gives me some insights but doesn't completely resolve my problem. — Utpal Kumar, Oct 15 '17 at 04:31
@whyyes I need the solution because my other results depend heavily on it. I am looking for an explicit expression for $\frac{dA_{ij}}{dC_{kl}}$. — Utpal Kumar, Oct 15 '17 at 04:35
Do you have the eigenvalue decomposition of $C$? I'm not sure how it'll help, but it makes the $C^{1/2}$ part easier... — Y. S., Oct 15 '17 at 04:38
@whyyes I am not solving it numerically; I am working on it theoretically for now, so I don't have the eigenvalue decomposition of $C$. I have attached an image to explain what I wish to do further. — Utpal Kumar, Oct 15 '17 at 04:54
Right, but you can assume that the eigenvalues/eigenvectors exist, right? — Y. S., Oct 15 '17 at 05:36
@whyyes Yes, they are supposed to exist if this equation is to be solved numerically. — Utpal Kumar, Oct 15 '17 at 05:37

score 1 · Accepted Answer · answered Oct 15 '17 at 06:00

Here's my first try. It's super messy and there might be a mistake somewhere, but the overall concept makes sense to me. It depends on knowing the eigenvalue decomposition $C = USU^T$, which you can assume you know if you're proving something theoretically.

Take $A = (I+C)^{1/2}$ and try to find the gradient of $A_{ij} = e_i^T(I+C)^{1/2}e_j$.

Let $C = USU^T$ be the eigenvalue decomposition, with $U$ orthogonal and $S$ diagonal. Then $A = (I+C)^{1/2} = U(S+I)^{1/2}U^T$. Let $u_i$ denote the $i$th column of $U$ and $\bar u_i$ the $i$th row of $U$, written as a column vector.

$$A_{ij} = e_i^TU(S+I)^{1/2}U^T e_j = \bar u_i^T(U^TCU+I)^{1/2}\bar u_j.$$
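The eigendecomposition identities can be sanity-checked numerically. This is a sketch assuming NumPy and an arbitrary symmetric test matrix: `numpy.linalg.eigh` returns the diagonal of $S$ as a vector `s` and an orthogonal $U$, so $D = U^TCU + I$ should come out diagonal and $A_{ij} = \sum_k U_{ik}U_{jk}D_{kk}^{1/2}$ should reproduce $A$.

```python
import numpy as np

C = np.array([[1.0, 0.5], [0.5, 0.0]])   # symmetric, I + C positive definite
I = np.eye(2)

s, U = np.linalg.eigh(C)                  # C = U diag(s) U^T
D = U.T @ C @ U + I                       # should equal diag(s + 1)
A = U @ np.diag(np.sqrt(s + 1)) @ U.T     # A = U (S + I)^{1/2} U^T

# Index form: A_ij = sum_k U_ik U_jk D_kk^{1/2}  (row i of U is u-bar_i)
A_index = np.einsum('ik,jk,k->ij', U, U, np.sqrt(np.diag(D)))

print(np.allclose(D, np.diag(s + 1)))     # True
print(np.allclose(A @ A, I + C))          # True
print(np.allclose(A_index, A))            # True
```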

Define $D = U^TCU + I = S + I$, which is diagonal. Then

$f_{ij}(D) = A_{ij} = \bar u_i^TD^{1/2}\bar u_j = \sum_k U_{ik} U_{jk} D_{kk}^{1/2}$, so $\nabla f_{ij}(D) = \tfrac12\,\textrm{diag}(\bar u_i\circ \bar u_j \circ \textrm{diag}(D^{-1/2}))$, i.e. $(\nabla f_{ij}(D))_{kk} = \tfrac12 U_{ik}U_{jk}D_{kk}^{-1/2}$,

where $\circ$ is elementwise multiplication.
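Since $\frac{d}{dx}x^{1/2} = \tfrac12 x^{-1/2}$, each diagonal entry of $\nabla f_{ij}(D)$ carries a factor $\tfrac12$. A minimal check, assuming NumPy and holding $U$ fixed exactly as $f_{ij}$ does; the names `f` and `grad_f` are hypothetical, and indices are 0-based in the code.

```python
import numpy as np

C = np.array([[1.0, 0.5], [0.5, 0.0]])
s, U = np.linalg.eigh(C)
d = s + 1.0                                   # diagonal of D = U^T C U + I

def f(i, j, d):
    """f_ij(D) = sum_k U_ik U_jk D_kk^{1/2}, with U held fixed."""
    return np.sum(U[i] * U[j] * np.sqrt(d))

def grad_f(i, j, d):
    """Diagonal of grad f_ij(D): (1/2) U_ik U_jk D_kk^{-1/2}."""
    return 0.5 * U[i] * U[j] / np.sqrt(d)

h = 1e-6
i, j = 0, 1
# Central difference in each diagonal entry D_kk separately
fd = np.array([(f(i, j, d + h * e) - f(i, j, d - h * e)) / (2 * h) for e in np.eye(2)])
print(np.allclose(fd, grad_f(i, j, d)))       # True
```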

Additionally, define $g_i(C) = D_{ii} = u_i^TCu_i + 1$, so that $\nabla g_i(C) = u_iu_i^T$.
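$\nabla g_i(C) = u_iu_i^T$ is the standard first-order perturbation formula for a simple eigenvalue, and it holds even though $U$ itself changes with $C$. A sketch of a finite-difference check, assuming NumPy, distinct eigenvalues, and an arbitrary symmetric direction $H$ (pairing with $H$ means $\langle u_iu_i^T, H\rangle = u_i^THu_i$):

```python
import numpy as np

C = np.array([[1.0, 0.5], [0.5, 0.0]])
H = np.array([[0.3, -0.2], [-0.2, 0.7]])     # arbitrary symmetric direction
s, U = np.linalg.eigh(C)

h = 1e-6
# g_m(C) = D_mm = (eigenvalue m of C) + 1; eigvalsh sorts ascending, which keeps
# the labelling stable for a small perturbation when eigenvalues are distinct.
g_plus = np.linalg.eigvalsh(C + h * H) + 1.0
g_minus = np.linalg.eigvalsh(C - h * H) + 1.0
fd = (g_plus - g_minus) / (2 * h)

# <grad g_m(C), H> = trace(u_m u_m^T H) = u_m^T H u_m
formula = np.array([U[:, m] @ H @ U[:, m] for m in range(2)])
print(np.allclose(fd, formula))               # True
```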

Using the chain rule, you should get something like

$$\frac{\partial A_{ij}}{\partial C_{kl}} = \left(\sum_m (\nabla f_{ij}(D))_{mm} \nabla g_m(C)\right)_{kl} $$

in terms of the quantities defined above.
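A hedged numerical check of this chain-rule expression, assuming NumPy and distinct eigenvalues of $C$. The sketch builds the 4-index array from the expression (with the factor $\tfrac12$ from $d(x^{1/2})/dx$) and compares it with the eigenbasis solution of $dA\,A + A\,dA = dC$, namely $(U^TdA\,U)_{mn} = (U^TdC\,U)_{mn}/(\sqrt{D_{mm}}+\sqrt{D_{nn}})$. The chain-rule array reproduces exactly the $m=n$ terms of that solution; the $m\neq n$ terms come from the variation of $U$ with $C$, which the expression holds fixed, so for this test matrix only the full version agrees with a central finite difference. Names and the test matrix are illustrative.

```python
import numpy as np

def sqrt_spd(M):
    """Principal square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(w)) @ V.T

C = np.array([[1.0, 0.5], [0.5, 0.0]])
n = C.shape[0]
s, U = np.linalg.eigh(C)
r = np.sqrt(s + 1.0)                          # square roots of the diagonal of D

# Chain-rule expression (U held fixed):
# J_cr[i,j,k,l] = sum_m (1/2) D_mm^{-1/2} U_im U_jm U_km U_lm
J_cr = np.einsum('m,im,jm,km,lm->ijkl', 0.5 / r, U, U, U, U)

# Eigenbasis solution of dA A + A dA = dC: K_mn = 1 / (sqrt(D_mm) + sqrt(D_nn))
K = 1.0 / (r[:, None] + r[None, :])
J_full = np.einsum('im,jn,km,ln,mn->ijkl', U, U, U, U, K)

# Central difference along the symmetric direction H = E_11 (first diagonal entry)
H = np.zeros((n, n))
H[0, 0] = 1.0
h = 1e-5
fd = (sqrt_spd(np.eye(n) + C + h * H) - sqrt_spd(np.eye(n) + C - h * H)) / (2 * h)

print(np.allclose(np.einsum('ijkl,kl->ij', J_full, H), fd))  # True
print(np.abs(np.einsum('ijkl,kl->ij', J_cr, H) - fd).max())  # roughly 0.1 for this C
```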

score 1 · Answer 2 · answered Oct 17 '17 at 06:03

Let $Z$ be the set of $n\times n$ symmetric matrices with eigenvalues in $(-1,+\infty)$, $S$ the set of $n\times n$ symmetric matrices, $S^+$ the set of $n\times n$ symmetric positive definite matrices, and $f:C\in Z\rightarrow (I+C)^{1/2}\in S^+$.

Then one form of the derivative of $f$ is: $Df_C:H\in S\rightarrow \int_0^{+\infty}e^{-t\sqrt{I+C}}He^{-t\sqrt{I+C}}dt$

cf. the question "Derivative (or differential) of symmetric square root of a matrix".
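The integral has a closed form. Writing $\sqrt{I+C} = U\,\textrm{diag}(r)\,U^T$, the eigenbasis entries of the integrand decay like $e^{-t(r_m+r_n)}$, so $\int_0^{+\infty}e^{-t(r_m+r_n)}dt = 1/(r_m+r_n)$, and $X = Df_C(H)$ solves the Sylvester equation $\sqrt{I+C}\,X + X\sqrt{I+C} = H$ (differentiate $e^{-t\sqrt{I+C}}He^{-t\sqrt{I+C}}$ in $t$ and integrate). A sketch assuming NumPy; `Df` and the test matrices are hypothetical.

```python
import numpy as np

def Df(C, H):
    """Derivative of C -> (I + C)^{1/2} applied to a symmetric direction H.

    Closed form of the integral: in the eigenbasis of sqrt(I + C), the (m, n)
    entry of H is divided by r_m + r_n, where r = sqrt(eigenvalues of I + C).
    """
    s, U = np.linalg.eigh(C)
    r = np.sqrt(1.0 + s)
    Ht = U.T @ H @ U                          # H expressed in the eigenbasis
    return U @ (Ht / (r[:, None] + r[None, :])) @ U.T

C = np.array([[1.0, 0.5], [0.5, 0.0]])
H = np.array([[0.3, -0.2], [-0.2, 0.7]])
w, V = np.linalg.eigh(np.eye(2) + C)
A = (V * np.sqrt(w)) @ V.T                    # sqrt(I + C)
X = Df(C, H)
print(np.allclose(A @ X + X @ A, H))          # True: Sylvester equation holds
```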

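In particular $\dfrac{\partial{f}}{\partial{C_{i,i}}}=\int_0^{+\infty}e^{-t\sqrt{I+C}}E_{i,i}e^{-t\sqrt{I+C}}dt$, where $E_{i,j}$ is the matrix with a $1$ in position $(i,j)$ and zeros elsewhere. But beware: $\dfrac{\partial{f}}{\partial{C_{i,j}}}$ with $i\not= j$ is nonsense, because in $S$ the entries $C_{i,j}$ and $C_{j,i}$ cannot vary independently. Instead, you may calculate the derivative with respect to the block $B=(C_{i,j},C_{j,i})$, i.e. with respect to their common value: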

$\dfrac{\partial{f}}{\partial{B}}= \int_0^{+\infty}e^{-t\sqrt{I+C}}(E_{i,j}+E_{j,i})e^{-t\sqrt{I+C}}dt$.
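A sketch checking both partial derivatives against central differences, assuming NumPy. It reuses the eigenbasis closed form of the integral, and the finite difference for $B$ moves $C_{i,j}$ and $C_{j,i}$ together so the perturbed matrix stays in $S$. Helper names are hypothetical and indices are 0-based in the code.

```python
import numpy as np

def sqrt_spd(M):
    """Principal square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(w)) @ V.T

def Df(C, H):
    """Closed form of the integral, computed in the eigenbasis of sqrt(I + C)."""
    s, U = np.linalg.eigh(C)
    r = np.sqrt(1.0 + s)
    return U @ ((U.T @ H @ U) / (r[:, None] + r[None, :])) @ U.T

def E(n, i, j):
    """Matrix unit E_ij: 1 in position (i, j), zeros elsewhere."""
    M = np.zeros((n, n))
    M[i, j] = 1.0
    return M

C = np.array([[1.0, 0.5], [0.5, 0.0]])
n, h = 2, 1e-5
I = np.eye(n)

dA_dC00 = Df(C, E(n, 0, 0))                   # derivative w.r.t. a diagonal entry
B = E(n, 0, 1) + E(n, 1, 0)
dA_dB = Df(C, B)                              # derivative w.r.t. b = C_01 = C_10

# Finite difference: move the off-diagonal pair together so C stays symmetric
fd_B = (sqrt_spd(I + C + h * B) - sqrt_spd(I + C - h * B)) / (2 * h)
print(np.allclose(dA_dB, fd_B))               # True
```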

score 1 · Answer 3 · answered Oct 19 '17 at 14:07

Let ${\mathcal E}$ denote the isotropic fourth-order tensor with components $${\mathcal E}_{ijkl} = \delta_{ik}\delta_{jl}$$ Then finding the differential and gradient of $C$ with respect to $A$ is straightforward: $$\eqalign{ C &= A^2 - I \cr dC &= dA\,A+A\,dA = ({\mathcal E}A^T+A{\mathcal E}):dA \cr \frac{\partial C}{\partial A} &= {\mathcal E}A^T+A{\mathcal E} \cr }$$ But you wanted the inverse of this gradient, so let's vectorize the differential expression, writing $a = {\rm vec}(A)$ and $c = {\rm vec}(C)$ for the column-stacked matrices: $$\eqalign{ dc &= (A^T\otimes I + I\otimes A)\,da \cr da &= (A^T\otimes I + I\otimes A)^{-1}\,dc \cr \frac{\partial a}{\partial c} &= (A^T\otimes I + I\otimes A)^{-1} \cr }$$
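A sketch of the vectorized formula, assuming NumPy and the column-stacking ${\rm vec}$, under which ${\rm vec}(XMY) = (Y^T\otimes X)\,{\rm vec}(M)$. The Kronecker sum is invertible here because its eigenvalues are sums of pairs of eigenvalues of $A$, all positive. With 0-based indices, $\partial A_{ij}/\partial C_{kl}$ sits at `J[i + n*j, k + n*l]`, treating the $n^2$ entries of $C$ as independent. Names and the test matrices are illustrative.

```python
import numpy as np

def sqrt_spd(M):
    """Principal square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(w)) @ V.T

def vec(M):
    """Column-stacking vec, the convention behind vec(XMY) = (Y^T kron X) vec(M)."""
    return M.reshape(-1, order='F')

C = np.array([[1.0, 0.5], [0.5, 0.0]])
n = C.shape[0]
I = np.eye(n)
A = sqrt_spd(I + C)

# da/dc = (A^T kron I + I kron A)^{-1}
J = np.linalg.inv(np.kron(A.T, I) + np.kron(I, A))

# Compare J vec(H) with a central difference along a symmetric direction H
H = np.array([[0.3, -0.2], [-0.2, 0.7]])
h = 1e-5
fd = (sqrt_spd(I + C + h * H) - sqrt_spd(I + C - h * H)) / (2 * h)
print(np.allclose(J @ vec(H), vec(fd)))       # True
```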
