Let $\| A \|_2 := \sqrt{\lambda_{\max}(A^TA)}$. As part of the gradient of a regularized loss function (for machine learning), I need the gradient $\nabla_A \| A \|_2^2$, which, using the chain rule, expands to $2 \| A \|_2 \nabla_A \| A \|_2$. How can I find $\nabla_A \| A \|_2$?

I start with $\|A\|_2 = \sqrt{\lambda_{\max}(A^TA)}$, then get

$$\nabla_A \| A \|_2 = \frac{1}{2 \sqrt{\lambda_{\max}(A^T A)}} \nabla_A \left(\lambda_{\max} \left( A^T A \right) \right)$$

but after that I have no idea how to find $\nabla_A \left(\lambda_{\max} \left( A^T A \right) \right)$.

Edit: would I just take the derivative of $A$ (call it $A'$) and then take $\lambda_{\max}(A'^TA')$?


1 Answer

Consider the SVD of $\mathbf{A}=\mathbf{U}\mathbf{\Sigma}\mathbf{V}^T$. It follows that $\mathbf{A}^T\mathbf{A}=\mathbf{V}\mathbf{\Sigma}^2\mathbf{V}^T$.

The spectral norm is thus the largest singular value of $\mathbf{A}$. Let $\sigma_1$ be that value, with corresponding left and right singular vectors $\mathbf{u}_1$ and $\mathbf{v}_1$.

$$ \| \mathbf{A} \|_2 = \sigma_1(\mathbf{A}) = \sqrt{\lambda_1 \left( \mathbf{A}^T\mathbf{A} \right)} $$

Provided $\sigma_1$ is simple (not a repeated singular value), it is differentiable in $\mathbf{A}$, and $$d\sigma_1 = \mathbf{u}_1 \mathbf{v}_1^T : d\mathbf{A},$$ where $:$ denotes the Frobenius inner product, i.e. $d\sigma_1 = \mathbf{u}_1^T \, d\mathbf{A} \, \mathbf{v}_1$. Hence $\dfrac{\partial \sigma_1}{\partial \mathbf{A}} = \mathbf{u}_1 \mathbf{v}_1^T$.

It follows by the chain rule that $$ \frac{\partial}{\partial \mathbf{A}} \| \mathbf{A} \|_2^2 = \frac{\partial}{\partial \mathbf{A}} \sigma_1^2 = 2 \sigma_1 \mathbf{u}_1 \mathbf{v}_1^T $$
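As a sanity check, the closed-form gradient can be compared against central finite differences. The sketch below (names and the random test matrix are my own, not from the answer) assumes the top singular value is simple, since the formula only holds in that case:

```python
import numpy as np

# Verify d/dA ||A||_2^2 = 2 * sigma_1 * u_1 v_1^T numerically on a random matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# Analytic gradient from the SVD-based formula.
U, s, Vt = np.linalg.svd(A)
analytic = 2 * s[0] * np.outer(U[:, 0], Vt[0, :])

# Central finite differences on the squared spectral norm.
def f(M):
    return np.linalg.norm(M, 2) ** 2  # squared spectral norm

eps = 1e-6
numeric = np.zeros_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        E = np.zeros_like(A)
        E[i, j] = eps
        numeric[i, j] = (f(A + E) - f(A - E)) / (2 * eps)

# The two gradients should agree to finite-difference accuracy.
print(np.max(np.abs(analytic - numeric)))
```

If the maximum singular value is repeated, $\|\mathbf{A}\|_2$ is not differentiable there and only a subgradient exists, so the finite-difference check would fail at such points.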

Steph