
Let $X\in \mathbb{R}^{m\times n}$ be a matrix with linearly independent columns and let $\mathcal{N}$ be a neighborhood of $X$ such that every $Y\in \mathcal{N}$ has linearly independent columns as well.

Now let $F\colon \mathcal{N}\to \mathbb{R}^{m\times n}$ be the function that maps $Y\in \mathcal{N}$ to the output of the Gram-Schmidt orthonormalization procedure applied to the columns of $Y$ (with the resulting vectors again arranged as the columns of a matrix).

I see that $F$ is a smooth function of $Y$ on $\mathcal{N}$ (this is also mentioned in some books), but computing any sort of "formula" for the (partial) derivatives of $F$ at $X$, with respect to each entry of $Y$, seems impractical. That said:

Is there a simple (perhaps iterative) formula for the (partial) derivative(s) of $F$ at $X$? If yes, how can I compute it?

Two-column case ($n=2$):

Suppose that $X=[u,v]$ where $u,v\in \mathbb{R}^m$ and define $$G\colon [u,v]\mapsto \begin{bmatrix} u, & v-\frac{\langle v,u\rangle}{\|u\|^2}u \end{bmatrix}\doteq [G_1(u,v), \ G_2(u,v)],$$ which is the output of the Gram-Schmidt orthogonalization process (without the final normalization step). Then, if my computations are correct:

  • $D_u G_1(u,v) = I_m$, where $I_m$ is the $m\times m$ identity matrix;
  • $D_v G_1(u,v) = 0$;
  • $D_u G_2(u,v) = \frac{1}{\|u\|^2}\left(\frac{2\langle u,v \rangle}{\|u\|^2}uu^T - \langle u,v \rangle I_m - uv^T\right)$;
  • $D_v G_2(u,v) = I_m - \frac{uu^T}{\|u\|^2}$.

Then since

$$F\colon [u,v]\mapsto\begin{bmatrix} \frac{G_1(u,v)}{\|G_1(u,v)\|}, & \frac{G_2(u,v)}{\|G_2(u,v)\|} \end{bmatrix}\doteq [F_1(u,v), \ F_2(u,v)]$$

it holds that:

  • $D_{u,v} F_{1,2}(u,v)=\frac{D_{u,v}G_{1,2}(u,v)}{\|G_{1,2}(u,v)\|} - \frac{G_{1,2}(u,v)\,G_{1,2}(u,v)^T\,D_{u,v}G_{1,2}(u,v)}{\|G_{1,2}(u,v)\|^3}.$

Even if this is all correct, I wonder whether there is a shorter way of rewriting it (for larger $n$ as well)...
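
A minimal NumPy sketch to spot-check the $n=2$ formulas above against central finite differences (added purely for illustration; the helper names are not from any reference):

```python
import numpy as np

def G2(u, v):
    # Second Gram-Schmidt vector (orthogonalized, not normalized)
    return v - (v @ u) / (u @ u) * u

def Du_G2(u, v):
    # Closed-form Jacobian of G_2 with respect to u, as in the list above
    a, b = u @ v, u @ u
    return (2 * a / b * np.outer(u, u) - a * np.eye(len(u)) - np.outer(u, v)) / b

def Dv_G2(u, v):
    # Jacobian with respect to v: orthogonal projection onto u's complement
    return np.eye(len(u)) - np.outer(u, u) / (u @ u)

rng = np.random.default_rng(0)
u, v = rng.standard_normal(4), rng.standard_normal(4)
eps = 1e-6

# Central finite differences, one input coordinate (column) at a time
fd_u = np.column_stack([(G2(u + eps * e, v) - G2(u - eps * e, v)) / (2 * eps)
                        for e in np.eye(4)])
fd_v = np.column_stack([(G2(u, v + eps * e) - G2(u, v - eps * e)) / (2 * eps)
                        for e in np.eye(4)])

print(np.allclose(fd_u, Du_G2(u, v), atol=1e-5))  # True
print(np.allclose(fd_v, Dv_G2(u, v), atol=1e-5))  # True
```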

  • Have you attempted to do it "by hand" for the case $n=2$? – Jean Marie Nov 17 '21 at 15:00
  • @JeanMarie Well, I did try after your comment... I just edited the question to include my attempt. – Koto Nov 17 '21 at 18:07
  • There is an interesting (expected) feature: $D_v G_2(u,v) = I_m - \frac{uu^T}{\|u\|^2}$ is the classical operator of orthogonal projection onto the hyperplane orthogonal to $u$ (here, onto the line directed by a vector orthogonal to $u$). You will find it in any dimension... – Jean Marie Nov 17 '21 at 18:45
  • @JeanMarie THANK YOU! It was there all the time, but I didn't notice it before! This, together with the computation itself (the "hard" one), seems to be exactly what I wanted. I'm gonna work on it a bit more, but I can already visualize that this is what I was looking for. – Koto Nov 17 '21 at 18:51
  • Happy to have contributed to the solution! Have a good night or a good day according to your longitude! – Jean Marie Nov 17 '21 at 19:15

1 Answer


Yes, I have derived such an expression recently. If the orthogonalized (but not yet normalized) $D$-dimensional vectors are given by:

$${\bf w}_i = {\bf q}_i - \sum_{j=1}^{i-1}\left(\frac{{\bf w}_j^T{\bf q}_i}{{\bf w}_j^T{\bf w}_j}\right){\bf w}_j, \quad i = 1,\dots, d,$$

then the gradients $\partial{\bf w}_i/\partial{\bf q}_k$ are given by a recurrence relation:

$$\boxed{\begin{aligned}
\frac{\partial{\bf w}_1}{\partial{\bf q}_1} &=: D_{11} = I_D, \\
\frac{\partial{\bf w}_i}{\partial{\bf q}_i} &=: D_{ii} = D_{i-1,\,i-1} - \frac{{\bf w}_{i-1}{\bf w}_{i-1}^T}{{\bf w}_{i-1}^T{\bf w}_{i-1}}, \quad i > 1, \\
\frac{\partial{\bf w}_i}{\partial{\bf q}_k} &= \sum_{j=1}^{i-1} D_{ij}\,\frac{\partial{\bf w}_j}{\partial{\bf q}_k}, \quad i > k, \\
D_{ij} &:= -\frac{\partial}{\partial{\bf w}_j}\left[\left(\frac{{\bf w}_j^T{\bf q}_i}{{\bf w}_j^T{\bf w}_j}\right){\bf w}_j\right]
= -\left[\frac{{\bf w}_j{\bf q}_i^T}{{\bf w}_j^T{\bf w}_j} - \frac{2\,{\bf w}_j^T{\bf q}_i}{({\bf w}_j^T{\bf w}_j)^2}\,{\bf w}_j{\bf w}_j^T + \frac{{\bf w}_j^T{\bf q}_i}{{\bf w}_j^T{\bf w}_j}\,I_D\right], \quad i > j.
\end{aligned}}$$
Here, $I_D$ is the $D\times D$ identity matrix. In the sum for $\partial{\bf w}_i/\partial{\bf q}_k$, the terms with $j < k$ vanish, since ${\bf w}_j$ does not depend on ${\bf q}_k$ when $j < k$.
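
To make the recurrence concrete, here is a condensed NumPy sketch (the repositories linked at the end of this answer contain the reference implementation; this standalone version, with the illustrative name `gram_schmidt_jacobians`, is only meant to mirror the boxed formulas):

```python
import numpy as np

def gram_schmidt_jacobians(Q):
    """Unnormalized Gram-Schmidt vectors w_i (columns of W) and their
    Jacobians J[i][k] = dw_i/dq_k, via the boxed recurrence.
    Q is a D x d float matrix with columns q_1, ..., q_d."""
    D, d = Q.shape
    I = np.eye(D)
    W = np.zeros((D, d))
    J = [[np.zeros((D, D)) for _ in range(d)] for _ in range(d)]
    for i in range(d):
        qi = Q[:, i]
        # Forward pass: w_i = q_i - sum_j (w_j^T q_i / w_j^T w_j) w_j
        w = qi.copy()
        for j in range(i):
            w -= (W[:, j] @ qi) / (W[:, j] @ W[:, j]) * W[:, j]
        W[:, i] = w
        # Diagonal terms: D_11 = I, D_ii = D_{i-1,i-1} - w w^T / (w^T w)
        if i == 0:
            J[i][i] = I
        else:
            wp = W[:, i - 1]
            J[i][i] = J[i - 1][i - 1] - np.outer(wp, wp) / (wp @ wp)
        # Off-diagonal terms: dw_i/dq_k = sum_j D_ij dw_j/dq_k for k < i
        for k in range(i):
            acc = np.zeros((D, D))
            for j in range(k, i):  # j < k contributes nothing: dw_j/dq_k = 0
                wj = W[:, j]
                a, b = wj @ qi, wj @ wj
                Dij = -(np.outer(wj, qi) / b
                        - 2 * a / b**2 * np.outer(wj, wj)
                        + a / b * I)
                acc += Dij @ J[j][k]
            J[i][k] = acc
    return W, J
```

The Jacobians are filled in order of increasing $i$, so every $\partial{\bf w}_j/\partial{\bf q}_k$ on the right-hand side has already been computed.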

To find the derivatives of the normalized vectors ${\bf w}_i/\lVert {\bf w}_i\rVert_2$, we can simply premultiply $\partial{\bf w}_i/\partial{\bf q}_k$ by a matrix that depends only on ${\bf w}_i$:

\begin{align} \boxed{\frac{\partial}{\partial{\bf q}_k}\left(\frac{{\bf w}_i}{\lVert{\bf w}_i\rVert_2}\right) = \left[\frac{I_D}{\lVert{\bf w}_i\rVert_2} - \frac{{\bf w}_i{\bf w}_i^T}{\lVert{\bf w}_i\rVert^3_2}\right]\frac{\partial{\bf w}_i}{\partial{\bf q}_k}.} \end{align}
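
In code, this normalization step is a one-liner on top of the sketch above (again illustrative, not the reference implementation):

```python
import numpy as np

def normalized_jacobian(w_i, dwi_dqk):
    # Jacobian of w -> w / ||w||_2, evaluated at w_i, applied via the chain rule
    n = np.linalg.norm(w_i)
    return (np.eye(len(w_i)) / n - np.outer(w_i, w_i) / n**3) @ dwi_dqk

# Example, reusing gram_schmidt_jacobians from the sketch above:
W, J = gram_schmidt_jacobians(np.random.default_rng(1).standard_normal((5, 3)))
dF_dq = normalized_jacobian(W[:, 2], J[2][0])  # d(w_3 / ||w_3||_2) / dq_1
```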

More info can be found here: https://wedeling.github.io/Gram_Schmidt_Derivatives/ or here: https://github.com/wedeling/Gram_Schmidt_Derivatives. You'll also find Python source code to compute the derivatives, and a Jupyter notebook with symbolic math to validate the expressions.
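
The linked notebook validates the expressions symbolically; a lighter-weight alternative is a finite-difference spot check of the sketches above (again only illustrative, assuming `gram_schmidt_jacobians` from before):

```python
import numpy as np

def gs(Q):
    # Plain unnormalized Gram-Schmidt on the columns of Q
    W = Q.copy()
    for i in range(Q.shape[1]):
        for j in range(i):
            W[:, i] -= (W[:, j] @ Q[:, i]) / (W[:, j] @ W[:, j]) * W[:, j]
    return W

rng = np.random.default_rng(2)
Q = rng.standard_normal((5, 3))
_, J = gram_schmidt_jacobians(Q)  # sketch defined earlier in this answer

i, k, eps = 2, 0, 1e-6  # check dw_3/dq_1, one input coordinate at a time
fd = np.zeros((5, 5))
for c in range(5):
    Qp, Qm = Q.copy(), Q.copy()
    Qp[c, k] += eps
    Qm[c, k] -= eps
    fd[:, c] = (gs(Qp)[:, i] - gs(Qm)[:, i]) / (2 * eps)
print(np.allclose(fd, J[i][k], atol=1e-5))  # True
```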

  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. – From Review – amWhy Nov 18 '21 at 01:13
  • Thanks, I wasn't aware LaTeX would just work right out of the box. I've modified my post. – Wouter Edeling Nov 18 '21 at 09:21
  • Wonderful, I arrived at the exact same formula at some point and it's a relief to see that it's correct. Your answer was really helpful, thank you! – Koto Nov 18 '21 at 21:02