
I want to find the P matrix that minimizes the function $f(P) = r(P)^Tr(P)$ with $r(P) = PX - x$ using the Gauss-Newton method.

with $X$ being a constant $m \times k$ matrix, $x$ being a constant $n \times k$ matrix, and $P$ being an $n \times m$ matrix.

I derived the Jacobian matrix using the method described here: $J = I_n \otimes X$.

Using the formula on this page, the Gauss-Newton equation should look like:

$$ P_{n+1} = P_n - (J^TJ)^{-1}J^Tr(P_n) $$

using properties of the Kronecker product from here

$$ (J^TJ)^{-1} = ((I_n \otimes X)^T(I_n \otimes X))^{-1} = (I_n \otimes X^TX)^{-1} = I_n \otimes (X^TX)^{-1} $$
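These Kronecker-product identities can be sanity-checked numerically; here is a quick NumPy sketch (the dimensions $n, m, k$ are arbitrary):

```python
import numpy as np

# Check the Kronecker identities used above on random data.
n, m, k = 3, 4, 2
rng = np.random.default_rng(0)
X = rng.standard_normal((m, k))

J = np.kron(np.eye(n), X)                      # I_n (x) X

# (I_n (x) X)^T (I_n (x) X) = I_n (x) (X^T X)
rhs = np.kron(np.eye(n), X.T @ X)
assert np.allclose(J.T @ J, rhs)

# (I_n (x) X^T X)^{-1} = I_n (x) (X^T X)^{-1}
assert np.allclose(np.linalg.inv(rhs),
                   np.kron(np.eye(n), np.linalg.inv(X.T @ X)))
```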

and working it out on paper it seemed that: $$ J^Tr(P_n) = (I_n \otimes X)^T r(P_n) = r(P_n) \otimes X^T $$

Combining the two parts we get $$ (I_n \otimes (X^TX)^{-1})(r(P_n) \otimes X^T) = r(P_n) \otimes (X^TX)^{-1}X^T $$

$r(P_n)$ is an $n\times k$ matrix and $(X^TX)^{-1}X^T$ is a $k \times m$ matrix, so taking their Kronecker product produces an $nk \times km$ matrix.

Which is so close.

If it were the ordinary matrix product $r(P_n)(X^TX)^{-1}X^T$ instead of the Kronecker product $r(P_n) \otimes (X^TX)^{-1}X^T$, it would produce the $n \times m$ matrix I need for the Gauss-Newton equation at the top to work.

Can someone let me know where I went wrong?

Kyle

1 Answer


Write the objective function $\phi(\mathbf{P})= \mathbf{R}:\mathbf{R}$ where $\mathbf{R}=\mathbf{P}\mathbf{X}-\mathbf{x}$.

Taking a differential approach, we have $d\phi= 2\mathbf{R}:(d\mathbf{P})\mathbf{X}$, from which we deduce the gradient $$ \frac{\partial \phi}{\partial \mathbf{P}} = 2\mathbf{R}\mathbf{X}^T \rightarrow \mathbf{g} = \frac{\partial \phi}{\partial \mathbf{p}} = 2 \mathrm{vec}(\mathbf{R}\mathbf{X}^T) = 2 \mathbf{J}^T \mathbf{r} $$

where $\mathbf{p}=\mathrm{vec}(\mathbf{P})$ and $\mathbf{r} =\mathrm{vec}(\mathbf{R})$ where the following relation holds $d\mathbf{r} = \mathbf{J} d\mathbf{p} $ with the Jacobian matrix $$ \mathbf{J} =[\mathbf{X}^T \otimes \mathbf{I}_n] $$
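This Jacobian follows from the identity $\mathrm{vec}(\mathbf{A}\mathbf{B}\mathbf{C}) = (\mathbf{C}^T \otimes \mathbf{A})\,\mathrm{vec}(\mathbf{B})$, and can be verified numerically; a NumPy sketch (note `order="F"` gives the column-stacking $\mathrm{vec}$):

```python
import numpy as np

# Verify vec(P X) = (X^T (x) I_n) vec(P) and J^T r = vec(R X^T).
n, m, k = 3, 4, 2
rng = np.random.default_rng(1)
P = rng.standard_normal((n, m))
X = rng.standard_normal((m, k))
x = rng.standard_normal((n, k))

vec = lambda A: A.reshape(-1, order="F")       # column-stacking vec
J = np.kron(X.T, np.eye(n))                    # Jacobian X^T (x) I_n

# dr = J dp, here checked via linearity: vec(P X) = J vec(P)
assert np.allclose(J @ vec(P), vec(P @ X))

# J^T r = vec(R X^T), consistent with the gradient 2 R X^T
R = P @ X - x
assert np.allclose(J.T @ vec(R), vec(R @ X.T))
```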

The Hessian is found using $d\mathbf{g} = 2 \mathbf{J}^T d\mathbf{r} = 2 \mathbf{J}^T \mathbf{J} d\mathbf{p} = \mathbf{H} d\mathbf{p} $

The GN iteration reads $$ \mathbf{p}_{n+1} = \mathbf{p}_{n} - \mathbf{H}^{-1}(\mathbf{p}_{n}) \mathbf{g}(\mathbf{p}_{n}) = \mathbf{p}_{n} - (\mathbf{J}^T \mathbf{J})^{-1} \mathbf{J}^T \mathbf{r}(\mathbf{p}_{n}) $$
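Since $\mathbf{r}$ is linear in $\mathbf{p}$, a single GN step reaches the minimizer. A NumPy sketch (dimensions arbitrary; $X$ is taken with full row rank so that $\mathbf{J}^T\mathbf{J} = \mathbf{X}\mathbf{X}^T \otimes \mathbf{I}_n$ is invertible):

```python
import numpy as np

# One Gauss-Newton step on the vectorized problem.
n, m, k = 3, 2, 4
rng = np.random.default_rng(2)
X = rng.standard_normal((m, k))
x = rng.standard_normal((n, k))

vec = lambda A: A.reshape(-1, order="F")
J = np.kron(X.T, np.eye(n))                    # Jacobian of vec(P X - x)

P0 = np.zeros((n, m))
r0 = vec(P0 @ X - x)
p1 = vec(P0) - np.linalg.solve(J.T @ J, J.T @ r0)
P1 = p1.reshape((n, m), order="F")

# The gradient 2 (P1 X - x) X^T vanishes at the minimizer.
assert np.allclose((P1 @ X - x) @ X.T, 0.0, atol=1e-10)
```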

Steph
  • Is g transposed in the final equation? Since H is an mn x mn matrix, and p_n, g are both mnx1 matrices, the update term in the final equation is: (mn x mn)(mn x 1) (mn x 1)(mn x 1) which doesn't work, but if I use g^T I get (mn x mn)(mn x 1) (1 x mn)(mn x 1) which gives me an (mn x 1) vector which is what I want. – Kyle Nov 19 '21 at 02:45
  • I think r is evaluated at p_n, so r(p_n) is an nk x 1 matrix, and g is also evaluated at p_n, so g(p_n) is an nm x 1 matrix. How do I evaluate H^-1 at p_n? – Kyle Nov 19 '21 at 03:24
  • I understand, H^-1 doesn't rely on p_n so it doesn't do anything, H^-1(p_n) = H^-1. – Kyle Nov 19 '21 at 03:29