The matrix equation for linear regression is: $$ \vec{y} = X\vec{\beta}+\vec{\epsilon} $$ The least-squares solution of this satisfies the normal equations:
$$ ({\bf{X}}^T \bf{X}) \vec{\beta}= {\bf{X}}^T \vec{y} $$
Using the Moore-Penrose pseudoinverse: $$ X^+ = ({\bf{X}}^T \bf{X})^{-1}{\bf{X}}^T $$ this can be written as: $$ \vec{\beta}= X^+ \vec{y} $$ Decomposing $X$ using the SVD: $$ X=U\Sigma V^T$$ allegedly leads to: $$ X^+=V \Sigma^+U^T $$
where the pseudoinverse of $ \Sigma $ is found by taking the reciprocal of each nonzero diagonal element (and transposing the result if $\Sigma$ is not square).
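For concreteness, here is a minimal numpy sketch of that construction; the data, shapes and variable names are made up for illustration, and it assumes $X$ has full column rank so that every singular value is nonzero:

```python
import numpy as np

# Minimal sketch: build X^+ from the thin SVD and solve one least-squares problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # 100 observations, 3 parameters (made-up data)
y = rng.normal(size=100)

U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U @ diag(s) @ Vt
X_pinv = Vt.T @ np.diag(1.0 / s) @ U.T             # X^+ = V Sigma^+ U^T (full rank assumed)

beta = X_pinv @ y                                  # beta = X^+ y

# Agrees with numpy's built-in pseudoinverse / lstsq up to floating-point error.
assert np.allclose(beta, np.linalg.pinv(X) @ y)
assert np.allclose(beta, np.linalg.lstsq(X, y, rcond=None)[0])
```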
The point of doing things this way is that I will be doing linear regression $n \gg 1$ times: $\vec{y}$ will vary, but $X$ (and therefore $X^+$) will stay constant.
In particular I am interested in doing "scaled" linear regression: $$ \vec{y} = WX\vec{\beta}+\vec{\epsilon} $$ where $W$ is a diagonal matrix (in my case with integer diagonal elements) that changes with $n$. The reason for this is that sometimes I have multiple measurements of $y_i$ for the same $X$.
I understand that I could solve this using the pseudoinverse of $WX$, but that pseudoinverse would then have to be recomputed for each $n$, which defeats the point.
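To make the cost concern concrete, this is the naive route sketched with the same hypothetical variables as above (W_diag is made-up integer data standing in for the measurement counts):

```python
# Naive approach: treat WX as the design matrix and recompute its pseudoinverse
# for every one of the n regressions -- i.e. a fresh SVD each time.
W_diag = rng.integers(1, 5, size=100)    # hypothetical diagonal of W (integer counts)
W = np.diag(W_diag)

beta_naive = np.linalg.pinv(W @ X) @ y   # correct, but the costly step repeats for every n
```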
In the case of scaled linear regression the normal equations are: $$ ({\bf{X}}^T {\bf{W}}^2 {\bf{X}}) \vec{\beta}= ({\bf{X}}^T {\bf{W}}) \vec{y} $$ using the fact that $W$ is diagonal, so $W^T W = W^2$ and $W^T = W$.
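In numpy this route looks roughly as follows (continuing the hypothetical variables above; the diagonal structure of $W$ is exploited so the full $W$ matrix never has to be formed):

```python
# Solve (X^T W^2 X) beta = X^T W y directly. Because W is diagonal, applying W or W^2
# is just a row scaling; note that X^T W^2 X depends on W, so it is rebuilt for every n.
XtW2X = X.T @ ((W_diag ** 2)[:, None] * X)   # X^T W^2 X
XtWy  = X.T @ (W_diag * y)                   # X^T W y   (W^T = W for diagonal W)
beta_normal = np.linalg.solve(XtW2X, XtWy)

# Matches the pinv(WX) solution from the previous snippet.
assert np.allclose(beta_normal, beta_naive)
```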
If I substitute the SVD of $X$ into this equation, what is then the equation for $ \vec{\beta} $?
I am hoping for some cheap operation so that: $$ \vec{\beta}= \operatorname{operation}(W, X^+, \vec{y}) $$
As an example, something like $$ \vec{\beta}= (WX^+)\vec{y} $$ would be ideal (this example is probably not correct, though), since I would only have to find the pseudoinverse of $X$ once (costly) and then multiply it with a diagonal matrix and a vector (cheap) for each $n$.
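In the meantime, a quick numerical harness like the one below can test any candidate formula against the normal-equation reference from above. As an illustration it checks $X^+ W \vec{y}$, one dimensionally consistent reading of the guess above; this is not claimed to be correct, it just shows the kind of check:

```python
# Sanity check for a candidate cheap formula: compare it against beta_normal
# (the reference from the scaled normal equations) on the same random data.
def candidate(W_diag, X_pinv, y):
    # Example candidate: X^+ W y. Purely illustrative, not claimed to be correct.
    return X_pinv @ (W_diag * y)

# Almost certainly prints False for non-constant weights, i.e. this particular
# candidate is not equivalent to the weighted solution.
print(np.allclose(candidate(W_diag, X_pinv, y), beta_normal))
```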