
Currently, I am minimizing the quadratic objective $\|\mathbf{A}\mathbf{X}\mathbf{B}\mathbf{d} -\mathbf{c} \|^2$ using CVX, as follows:

echo on
cvx_begin
    variable xx(num_triangles*3,num_triangles*3)
    minimize( norm( A * xx * B * d - c ) )
cvx_end
echo off

However, $\mathbf{X}$ is a very large matrix (about $50{,}000 \times 50{,}000$), far too big to handle as a dense variable. The good news is that $\mathbf{X}$ is block diagonal (built from $3 \times 3$ transformation matrices) and is therefore highly sparse.

To give an example, for $\mathbf{X} \in \mathbb{R}^{9 \times 9}$ I would be looking for a matrix of the form

|  X1  X4  X7   0   0   0   0   0   0 |
|  X2  X5  X8   0   0   0   0   0   0 |
|  X3  X6  X9   0   0   0   0   0   0 |
|   0   0   0 X10 X13 X16   0   0   0 |
|   0   0   0 X11 X14 X17   0   0   0 |
|   0   0   0 X12 X15 X18   0   0   0 |
|   0   0   0   0   0   0 X19 X22 X25 |
|   0   0   0   0   0   0 X20 X23 X26 |
|   0   0   0   0   0   0 X21 X24 X27 |

so in fact I am solving for only $27$ variables, not $9 \times 9 = 81$. At my problem size, the dense formulation scales very badly.

But I don't know how to enforce this structure when formulating the minimization problem. If I declare an ordinary square matrix variable of the required size, memory usage explodes. Any ideas on how to reformulate the problem so that I solve only for the actual number of unknowns? I am working in MATLAB.
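For reference, the storage savings I am after are easy to sketch in SciPy (just to illustrate the idea with made-up sizes; my actual code is MATLAB/CVX):

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical size: n blocks of 3x3 (the real problem has n in the tens of thousands).
n = 4
rng = np.random.default_rng(0)
blocks = [rng.standard_normal((3, 3)) for _ in range(n)]

# Store only the 3x3 blocks; the full matrix is never densified.
X = sp.block_diag(blocks, format="csr")

assert X.shape == (3 * n, 3 * n)
assert X.nnz == 9 * n  # storage grows linearly in n, not quadratically

# Multiplying by a stacked vector acts blockwise.
b = rng.standard_normal(3 * n)
expected = np.concatenate([blocks[k] @ b[3 * k : 3 * k + 3] for k in range(n)])
assert np.allclose(X @ b, expected)
```

The question is how to get the same economy out of the optimization formulation, not just the storage.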

Dan
  • Using differentiation we find that if $\mathbf{X}$ minimizes the norm, then for each $(i,j)$ corresponding to a valid position in $\mathbf{X}$ at least one of the following holds:

    $$\left\langle B^{\top}e_j, d\right\rangle = 0 \qquad \text{or} \qquad {\left[A^{\top}c\right]}_i = {\left[A^{\top}AXBd\right]}_i$$

    where $e_k$ is the $k$-th vector of the canonical basis and ${[v]}_k=\langle v,e_k\rangle$ is the $k$-th coordinate of $v$. This in principle could help limit the search for minimizing $\mathbf{X}$'s. I can expand on how I got this if you deem this reasonable.

    – Fimpellizzeri May 17 '17 at 20:54
  • Tell me something about $\rm A$. Is it square? Fat? Thin? What about the rank? – Rodrigo de Azevedo May 19 '17 at 13:20

2 Answers


We have a quadratic program in $\mathrm X \in \mathbb R^{3n \times 3n}$

$$\text{minimize} \quad \| \mathrm A \mathrm X \mathrm b - \mathrm c \|_2^2$$

where $\mathrm b := \mathrm B \mathrm d$ and $\rm X$ is block diagonal

$$\mathrm X = \begin{bmatrix} \mathrm X_1 & & & \\ & \mathrm X_2 & & \\ & & \ddots & \\ & & & \mathrm X_n\end{bmatrix}$$

and each $\rm X_i$ block is $3 \times 3$. Let

$$\mathrm y := \mathrm X \mathrm b = \begin{bmatrix} \mathrm X_1 & & & \\ & \mathrm X_2 & & \\ & & \ddots & \\ & & & \mathrm X_n\end{bmatrix} \begin{bmatrix} \mathrm b_1\\ \mathrm b_2\\ \vdots \\ \mathrm b_n\end{bmatrix} = \begin{bmatrix} \mathrm X_1 \mathrm b_1\\ \mathrm X_2 \mathrm b_2\\ \vdots \\ \mathrm X_n \mathrm b_n\end{bmatrix} =: \begin{bmatrix} \mathrm y_1\\ \mathrm y_2\\ \vdots \\ \mathrm y_n\end{bmatrix}$$

Hence, we have a least-squares problem in $\mathrm y \in \mathbb R^{3n}$

$$\text{minimize} \quad \| \mathrm A \mathrm y - \mathrm c \|_2^2$$

A minimizer is a solution to the normal equations

$$\mathrm A^{\top} \mathrm A \mathrm y = \mathrm A^{\top} \mathrm c$$
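Numerically, $\bar{\mathrm y}$ can be obtained with any least-squares solver; a small NumPy sketch (with made-up sizes) that also checks the normal equations:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n3 = 12, 6  # A is m x 3n, here with n = 2 blocks so 3n = 6
A = rng.standard_normal((m, n3))
c = rng.standard_normal(m)

# Least-squares solution of  min ||A y - c||_2
y_bar, *_ = np.linalg.lstsq(A, c, rcond=None)

# The minimizer satisfies the normal equations  A^T A y = A^T c
assert np.allclose(A.T @ A @ y_bar, A.T @ c)
```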

Once a least-squares solution $\bar{\mathrm y}$ has been found, to find $\rm X$ we must solve $n$ underdetermined systems of $3$ linear equations in $3^2 = 9$ unknowns

$$\mathrm X_k \mathrm b_k = \bar{\mathrm y}_k$$

Vectorizing, we obtain

$$(\mathrm b_k^{\top} \otimes \mathrm I_3) \, \mbox{vec} (\mathrm X_k) = \bar{\mathrm y}_k$$

If $\color{blue}{\mathrm b_k \neq 0_3}$, then the $3 \times 9$ matrix $\mathrm b_k^{\top} \otimes \mathrm I_3$ has full row rank. In that case, the least-norm solution is

$$\begin{array}{rl} \mbox{vec} (\bar{\mathrm X}_k) &= (\mathrm b_k^{\top} \otimes \mathrm I_3)^{\top} \left( (\mathrm b_k^{\top} \otimes \mathrm I_3) (\mathrm b_k^{\top} \otimes \mathrm I_3)^{\top} \right)^{-1} \bar{\mathrm y}_k\\ &= (\mathrm b_k \otimes \mathrm I_3) \left( (\mathrm b_k^{\top} \otimes \mathrm I_3) (\mathrm b_k \otimes \mathrm I_3) \right)^{-1} \bar{\mathrm y}_k\\ &= (\mathrm b_k \otimes \mathrm I_3) \left( \mathrm b_k^{\top} \mathrm b_k \otimes \mathrm I_3 \right)^{-1} \bar{\mathrm y}_k\\ &= \left( \frac{\mathrm b_k}{\| \mathrm b_k \|_2^2} \otimes \mathrm I_3 \right) \bar{\mathrm y}_k\\ &= \frac{\mathrm b_k}{\| \mathrm b_k \|_2^2} \otimes \bar{\mathrm y}_k\end{array}$$

Un-vectorizing, we obtain the rank-$1$ matrix

$$\boxed{\bar{\mathrm X}_{k} = \frac{\bar{\mathrm y}_k \mathrm b_k^{\top}}{\mathrm b_k^{\top} \mathrm b_k}}$$

Note that

$$\bar{\mathrm X}_k \mathrm b_k = \left( \frac{\bar{\mathrm y}_k \mathrm b_k^{\top}}{\mathrm b_k^{\top} \mathrm b_k} \right) \mathrm b_k = \bar{\mathrm y}_k \left( \frac{ \mathrm b_k^{\top} \mathrm b_k}{\mathrm b_k^{\top} \mathrm b_k} \right) = \bar{\mathrm y}_k$$

If $\color{blue}{\mathrm b_k = 0_3}$, then the linear system $(\mathrm b_k^{\top} \otimes \mathrm I_3) \, \mbox{vec} (\mathrm X_k) = \bar{\mathrm y}_k$ only has a solution if $\bar{\mathrm y}_k = 0_3$, in which case any $3 \times 3$ matrix $\mathrm X_k$ solves it (so one may take $\bar{\mathrm X}_k = 0$).
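The recovery step can be verified numerically; a NumPy sketch with made-up data for $\mathrm b_k$ and $\bar{\mathrm y}_k$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
b = rng.standard_normal((n, 3))      # the blocks b_k of the stacked vector b
y_bar = rng.standard_normal((n, 3))  # the blocks of a least-squares solution

for k in range(n):
    bk, yk = b[k], y_bar[k]
    # Boxed formula:  X_k = y_k b_k^T / (b_k^T b_k)
    Xk = np.outer(yk, bk) / (bk @ bk)
    assert np.allclose(Xk @ bk, yk)          # X_k b_k recovers y_k exactly
    assert np.linalg.matrix_rank(Xk) == 1    # X_k is rank 1
```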


Consider for simplicity the case where $\mathbf{X}$ is of the form

$$ \mathbf{X} = \left[ \begin{matrix} \mathbf{X}_{1,1} & \mathbf{0} \\ \mathbf{0} & \mathbf{X}_{2,2} \\ \end{matrix} \right]. $$

Writing $\mathbf{A}$ as a block matrix

$$ \mathbf{A} = \left[\begin{matrix} \mathbf{A}_{1,1}& \mathbf{A}_{1,2} \\ \mathbf{A}_{2,1} & \mathbf{A}_{2,2} \\ \end{matrix} \right], $$

the product $\mathbf{AX}$ can be written as

$$ \mathbf{AX}=\left[ \begin{matrix} \mathbf{A}_{1,1}\mathbf{X}_{1,1} & \mathbf{A}_{1,2}\mathbf{X}_{2,2} \\ \mathbf{A}_{2,1}\mathbf{X}_{1,1} & \mathbf{A}_{2,2}\mathbf{X}_{2,2} \\ \end{matrix} \right]. $$

This approach generalizes to $\mathbf{X}$ having an arbitrary number of diagonal blocks.

Your best bet is to declare a matrix variable for each $\mathbf{X}_{i,i}$ and perform the multiplication $\mathbf{AX}$ manually, as shown above (followed by right-multiplication with $\mathbf{Bd}$). This eliminates the need to declare a full-size matrix variable for $\mathbf{X}$ and saves the computations corresponding to multiplications with zeros. However, since the matrix sizes you are considering are very large, even this approach may fail.
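A quick numerical check of the block identity above (NumPy, hypothetical $3 \times 3$ block sizes; the block in each block column of $\mathbf{A}$ is right-multiplied by the corresponding diagonal block of $\mathbf{X}$):

```python
import numpy as np

rng = np.random.default_rng(3)
# A split into four 3x3 blocks (top row A11, A12; bottom row A21, A22),
# X block diagonal with two 3x3 blocks X11, X22
A11, A12, A21, A22 = (rng.standard_normal((3, 3)) for _ in range(4))
X11, X22 = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))

A = np.block([[A11, A12], [A21, A22]])
X = np.block([[X11, np.zeros((3, 3))], [np.zeros((3, 3)), X22]])

# Each block column of A picks up only its matching diagonal block of X
AX_blocks = np.block([[A11 @ X11, A12 @ X22], [A21 @ X11, A22 @ X22]])
assert np.allclose(A @ X, AX_blocks)
```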

A more convenient approach would be to exploit CVX features for block-diagonal and/or sparse matrix variables; however, I am not familiar enough with the package to say more. You could also try asking on Computational Science SE or Stack Overflow.

Stelios