
Given matrices $A, B \in \mathbb{C}^{(n-s) \times n}$ and a matrix $\Sigma \in \mathbb{C}^{n \times n}$, I want to solve the following equality-constrained minimization problem

$$\begin{array}{ll} \underset{X \in \mathbb{C}^{n \times n}}{\text{minimize}} & \| \Sigma - X \cdot X^H \|_F\\ \text{subject to} & A X = B\end{array}$$

where $X^H = \bar{X}^T$ is the Hermitian conjugate (complex conjugate and transpose) and $\| \cdot \|_F$ is the Frobenius norm.

First, note that $\| Y \|_F^2 = \operatorname{tr}(YY^H)$, so with $Y = \Sigma - XX^H$, minimizing $$ \operatorname{tr}\left( \Sigma\Sigma^H - \Sigma X X^H - X X^H \Sigma^H + X X^H X X^H \right) $$ is equivalent to minimizing the original objective.
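As a quick NumPy sanity check of this identity (random matrices, nothing specific to the actual problem data):

```python
import numpy as np

# check ||Sigma - X X^H||_F^2 = tr(Sigma Sigma^H - Sigma X X^H - X X^H Sigma^H + X X^H X X^H)
rng = np.random.default_rng(0)
n = 4
Sigma = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

Y = Sigma - X @ X.conj().T
lhs = np.linalg.norm(Y, 'fro') ** 2
rhs = np.trace(Sigma @ Sigma.conj().T - Sigma @ X @ X.conj().T
               - X @ X.conj().T @ Sigma.conj().T
               + X @ X.conj().T @ X @ X.conj().T)
print(np.allclose(lhs, rhs))   # True (rhs has only a negligible imaginary part)
```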

First I try to see whether there is an analytic solution (a closed-form expression) to this constrained optimization problem, and the natural thing to do is to try the method of Lagrange multipliers: $$ f(X) = \operatorname{tr}( (\Sigma - XX^H)(\Sigma - XX^H)^H ) + \lambda (A X - B) $$ but then I encountered two problems:

  1. What is $\lambda$? It cannot be a scalar, because the second term is a matrix while the first is a scalar. Moreover, no single matrix turns the second term (the constraint) into a scalar by pre- or post-multiplication. One possible fix is to write each of the $(n-s) \times n$ scalar equations separately, assign each its own multiplier $\lambda_{i,j}$ with $i=1,...,n-s$ and $j=1,...,n$, and sum them (this gives $(n-s)n$ extra terms in $f(X)$). Another is to replace the term with $\langle \lambda , AX - B \rangle$, where $\lambda \in \mathbb{C}^{(n-s)\times n}$.
  2. Since the objective contains expressions of the form $X X^H$, which involve products $z \cdot \bar{z}$, the first term is not differentiable in the complex sense. This hampers Lagrange multipliers and other gradient-based algorithms (see the worked scalar example below).
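To make problem 2 concrete, here is the standard scalar example (a well-known Wirtinger-calculus fact, included only for illustration): for $f(z) = z\bar{z} = |z|^2$ one has $$\frac{\partial f}{\partial z} = \bar{z}, \qquad \frac{\partial f}{\partial \bar{z}} = z,$$ so $f$ is complex-differentiable only at $z = 0$, yet it is a perfectly smooth function of $(\operatorname{Re} z, \operatorname{Im} z)$. The same holds for the objective above: it is not holomorphic in $X$, but it is smooth in the real and imaginary parts of the entries of $X$.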

See "How to set up Lagrangian optimization with matrix constrains" for a discussion of question 1.

Am I wrong here? Or do these two arguments indeed show that obtaining an analytic expression via Lagrange multipliers is not feasible?

Another idea is to use the (Moore-Penrose) pseudo-inverse of $A$ and write $X = A^+ B$, but this fixes $X$ completely, which cannot be the right solution (if $X$ were determined uniquely by the constraint, there would be no minimization problem left). Note that since $B \in \mathbb{C}^{(n-s) \times n}$ there are more variables ($X_{i,j}$) than equations, so there are $n^2 - (n-s) n = sn$ degrees of freedom in $X$.
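A small NumPy check of this degrees-of-freedom count (a sketch of mine; it assumes $A$ has full row rank $n-s$):

```python
import numpy as np

# the general solution of A X = B shifts each column of X by a nullspace vector of A,
# so the number of free complex parameters is rank(nullspace projector) * n = s * n
rng = np.random.default_rng(1)
n, s = 6, 2
A = rng.standard_normal((n - s, n)) + 1j * rng.standard_normal((n - s, n))

P = np.eye(n) - np.linalg.pinv(A) @ A      # projector onto the nullspace of A
dof = np.linalg.matrix_rank(P) * n         # free parameters per column, times n columns
print(dof, s * n)                          # both equal s*n when A has full row rank
```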

If there is no analytic solution, what is an algorithmic way to solve this constrained minimization problem? I want to be able to program it and check it in Python using packages such as NumPy and SciPy. (Remark: algorithms that require complex derivatives do not apply directly, because the term $XX^H$ is not differentiable in the complex sense; the objective is, however, still differentiable with respect to the real and imaginary parts of $X$.) Numerical optimization algorithms are fine too.
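For a first numerical attempt, here is a minimal sketch (my own, untuned) that sidesteps the complex-differentiability issue by optimizing over the stacked real and imaginary parts of $X$ and handing the constraint $AX = B$ to SciPy's SLSQP solver; the function name and solver settings are only illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def solve_constrained(A, B, Sigma):
    """Minimize ||Sigma - X X^H||_F^2 subject to A X = B by optimizing
    over the real and imaginary parts of X (illustrative sketch)."""
    n = A.shape[1]

    def unpack(v):
        # first n*n entries are Re(X), last n*n entries are Im(X)
        return (v[:n * n] + 1j * v[n * n:]).reshape(n, n)

    def objective(v):
        X = unpack(v)
        return np.linalg.norm(Sigma - X @ X.conj().T, 'fro') ** 2

    def constraint(v):
        # A X - B = 0, split into real and imaginary parts
        R = A @ unpack(v) - B
        return np.concatenate([R.real.ravel(), R.imag.ravel()])

    # start from the minimum-norm solution of A X = B
    X0 = np.linalg.pinv(A) @ B
    v0 = np.concatenate([X0.real.ravel(), X0.imag.ravel()])
    res = minimize(objective, v0, method='SLSQP',
                   constraints=[{'type': 'eq', 'fun': constraint}],
                   options={'maxiter': 500, 'ftol': 1e-12})
    return unpack(res.x)
```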


  • The TensorLab manual ( https://www.tensorlab.net/doc/complexoptimization.html ) offers ways to handle complex optimization using the cogradients $\frac{\partial f}{\partial z}$ and $\frac{\partial f}{\partial \bar{z}}$. The question is whether there is an optimization function in Python that supports these complex cogradients. – Triceratops Jun 01 '20 at 14:20
  • Thanks, I carried over the real-part operator from the scalar product $\langle A , B \rangle = \operatorname{Re}(\operatorname{trace}(A^H B))$ of (2.1) in https://hal.inria.fr/hal-01422932v2/document . Since the norm is always real, this $\operatorname{Re}$ is redundant; I edited the question accordingly. This is the first time I have encountered complex optimization, so I am not yet fluent in the setup and the underlying notions. – Triceratops Jun 01 '20 at 15:39
  • The Lagrange multiplier is a matrix. Take a look at this. Also, take a look at the Frobenius inner product. – Rodrigo de Azevedo Jun 01 '20 at 15:41

2 Answers


I am not a big fan of Lagrangian relaxation. I have never found it generic enough. I would rather go with a linear fusion approach.

Let's introduce more matrices to be able to express a more powerful problem.

$X_h$ approximates $X^H$.

$X_i$ approximates $X^{-1}$ if it exists, or some well-behaved pseudoinverse $X^{\dagger}$ if it doesn't.

Now we can write $$\|X_i\Sigma - X_h\|_F$$

and add the regularization terms $\alpha\|AX-B\|$, $\beta\|X_i X - I\|$, and $\gamma\|T X_h - X\|$, where $T$ performs the conjugate transpose on the vectorization of $X_h$.

Only one nonlinearity remains: the $\beta$ term.

So a single linear least-squares solve will not be enough, but we can run an iterated two-stage linear least squares:

  1. The first phase optimizes $X$ and $X_h$;
  2. the second phase optimizes $X_i$.

Using Kronecker products, the $T$ operation above can be expressed as a matrix acting on the vectorization of $X_h$.

The linear fusion comes into play when we connect $X$ and $X_h$ through the $\gamma$-regularization and $X$ and $X_i$ through the $\beta$-regularization.
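A rough NumPy sketch of this alternating idea (my own reading, and only a block-coordinate variant: it updates $X$, $X_h$ and $X_i$ one at a time rather than in the exact two-phase grouping above, applies the conjugate transpose directly instead of building the Kronecker matrix for $T$, and uses arbitrary hyperparameters):

```python
import numpy as np

def alternating_least_squares(A, B, Sigma, alpha=1.0, beta=1.0, gamma=1.0, iters=100):
    """Block-coordinate sketch of the fused least-squares idea: cycle through
    X, X_h and X_i, each updated by a linear least-squares or closed-form step.
    Initialization, hyperparameters and iteration count are ad hoc."""
    n = A.shape[1]
    I = np.eye(n)
    X = np.linalg.pinv(A) @ B          # satisfies A X = B when A has full row rank
    X_h = X.conj().T
    X_i = np.linalg.pinv(X)
    for _ in range(iters):
        # X-step: min_X alpha*||A X - B||^2 + beta*||X_i X - I||^2 + gamma*||X - X_h^H||^2
        M = np.vstack([np.sqrt(alpha) * A, np.sqrt(beta) * X_i, np.sqrt(gamma) * I])
        R = np.vstack([np.sqrt(alpha) * B, np.sqrt(beta) * I, np.sqrt(gamma) * X_h.conj().T])
        X = np.linalg.lstsq(M, R, rcond=None)[0]
        # X_h-step: min_{X_h} ||X_h - X_i Sigma||^2 + gamma*||X_h - X^H||^2  (closed form)
        X_h = (X_i @ Sigma + gamma * X.conj().T) / (1.0 + gamma)
        # X_i-step: min_{X_i} ||X_i Sigma - X_h||^2 + beta*||X_i X - I||^2
        C = np.hstack([Sigma, np.sqrt(beta) * X])
        D = np.hstack([X_h, np.sqrt(beta) * I])
        X_i = np.linalg.lstsq(C.T, D.T, rcond=None)[0].T
    return X
```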

mathreadler
  • Thank you for your answer. I have two questions: (1) What is the linear fusion method and how is it done? A reference would also be good. (2) By "iterated two-stage", do you mean that I repeat phases 1 and 2 several times (typically how many?) until $X$, $X_h$ and $X_i$ converge? – Triceratops Jun 02 '20 at 08:37
  • @Triceratops Sadly you cannot read about linear fusion anywhere yet, because the inventor hasn't written a book about it. For the purpose of applying it to this question, the Wikipedia page on the "Kronecker product" should suffice. And yes: iteratively solve and update the linear least-squares systems until they converge. – mathreadler Jun 02 '20 at 14:59

Use the pseudoinverse of $A$ to solve the constraint equation for $X$ $$\eqalign{ \def\k{\otimes} \def\h{\odot} \def\o{{\tt1}} \def\LR#1{\left(#1\right)} \def\BR#1{\left[#1\right]} \def\CR#1{\left\lbrace #1 \right\rbrace} \def\op#1{\operatorname{#1}} \def\vc#1{\op{vec}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\q{\quad} \def\qq{\qquad} \def\qif{\q\iff\q} \def\qiq{\q\implies\q} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} \def\rr#1{\color{red}{#1}} \def\gg#1{\color{lime}{#1}} \def\bb#1{\color{blue}{#1}} \def\RLR#1{\rr{\LR{#1}}} \def\BLR#1{\bb{\LR{#1}}} \def\GLR#1{\gg{\LR{#1}}} \def\S{\Sigma} \def\H{{\small\sf H}} \def\T{{\small\sf T}} AX &= B \qiq X&=A^{\bf+}B+\LR{I-A^{\bf+}A}U \\ &&\doteq X_0+PU\\ }$$ where the matrix $U$ is $\underline{\rm arbitrary}$ and $P$ is an orthoprojector into the nullspace of $A$.

For ease of typing, define the matrix variables $$\eqalign{ S &= \LR{XX^\H-\S} \qq\qq\qq\qq\qq\qq \\ Y &= \LR{S+S^\H} = \LR{2XX^\H-\S-\S^\H} \\ }$$

This yields an $\sf unconstrained$ minimization problem w.r.t. $U$ $$\eqalign{ \phi &= \frob{S} \\ \phi^2 &= S:S^* \\ 2\phi\,d\phi &= S:d\LR{X^*X^\T} \;+\; conj \\ &= S:\LR{X^*dX^\T+dX^*X^\T} \;+\; conj \\ &= \LR{S^\T X^*}:dX + \LR{SX}:dX^* \;+\; conj \\ &= \LR{S^\T X^*+S^*X^*}:dX + \LR{SX+S^\H X}:dX^* \\ &= \LR{YX}^*:\LR{P\,dU} + \LR{YX}:\LR{P\,dU}^* \\ &= \LR{PYX}^*:dU + \LR{PYX}:dU^* \\ }$$ The gradients are complex conjugates of each other $$\eqalign{ \grad{\phi}{U^*} &= \fracLR{PYX}{2\phi} = \gradLR{\phi}{U}^* \\ }$$ Set the gradient to zero and solve for the optimal $U$ $$\small\eqalign{ &0 = PYX = PY\LR{X_0+PU} \\ &\LR{PYP}U = -PYX_0 \\ &U = -\LR{\rr{P}\gg{Y}\rr{P}}^{\bf+}\rr{P}\gg{Y}X_0 \\ &U = \LR{\RLR{A^{\bf+}A-I}\GLR{2XX^\H-\S-\S^\H} \RLR{A^{\bf+}A-I}}^{\bf+} \RLR{A^{\bf+}A-I}\GLR{2XX^\H-\S-\S^\H}\LR{A^{\bf+}B} \\ \\ }$$


In the above derivation, the colon product has the following properties $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ B^*\!:B &= \frob{B}^2 \qquad \{ {\sf Frobenius\;norm} \}\\ A:B &= B:A \;=\; B^T:A^T \\ \LR{PQ}:B &= P:\LR{BQ^T} \;=\; Q:\LR{P^TB} \\ }$$ and the phrase "$+\;conj$" indicates the complex conjugate of the initial expression, which was omitted to save horizontal space.
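Since $Y$ depends on $X = X_0 + PU$, the final expression is a stationarity condition rather than an explicit formula for $U$. One crude way to experiment with it numerically (a sketch added here, not part of the derivation above; convergence is not guaranteed) is to iterate it as a fixed point:

```python
import numpy as np

def fixed_point_sketch(A, B, Sigma, iters=500, tol=1e-12):
    """Iterate U <- -(P Y P)^+ P Y X0 with Y = 2 X X^H - Sigma - Sigma^H
    and X = X0 + P U.  Purely experimental; convergence is not guaranteed."""
    n = A.shape[1]
    Ap = np.linalg.pinv(A)
    X0 = Ap @ B                      # particular solution of A X = B
    P = np.eye(n) - Ap @ A           # orthoprojector onto the nullspace of A
    U = np.zeros((n, n), dtype=complex)
    for _ in range(iters):
        X = X0 + P @ U
        Y = 2 * X @ X.conj().T - Sigma - Sigma.conj().T
        U_new = -np.linalg.pinv(P @ Y @ P) @ (P @ Y @ X0)
        if np.linalg.norm(U_new - U) <= tol * (1 + np.linalg.norm(U)):
            U = U_new
            break
        U = U_new
    return X0 + P @ U
```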

greg