
The general method of Projection onto Convex Sets (POCS) can be used to find a point in the intersection of a number of convex sets, i.e.

$$ \text{find } x \in \mathbb{R}^N \text{ s.t. } x \in \bigcap_i C_i $$

This method finds some feasible point in the intersection of the convex sets. Now my question is: is there a similar method that can find a point $x^*$ that has minimum norm instead, i.e. solve

$$ x^* = \arg \min_{x \in \mathbb{R}^N} \Vert x\Vert \text{ s.t. } x \in \bigcap_i C_i$$

or more generally find the projection of a point $a \in \mathbb{R}^N$ onto the intersection of the convex sets i.e. solve

$$ x^* = \arg \min_{x \in \mathbb{R}^N} \Vert x - a\Vert \text{ s.t. } x \in \bigcap_i C_i$$ for some $a \in \mathbb{R}^N$?

Edit

The problem we are trying to solve is a 3D tomography reconstruction problem, and thus the variable $x$ takes gigabytes of RAM. There is already a POCS algorithm (a variant of the Algebraic Reconstruction Technique (ART)) that finds a feasible point. So is there a way to use it as a black box, or adapt it, to find a minimum-norm solution instead?


4 Answers


Projection onto Convex Sets (POCS) / Alternating Projections does exactly what you want in case your sets $ \left\{ \mathcal{C}_{i} \right\}_{i = 1}^{m} $ are subspaces.

Namely, if $ \mathcal{C} = \bigcap_{i = 1}^{m} \mathcal{C}_{i} $, where each $ {\mathcal{C}}_{i} $ is a subspace, and the projection onto $ \mathcal{C} $ is given by:

$$ \mathcal{P}_{\mathcal{C}} \left( y \right) = \arg \min_{x \in \mathcal{C}} \frac{1}{2} {\left\| x - y \right\|}_{2}^{2} $$

Then:

$$ \lim_{n \to \infty} {\left( \mathcal{P}_{\mathcal{C}_{1}} \circ \mathcal{P}_{\mathcal{C}_{2}} \circ \cdots \circ \mathcal{P}_{\mathcal{C}_{m}} \right)}^{n} \left( y \right) = \mathcal{P}_{\mathcal{C}} \left( y \right) $$

Where $ \mathcal{P}_{\mathcal{C}_{i}} \left( y \right) = \arg \min_{x \in \mathcal{C}_{i}} \frac{1}{2} {\left\| x - y \right\|}_{2}^{2} $.
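
For concreteness, here is a minimal sketch of this alternating-projections iteration in Python/NumPy, assuming the projections onto the individual sets are supplied as functions (the names `alternating_projections`, `projections`, and the stopping tolerance are illustrative, not taken from any particular library):

```python
import numpy as np

def alternating_projections(y, projections, num_iter=1000, tol=1e-10):
    """Cyclically apply the projections P_{C_1}, ..., P_{C_m} starting from y.

    When every C_i is a subspace this converges to the orthogonal projection
    of y onto the intersection; for general convex sets it only returns some
    feasible point, not necessarily the one closest to y.
    """
    x = np.asarray(y, dtype=float).copy()
    for _ in range(num_iter):
        x_prev = x.copy()
        for proj in projections:   # one pass of P_{C_m} o ... o P_{C_1}
            x = proj(x)
        if np.linalg.norm(x - x_prev) < tol:
            break
    return x
```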

In case any of the sets isn't a subspace but a general convex set, you need to use Dykstra's Projection Algorithm.

In the paper Ryan J. Tibshirani - Dykstra’s Algorithm, ADMM, and Coordinate Descent: Connections, Insights, and Extensions you may find an extension of the algorithm to $ m \geq 2 $ sets:

[Image: the extension of Dykstra's algorithm to $ m $ sets, as given in the paper above.]
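
As a rough illustration, here is a short Python/NumPy sketch of the classical cyclic Dykstra iteration for $ m $ sets (the names are illustrative; see the paper above for the precise statement and its extensions):

```python
import numpy as np

def dykstra_projection(y, projections, num_iter=1000, tol=1e-10):
    """Dykstra's algorithm: converges to the orthogonal projection of y onto
    the intersection of general convex sets, not only subspaces."""
    x = np.asarray(y, dtype=float).copy()
    q = [np.zeros_like(x) for _ in projections]   # one correction per set
    for _ in range(num_iter):
        x_prev = x.copy()
        for i, proj in enumerate(projections):
            z = x + q[i]      # add back the correction for set i
            x = proj(z)       # project onto C_i
            q[i] = z - x      # update the correction
        if np.linalg.norm(x - x_prev) < tol:
            break
    return x
```

Note that, unlike plain alternating projections, Dykstra keeps one correction vector per set, which is where the extra memory cost discussed below comes from.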

I wrote MATLAB code which is accessible in my StackExchange Mathematics Q1492095 GitHub Repository.

I implemented both methods and a simple test case which shows a case where the Alternating Projections method doesn't converge to the orthogonal projection onto the intersection of the sets.

I also implemented 2 more methods:

  1. Quadratic form of the Hybrid Steepest Descent (See Quadratic Optimization of Fixed Points of Nonexpansive Mappings in Hilbert Space). I implemented 2 variations of this.
  2. Consensus ADMM method which is related to the Dykstra algorithm from above.

While the Dykstra and ADMM methods were the fastest, they also require much more memory than the Hybrid Steepest Descent method. So, given one can afford the memory consumption, the ADMM / Dykstra methods are the better choice. In case memory is a constraint, the Hybrid Steepest Descent method is the way to go.

Royi

This can be done fairly easily using proximal algorithms. Let $\delta_i$ be the indicator function of $C_i$: \begin{equation} \delta_i(x) = \begin{cases} 0 & \text{if } x \in C_i, \\ \infty & \text{otherwise.} \end{cases} \end{equation} Your optimization problem can be written as \begin{equation} \text{minimize} \quad \| x - a \| + \sum_i \delta_i(x). \end{equation} The objective function is a sum of functions that have easy proximal operators. Thus, you can use the Douglas-Rachford method (together with the consensus trick) to solve this optimization problem.

The Douglas-Rachford method is an iterative method for minimizing the sum of two convex functions, each of which has an easy proximal operator. Since we have a sum of more than two functions, it may seem that Douglas-Rachford does not apply. However, we can get around this by using the consensus trick. We reformulate our problem as

\begin{align*} \text{minimize} & \quad \underbrace{\|x_0 - a \| + \sum_i \delta_i(x_i)}_{f(x_0,x_1,\ldots,x_m)} + \underbrace{\delta_S(x_0,x_1,\ldots,x_m)}_{g(x_0,x_1,\ldots,x_m)} \end{align*} where \begin{equation} S = \{(x_0,x_1,\ldots,x_m) \mid x_0 = x_1 = \cdots = x_m \} \end{equation} and $\delta_S$ is the indicator function of $S$. The variables in this reformulated problem are the vectors $x_0,x_1,\ldots,x_m$. The indicator function $\delta_S$ is being used to enforce the constraint that all these vectors should be equal. We are now minimizing a sum of two convex functions, $f$ and $g$, which have easy prox-operators because $f$ is a separable sum of easy functions and $g$ is the indicator function of a set that we can project onto easily. So we are now able to apply the Douglas-Rachford method.

The Douglas-Rachford iteration is just three lines, and the code for this problem could probably be written in about a page of Matlab (unless projecting onto the sets $C_i$ is very complicated).
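
As a rough illustration (in Python/NumPy rather than Matlab), here is a minimal sketch of this consensus Douglas-Rachford scheme; the function names, the step size `t`, and the iteration count are illustrative, and the projections onto the sets $C_i$ are assumed to be supplied as functions:

```python
import numpy as np

def prox_dist(v, a, t):
    """Prox of t * ||x - a||_2 at v (a shifted block soft-thresholding)."""
    a = np.asarray(a, dtype=float)
    d = v - a
    nrm = np.linalg.norm(d)
    return a.copy() if nrm <= t else a + (1.0 - t / nrm) * d

def dr_consensus(a, projections, t=1.0, num_iter=500):
    """Douglas-Rachford applied to f + g, where f(x_0,...,x_m) is the
    separable sum ||x_0 - a|| + sum_i delta_{C_i}(x_i) and g is the
    indicator of the consensus set S = {x_0 = x_1 = ... = x_m}."""
    a = np.asarray(a, dtype=float)
    m = len(projections)
    z = [a.copy() for _ in range(m + 1)]          # one block per variable
    for _ in range(num_iter):
        # prox of t*f acts block-wise: distance term on block 0, projections elsewhere.
        u = [prox_dist(z[0], a, t)]
        u += [proj(z[i + 1]) for i, proj in enumerate(projections)]
        # prox of g projects onto S: every block of 2u - z becomes the average.
        avg = sum(2.0 * u[i] - z[i] for i in range(m + 1)) / (m + 1)
        # Douglas-Rachford update: z <- z + prox_g(2u - z) - u.
        z = [z[i] + avg - u[i] for i in range(m + 1)]
    return sum(u) / (m + 1)   # the u-blocks converge to the minimizer
```

As the comments below point out, this scheme keeps $m + 1$ copies of $x$, which is exactly the memory issue raised in the edited question.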

littleO
  • Thanks for your answer. That will definitely work in general, but it's not what I am looking for. Our problem is a 3D tomography reconstruction, and there is an algorithm (similar to ART) that finds a feasible solution in the intersection of convex sets (essentially halfplanes). We are bounded by memory, since $x$ takes GBs of RAM. So do you know of a general way to adapt the POCS algorithm to find a minimum norm solution? – Mohamed Aly Oct 22 '15 at 10:15
  • @MohamedAly Why do you say this isn't what you're looking for? Proximal algorithms are intended to solve very large scale problems. I think the algorithm I'm describing is pretty much the closest thing to POCS that will project onto an intersection of convex sets. The proximal operator of $\delta_i$ is the projection operator onto $C_i$, so in this algorithm the main work is to project onto the sets $C_i$ repeatedly, as in the POCS algorithm. – littleO Oct 22 '15 at 10:33
  • @MohamedAly Oh, I just realized that the problem is that you can't have all these copies of $x$, since $x$ is so huge. Is that your concern? I'll think about that. – littleO Oct 22 '15 at 10:38
  • Precisely, $x$ is very huge and we can't have many copies of it, especially since the number of convex sets (halfplanes) is also in the millions (one per line projection). – Mohamed Aly Oct 22 '15 at 10:43
  • @MohamedAly Have you considered doing CT reconstruction using a different method, not based on POCS, such as the Chambolle-Pock algorithm as described in this paper? I'd guess that's a good way to go. – littleO Oct 22 '15 at 10:47
  • I will look into that paper. Thanks :) – Mohamed Aly Oct 22 '15 at 10:54
  • The asker should accept this solution as it amply answers the question. Also, for the so-called "consensus trick", I propose @littleO cite Combettes, P.L., Pesquet, J.C., "A proximal decomposition method for solving convex variational inverse problems.", Inv. Prob., 24(6):065014, (2008); and Raguet, H., Fadili, J.M., Peyré, G., "Generalized Forward–Backward splitting.", SIAM Im. Sciences, 6(3):1199–1226, (2013). – dohmatob Oct 24 '15 at 09:30
  • @dohmatob The solution answers the question in general not this specific instance where there are memory/storage limitations (as explained in the edited question). – Mohamed Aly Oct 25 '15 at 08:16
  • @littleO I looked at the paper. It does solve the problem, but it is not based on POCS. The issue with these approaches is that they update the volume only after processing all the projection images, rather than block by block like SART, for example, which updates after processing each projection image and therefore converges faster. I will look more into that to see if it can be converted in that way. – Mohamed Aly Oct 25 '15 at 08:24
  • @MohamedAly So you're concerned that the Chambolle-Pock algorithm will take too many iterations to converge? Based on my experience with similar problems I think it will converge reasonably fast if you select good step sizes. I don't think you need to modify the Chambolle-Pock algorithm -- ad hoc modifications may potentially slow it down or cause it to not converge, and if you just implement it correctly it might work rather well. – littleO Oct 25 '15 at 08:38
  • @littleO I found this paper which might be relevant, but still looking into it. – Mohamed Aly Oct 26 '15 at 13:42
  • @MohamedAly Out of curiosity, is this for a research project in an academic setting? Or are you in an industry setting? – littleO Oct 26 '15 at 20:57
  • @littleO In academia. – Mohamed Aly Oct 27 '15 at 07:38
  • @littleO: Sorry to bring this topic up again (if you prefer I can initiate another thread). I do not know yet how you can obtain the proximal operators for both functions you have proposed. Can you please share your answers? Many thanks in advance. For simplicity, we can take $C_i = \{ x_i : y^T x_i \leq b \}$ or any simple $C_i$ that you would prefer for the example. – user550103 Sep 14 '18 at 08:53
  • @user550103 The prox-operator of $\delta_S$ projects onto $S$ as follows: if the input is $(x_0,\ldots,x_m)$ then the output is $(\bar x, \bar x,\ldots,\bar x)$, where $\bar x$ is the mean of the vectors $x_0,\ldots,x_m$. If $C_i$ is a half-space, then the prox-op of $\delta_i$ projects onto this half-space, as described here for example. If $(u_0,\ldots,u_m) = \text{prox}_f(x_0,\ldots,x_m)$ then the separable sum rule for prox-operators tells us that $u_i = \text{prox}_{\delta_i}(x_i)$ (for $i = 1,\ldots,m$). – littleO Sep 15 '18 at 08:31
  • @user550103 Boyd's booklet on Proximal Algorithms is a good place to learn more about this stuff, by the way. – littleO Sep 15 '18 at 08:34
  • @littleO Thank you for your reply and sharing your answers. I will think about your answers and also read the suggested reference. – user550103 Sep 15 '18 at 12:25
  • I implemented the Consensus ADMM in my answer. – Royi Mar 25 '20 at 20:39

I found an algorithm called the Hybrid Steepest Descent Method (explained in detail in the paper The Hybrid Steepest Descent Method for the Variational Inequality Problem Over the Intersection of Fixed Point Sets of Nonexpansive Mappings) that can solve problems like

$$x^* = \arg \min_x f(x) \text{ s.t. } x \in \cap_i C_i$$

by iterating

$$ \begin{align} y^t & = (1-\beta)x^{t}+\beta\sum_{i}w_{i}P_{C_{i}}(x^{t}) \\ x^{t+1} & = y^{t}-\lambda_{t+1}\mu \nabla f(y^t) \end{align} $$

for a small enough $\mu>0$, $\lambda_t = 1/t$, and $0<\beta\le 2$, where $P_{C_i}$ is the projection onto the set $C_i$, $w_i \in (0,1]$ with $\sum_i w_i = 1$, and $\nabla f$ is the gradient of $f$.

So to solve the problem

$$x^* = \arg \min_x \Vert x-a \Vert^2 \text{ s.t. } x \in \cap_i C_i $$ we simply need to substitute $$f(x)=\frac{1}{2} \Vert x-a\Vert^2 \rightarrow \nabla f(x)=x-a$$
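
A minimal Python/NumPy sketch of this iteration for $f(x)=\frac{1}{2} \Vert x-a\Vert^2$, with equal weights $w_i = 1/m$ and $\lambda_t = 1/t$ (the names and default parameter values are illustrative):

```python
import numpy as np

def hybrid_steepest_descent(a, projections, mu=0.5, beta=1.0, num_iter=1000):
    """Hybrid Steepest Descent for min 0.5*||x - a||^2 over the intersection
    of the sets C_i, so that grad f(x) = x - a.  Only a couple of copies of x
    are kept in memory, regardless of the number of sets."""
    a = np.asarray(a, dtype=float)
    m = len(projections)
    x = a.copy()                                   # any starting point works
    for t in range(1, num_iter + 1):
        # Relaxed averaged-projection step.
        avg_proj = sum(proj(x) for proj in projections) / m
        y = (1.0 - beta) * x + beta * avg_proj
        # Diminishing gradient step on f.
        x = y - (1.0 / t) * mu * (y - a)
    return x
```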

  • You may use the Quadratic Form of this method. It will be simpler. Pay attention that those methods are slower to converge than what I posted above, though indeed they require less memory (roughly by the number of sets to project onto). – Royi Mar 21 '20 at 13:45
  • I implemented the Quadratic Form (as the orthogonal projection can be formulated as a quadratic function to optimize over the set) in my answer. I actually implemented 2 variations. The simpler one is much faster. – Royi Mar 21 '20 at 14:03
  • Do you remember where you got the exact formula you wrote? Which paper was it? – Royi Dec 28 '20 at 20:52
  • Unfortunately I don't. I haven't worked on that problem for over 4 years now :( – Mohamed Aly Dec 28 '20 at 23:09

ADMM Consensus Trick and Orthogonal Projection onto an Intersection of Convex Sets

The ADMM Consensus Optimization

Given a problem in the form:

$$\begin{aligned} \arg \min_{ \boldsymbol{x} } \quad & \sum_{i}^{n} {f}_{i} \left( \boldsymbol{x} \right) \\ \end{aligned}$$

By introducing auxiliary variables one can rewrite it as:

$$\begin{aligned} \arg \min_{ \left\{ \boldsymbol{x}_{i} \right\}, \boldsymbol{z} } \quad & \sum_{i}^{n} {f}_{i} \left( \boldsymbol{x}_{i} \right) \\ \text{subject to} \quad & \boldsymbol{x}_{i} = \boldsymbol{z}, \; i = 1, 2, \ldots, n \\ \end{aligned}$$

Namely, it is a separable form with equality constraint on each variable.
In matrix form it can be written as:

$$\begin{aligned} \arg \min_{ \boldsymbol{u}, \boldsymbol{z} } \quad & \sum_{i}^{n} {f}_{i} \left( \boldsymbol{x}_{i} \right) \\ \text{subject to} \quad & \boldsymbol{u} := \begin{bmatrix} \boldsymbol{x}_{1} \\ \boldsymbol{x}_{2} \\ \vdots \\ \boldsymbol{x}_{n} \end{bmatrix} = \begin{bmatrix} I \\ I \\ \vdots \\ I \end{bmatrix} \boldsymbol{z} \end{aligned}$$

By defining $ {E}_{i} $ to be the matrix such that $ \boldsymbol{x}_{i} = {E}_{i} \boldsymbol{u} $, namely the selector of the appropriate sub-vector from $ \boldsymbol{u} $, we can write the problem in the ADMM form (the scaled ADMM form):

$$\begin{aligned} \boldsymbol{u}^{k + 1} & = \arg \min_{ \boldsymbol{u} } \sum_{i}^{n} {f}_{i} \left( {E}_{i} \boldsymbol{u} \right) + \frac{\rho}{2} {\left\| \boldsymbol{u} - \begin{bmatrix} I \\ I \\ \vdots \\ I \end{bmatrix} \boldsymbol{z}^{k} + \boldsymbol{\lambda}^{k} \right\|}_{2}^{2} \\ \boldsymbol{z}^{k + 1} & = \arg \min_{ \boldsymbol{z} } \frac{\rho}{2} {\left\| \boldsymbol{u}^{k + 1} - \begin{bmatrix} I \\ I \\ \vdots \\ I \end{bmatrix} \boldsymbol{z} + \boldsymbol{\lambda}^{k} \right\|}_{2}^{2} \\ \boldsymbol{\lambda}^{k + 1} & = \boldsymbol{\lambda}^{k} + \boldsymbol{u}^{k + 1} - \begin{bmatrix} I \\ I \\ \vdots \\ I \end{bmatrix} \boldsymbol{z}^{k + 1} \\ \end{aligned}$$

Since the form is block separable, it can be written element-wise:

$$\begin{aligned} \boldsymbol{x}_{i}^{k + 1} & = \arg \min_{ \boldsymbol{x}_{i} } {f}_{i} \left( \boldsymbol{x}_{i} \right) + \frac{\rho}{2} {\left\| \boldsymbol{x}_{i} - \boldsymbol{z}^{k} + \boldsymbol{\lambda}_{i}^{k} \right\|}_{2}^{2}, & i = 1, 2, \ldots, n \\ \boldsymbol{z}^{k + 1} & = \arg \min_{ \boldsymbol{z} } \frac{\rho}{2} \sum_{i = 1}^{n} {\left\| \boldsymbol{x}_{i}^{k + 1} - \boldsymbol{z} + \boldsymbol{\lambda}_{i}^{k} \right\|}_{2}^{2} \\ \boldsymbol{\lambda}_{i}^{k + 1} & = \boldsymbol{\lambda}_{i}^{k} + \boldsymbol{x}_{i}^{k + 1} - \boldsymbol{z}^{k + 1}, & i = 1, 2, \ldots, n \\ \end{aligned}$$

Remark: Pay attention that $ \boldsymbol{z}^{k + 1} = \frac{1}{n} \sum_{i = 1}^{n} \left( \boldsymbol{x}_{i}^{k + 1} + \boldsymbol{\lambda}_{i}^{k} \right) $, namely the mean value of the set $ { \left\{ \boldsymbol{x}_{i}^{k + 1} + \boldsymbol{\lambda}_{i}^{k} \right\} }_{i = 1}^{n} $.

The Proximal Operator is given by $ \operatorname{Prox}_{\lambda f \left( \cdot \right)} \left( \boldsymbol{y} \right) = \arg \min_{\boldsymbol{x}} \frac{1}{2} {\left\| \boldsymbol{x} - \boldsymbol{y} \right\|}_{2}^{2} + \lambda f \left( \boldsymbol{x} \right) $. Using it, and simplifying the optimization for $ \boldsymbol{z}^{k + 1} $, one can write the above as:

$$\begin{aligned} \boldsymbol{x}_{i}^{k + 1} & = \operatorname{Prox}_{\frac{1}{\rho} {f}_{i} \left( \cdot \right)} \left( \boldsymbol{z}^{k} - \boldsymbol{\lambda}_{i}^{k} \right), & i = 1, 2, \ldots, n \\ \boldsymbol{z}^{k + 1} & = \frac{1}{n} \sum_{i = 1}^{n} \left( \boldsymbol{x}_{i}^{k + 1} + \boldsymbol{\lambda}_{i}^{k} \right) \\ \boldsymbol{\lambda}_{i}^{k + 1} & = \boldsymbol{\lambda}_{i}^{k} + \boldsymbol{x}_{i}^{k + 1} - \boldsymbol{z}^{k + 1}, & i = 1, 2, \ldots, n \\ \end{aligned}$$
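
A minimal Python/NumPy sketch of this element-wise consensus ADMM iteration, assuming the proximal operators $ \operatorname{Prox}_{\frac{1}{\rho} {f}_{i}} $ are supplied as callables (the names are illustrative):

```python
import numpy as np

def consensus_admm(prox_ops, x0, num_iter=500):
    """Consensus ADMM for min sum_i f_i(x).  Each entry of prox_ops must be
    the proximal operator of f_i / rho, i.e. v -> Prox_{(1/rho) f_i}(v)."""
    n = len(prox_ops)
    z = np.asarray(x0, dtype=float).copy()
    lam = [np.zeros_like(z) for _ in range(n)]
    for _ in range(num_iter):
        x = [prox_ops[i](z - lam[i]) for i in range(n)]   # local prox steps
        z = sum(x[i] + lam[i] for i in range(n)) / n      # consensus (mean) step
        lam = [lam[i] + x[i] - z for i in range(n)]       # dual updates
    return z
```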

Orthogonal Projection onto an Intersection of Convex Sets

The problem is given by:

$$\begin{aligned} \arg \min_{ \boldsymbol{x} } \quad & \frac{1}{2} {\left\| \boldsymbol{x} - \boldsymbol{y} \right\|}_{2}^{2} \\ \text{subject to} \quad & \boldsymbol{x} \in \bigcap_{i = 1}^{n} \mathcal{C}_{i} \\ \end{aligned}$$

Namely, we're looking for the orthogonal projection of $ \boldsymbol{y} $ onto the intersection of the sets $ {\left\{ \mathcal{C}_{i} \right\}}_{i = 1}^{n} $.

The problem could be rewritten as:

$$\begin{aligned} \arg \min_{ \boldsymbol{x} } \quad & \frac{1}{2} {\left\| \boldsymbol{x} - \boldsymbol{y} \right\|}_{2}^{2} + \sum_{i}^{n} {\delta}_{\mathcal{C}_{i}} \left( \boldsymbol{x} \right) \end{aligned}$$

Where $ {\delta}_{\mathcal{C}_{i}} (x) = \begin{cases} 0 & x \in \mathcal{C}_{i} \\ \infty & x \notin \mathcal{C}_{i} \end{cases} $.

So we can use the ADMM Consensus Optimization by setting $ {f}_{1} \left( \boldsymbol{x} \right) = \frac{1}{2} {\left\| \boldsymbol{x} - \boldsymbol{y} \right\|}_{2}^{2} $ and $ {f}_{i} \left( \boldsymbol{x} \right) = {\delta}_{\mathcal{C}_{i - 1}} \left( \boldsymbol{x} \right), \; i = 2, 3, \ldots, n + 1 $.

In order to solve it we need the Proximal Operator of each function. For $ {f}_{1} \left( \cdot \right) $ it is given by:

$$ \operatorname{Prox}_{ \frac{1}{\rho} {f}_{1} \left( \cdot \right)} \left( \boldsymbol{v} \right) = \arg \min_{ \boldsymbol{x} } \frac{1}{2} {\left\| \boldsymbol{x} - \boldsymbol{y} \right\|}_{2}^{2} + \frac{\rho}{2} {\left\| \boldsymbol{x} - \boldsymbol{v} \right\|}_{2}^{2} = \frac{\rho \boldsymbol{v} + \boldsymbol{y}}{1 + \rho} $$

For the rest the Proximal Operator is the orthogonal projection:

$$ \operatorname{Prox}_{ \frac{1}{\rho} {f}_{i} \left( \cdot \right)} \left( \boldsymbol{v} \right) = \operatorname{Proj}_{ \mathcal{C}_{i - 1} } \left( \boldsymbol{v} \right), \; i = 2, 3, \ldots, n + 1 $$
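
Putting the pieces together, here is a minimal, self-contained Python/NumPy sketch of the resulting projection onto the intersection (the names, the choice $ \rho = 1 $, and the small half-space example are illustrative):

```python
import numpy as np

def project_onto_intersection(y, projections, rho=1.0, num_iter=500):
    """Orthogonal projection of y onto the intersection of the sets C_i via
    consensus ADMM: f_1(x) = 0.5*||x - y||^2 with prox (rho*v + y)/(1 + rho),
    and the remaining f_i are set indicators whose prox is the projection."""
    y = np.asarray(y, dtype=float)
    prox_f1 = lambda v: (rho * v + y) / (1.0 + rho)
    prox_ops = [prox_f1] + list(projections)
    n = len(prox_ops)
    z = y.copy()
    lam = [np.zeros_like(z) for _ in range(n)]
    for _ in range(num_iter):
        x = [prox_ops[i](z - lam[i]) for i in range(n)]   # local prox steps
        z = sum(x[i] + lam[i] for i in range(n)) / n      # consensus average
        lam = [lam[i] + x[i] - z for i in range(n)]       # dual updates
    return z

# Example: project (3, 2) onto {x : x_1 <= 0} ∩ {x : x_2 <= 1};
# the result should be close to (0, 1).
proj1 = lambda v: np.array([min(v[0], 0.0), v[1]])
proj2 = lambda v: np.array([v[0], min(v[1], 1.0)])
print(project_onto_intersection(np.array([3.0, 2.0]), [proj1, proj2]))
```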

Royi