
Consider the following problem $(P_1)$:

$$(P_1)\qquad \begin{array}{cl} \min\limits_{\mathbf{x}\in\mathbb{R}^n} & f_1(\mathbf{x})-f_2(\mathbf{x})\\ \text{s.t.} & \mathbf{A}\mathbf{x}=\mathbf{b},\\ & \mathbf{0}\le \mathbf{x}\le \mathbf{1}, \end{array}$$

where $f_1(\mathbf{x})$ and $f_2(\mathbf{x})$ are two continuously differentiable convex functions.

According to Thomas Lipp and Stephen Boyd, *Variations and Extensions of the Convex-Concave Procedure*, $(P_1)$ can be solved by DC (difference-of-convex) programming to obtain a stationary point. However, $(P_1)$ can also be solved by the gradient projection method (GPM).

My question is: when solving $(P_1)$, is DC more efficient than GPM? What are the advantages of DC compared with GPM?

  • Since your objective is not convex, there's no way to be sure that GPM will find a global minimum. – Brian Borchers Jan 22 '18 at 14:46
  • Both DC and GPM can only obtain a stationary point (i.e., a locally optimal solution). – Dave Jan 22 '18 at 15:06
  • Branch and bound and cutting plane methods are widely used to solve DC problems to global optimality. The convex-concave procedure mentioned by the OP is a heuristic that is often used for these problems. Perhaps the OP means to ask "What's the advantage of CCP over GPM for finding locally optimal solutions to DC problems?" – Brian Borchers Jan 22 '18 at 15:42
  • It's not easy or efficient to project onto the constraint set in this example, so the projected gradient method does not seem like a good fit. Projecting onto the constraint set in this example requires solving an optimization problem (using an iterative algorithm) each time we want to do it. – littleO Mar 21 '20 at 20:52
  • @littleO, since there is a simple (at least closed-form) projection onto each constraint, one could use methods as in Orthogonal Projection onto the Intersection of Convex Sets; see the sketch after this comment thread. I think it will still be an iterative procedure, but more efficient than solving them at once. What do you think? – Royi Mar 21 '20 at 21:18
  • @Royi To project onto the constraint set here we must solve the optimization problem: minimize $\| x - \hat x \|^2$ subject to $Ax = b, 0 \leq x \leq 1$. There's a whole world of algorithms we could choose to solve this problem, and I think the best choice depends on the specific characteristics of our problem. If the problem is sufficiently small then I'd guess an interior point method would be fastest. For large-scale problems, we'd probably use an algorithm such as the one you mentioned in the link. (I think Dykstra's algorithm for two convex sets is equivalent to using Douglas–Rachford.) – littleO Mar 21 '20 at 21:35
  • There is no difference between us. By the way, I'd be happy if you extend your post about the Consensus Trick. I'm not familiar with it. – Royi Mar 21 '20 at 21:43
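For concreteness, here is a minimal Dykstra-style sketch of the projection discussed in the comments above, assuming $\mathbf{A}$ has full row rank so that the affine projection has a closed form. The function names and iteration count are illustrative, not from the original thread:

```python
import numpy as np

def proj_affine(x, A, b, AAt_inv):
    # Euclidean projection onto {x : Ax = b}; assumes A has full row rank.
    return x - A.T @ (AAt_inv @ (A @ x - b))

def proj_box(x):
    # Euclidean projection onto the box [0, 1]^n (componentwise clipping).
    return np.clip(x, 0.0, 1.0)

def project_onto_intersection(x_hat, A, b, n_iter=500):
    # Dykstra's alternating projections onto {Ax = b} ∩ [0, 1]^n,
    # i.e., minimize ||x - x_hat||^2 subject to both constraints.
    AAt_inv = np.linalg.inv(A @ A.T)
    x = x_hat.copy()
    p = np.zeros_like(x)  # Dykstra correction term for the affine set
    q = np.zeros_like(x)  # Dykstra correction term for the box
    for _ in range(n_iter):
        y = proj_affine(x + p, A, b, AAt_inv)
        p = x + p - y
        x = proj_box(y + q)
        q = y + q - x
    return x
```

In practice one would replace the fixed iteration count with a stopping test on the constraint violation.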

2 Answers


The convex-concave procedure (CCP) can be applied to a DC programming problem even when the convex functions are non-smooth.

Gradient descent can't be applied to DC programming problems in which the convex functions are non-smooth, because $f_{1}(x)-f_{2}(x)$ won't generally be smooth.
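To illustrate, here is a minimal CCP sketch for $(P_1)$ using CVXPY, assuming `f1` is given as a CVXPY-expressible convex function (possibly non-smooth, e.g. an $\ell_1$ norm) and `subgrad_f2` is a (sub)gradient oracle for $f_2$; all names are illustrative:

```python
import numpy as np
import cvxpy as cp

def ccp(f1, subgrad_f2, A, b, x0, n_iter=50):
    # Convex-concave procedure for min f1(x) - f2(x) s.t. Ax = b, 0 <= x <= 1.
    # At each step, -f2 is replaced by its affine majorizer at the current
    # iterate (valid because f2 is convex), and the convex subproblem is solved.
    x_k = x0.copy()
    for _ in range(n_iter):
        x = cp.Variable(x_k.size)
        g_k = subgrad_f2(x_k)  # (sub)gradient of f2 at x_k
        # -f2(x) <= -f2(x_k) - g_k^T (x - x_k); constant terms are dropped.
        problem = cp.Problem(cp.Minimize(f1(x) - g_k @ x),
                             [A @ x == b, x >= 0, x <= 1])
        problem.solve()
        x_k = x.value
    return x_k

# Example with a non-smooth convex part:
# f1 = lambda x: cp.norm(x, 1); subgrad_f2 = lambda x: x  # f2(x) = ||x||^2 / 2
```

Note that the subproblem remains convex even when `f1` is non-smooth, which is exactly the situation where gradient descent breaks down.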

  • I just want to add a comment in case someone is interested. A clear example is finding the Fermat–Torricelli point, as in the work by the author Le May Nam, where DC programming is applied to the non-smooth sum of absolute values. – Tuong Nguyen Minh Oct 30 '24 at 14:04

I know it may be too late to answer this question.

In my humble opinion, the gradient projection method (GPM) can be viewed as a special case of DC programming, to some extent.

Let us consider the following optimization problem: \begin{equation} \begin{array}{cl} {\min} & {f(x)} \\ {\text{s.t.}} & {x \in C} \end{array} \end{equation} where $f$ need not be convex. Clearly this class of problems includes the OP's problem $(P_1)$.

To solve this problem with DC programming, we construct two convex functions, \begin{equation} g(x) = \frac{\lambda}{2} \|x\|^2, \quad h(x) = \frac{\lambda}{2} \|x\|^2 - f(x), \end{equation} so that $f = g - h$. Here $\lambda$ should be large enough to ensure that $h$ is convex ($\lambda I \succeq \nabla^2 f(x)$ for all $x$).

Through DC programming (the DCA iteration), we obtain \begin{equation} \begin{aligned} \bar{x}^k &= \arg\min_{x} \; \frac{\lambda}{2} \|x\|^2 - x^T \nabla h(x^k) = x^k - \frac{1}{\lambda} \nabla f(x^k), \\ x^{k+1} &= \operatorname{Proj}_{C} (\bar{x}^{k}). \end{aligned} \end{equation} This is exactly the GPM with step size $1/\lambda$. This DC decomposition is very common; one can refer to *DC Programming and DCA - Theory, Algorithms and Applications*.
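As a quick illustrative sketch (not from the original answer), the iteration above can be coded with generic oracles; `grad_f` and `proj_C` below are assumed to be supplied by the user:

```python
import numpy as np

def dca_as_gpm(grad_f, proj_C, x0, lam, n_iter=200):
    # DCA with g(x) = lam/2 ||x||^2 (plus the indicator of C) and
    # h(x) = lam/2 ||x||^2 - f(x): each DCA step reduces to a gradient
    # step of length 1/lam followed by a projection onto C, i.e. GPM.
    x = x0.copy()
    for _ in range(n_iter):
        x = proj_C(x - grad_f(x) / lam)
    return x

# e.g., for (P_1): grad_f = gradient of f1 - f2, and proj_C = the
# Dykstra routine sketched earlier for {Ax = b} ∩ [0, 1]^n.
```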
