
In Disciplined Convex Programming (DCP), the user effectively supplies the program with a certificate that a (possibly multivariate) function is convex. Expressions must be built from a predefined set of functions such as

  • $x^2/y$
  • $(x_1 \ldots x_k)^{1/k}$
  • $\log (e^{x_1} + \ldots + e^{x_k})$
  • $-\log \det X $
  • the list is a bit longer than that, but not exhaustive
  • and their superpositions.

However, despite their appealing tree-like structure, the DCP rules are designed not for the user's convenience, but because they are needed to construct the corresponding barriers for interior-point optimisation methods. There is no general-purpose barrier constructor for arbitrary convex functions, and it cannot be expected that every convex function is expressible through the rules. DCP is a good alternative to composing barriers by hand and manually performing cumbersome reformulations of the initial optimisation program. A good reference on the topic is Michael Grant's PhD dissertation. He also mentioned a useful FAQ, written for MATLAB users but also applicable to Python.

Consider as an example the multivariate function $$ f(x_1, \ldots, x_d) = \dfrac{1}{1 - e^{x_1}} \cdot \dfrac{1}{1 - e^{x_2}} \cdots \dfrac{1}{1 - e^{x_d}} - 1 $$ with domain $ x_i < 0, i = \overline{1,d} $. I want to express the multivariate convex function $$ \varphi(x_1, \ldots, x_d) := \log f(x_1, \ldots, x_d) $$ using the DCP set of rules.

P.S. A more general question is how to prove convexity of expressions within DCP in more general cases. Perhaps there is a tutorial or a "search algorithm" for this. I am not aware of any references of the kind, and neither is Michael Grant.

P.P.S. I added the tag python because my main concern is the set of cvxpy disciplined convex functions, which may not coincide exactly with the set above. MATLAB has a similar system called CVX; the rules may differ slightly.

UPD: I edited the question because it was initially formulated for the particular case $ x_1 = \ldots = x_d = x $. The current formulation is more general.

  • I'm afraid I don't understand what you're asking. It seems like you're assuming that every convex function you can think of can somehow be expressed in terms of DCP rules. That is simply not the case! Given that the predefined set of functions is finite, there are always going to be convex functions that cannot be "reached" by applications of the DCP composition rules. – Michael Grant Jul 22 '17 at 14:10
  • For your specific case $f(x)=log(h(e^x)-1)$: is that even convex at all? That is to say, have you proven it to be convex by first principles? Unfortunately even if you have proven so, it cannot be used in a DCP system like CVX or cvxpy. There is no way to force it. – Michael Grant Jul 22 '17 at 14:11
  • 1
    You are right. I was thinking a while about the problem (after scrolling your PhD thesis) and realised the purpose of DCP is to construct automatically barriers for interior point methods, which cannot be constructed automatically for all types of convex programs. Concerning the convexity, this is a log-exp transform of formal power series with nonnegative coefficients, hence convex. I was playing a while with log(det(X)) substitution because this gives rise to nontrivial convexity conclusions, but didn't manage yet. However, some tools to deduce DCP-convexity (if any) might be helpful. – Sergey Dovgal Jul 22 '17 at 14:20
  • "Concerning the convexity, this is a log-exp transform of formal power series with nonnegative coefficients, hence convex." I've not seen this general principle before. Can you show me a proof? – Michael Grant Jul 22 '17 at 14:21
  • "the purpose of DCP is to construct automatically barriers for interior point methods, which cannot be constructed automatically for all types of convex programs." Yes---or more generally, to automatically construct a set of transformations that transform a problem into a form solvable by the underlying solvers. I explain this idea a bit more in my CVX FAQ; start at the section "Yes, I am sure my model is convex!" – Michael Grant Jul 22 '17 at 14:23
  • I assume that second derivative of $\log(a_1 e^{x} + \ldots + a_m e^{mx}), a_i \geq 0$ is nonnegative (since log-sum-exp is an "elementary" function with established convexity). Then I pass to the limit when $ m \to \infty $, the second derivative remains nonnegative. Since it is nonnegative over the whole segment of convergence, it is convex. There are more beautiful ways to prove this, in fact, second derivative is equal to the variance of Boltzmann-distributed variable. – Sergey Dovgal Jul 22 '17 at 14:24
  • OK, I see what you're saying. Yes, if the $a_i$ values are nonnegative, then $\log\sum_i a_ie^{b_ix}$ is convex. – Michael Grant Jul 22 '17 at 14:24
  • Thank you for the FAQ, however I am still not aware of any kind of "tutorial" where people express the problems in terms of DCP using elementary constructions. Maybe this is possible but non-obvious. – Sergey Dovgal Jul 22 '17 at 14:27
  • I'm not aware of such a tutorial either. I'm really not sure how much it would help. I frankly don't think that going to such extended lengths to arrive at a DCP derivation is a fruitful approach for most people. I would suggest that in most cases where one is tempted to do it, it's because the problem isn't actually convex or solvable by CVX/cvxpy, so it would be in vain. – Michael Grant Jul 22 '17 at 14:53
  • 1
    I edited the question to clarify the motivation behind DCP rules. – Sergey Dovgal Jul 23 '17 at 16:07

1 Answer


Let me first show that $ \varphi(x_1, \ldots, x_d) = \log f(x_1, \ldots, x_d) $ is convex. Indeed, if we expand each factor of $f$ as a geometric series in $e^{x_i}$ (valid since $x_i < 0$), we obtain something of the shape $$ \log \sum_{(i_1, \ldots, i_d) \neq 0} a_{i_1 \ldots i_d} e^{i_1 x_1 + \ldots + i_d x_d} $$ with nonnegative coefficients $ a_{i_1 \ldots i_d} $, which is log-sum-exp, a "table" convex function (but with exponentially many coefficients, and we don't want this).
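The convexity claim can be sanity-checked numerically: midpoint convexity $\varphi\!\left(\frac{a+b}{2}\right) \leq \frac{\varphi(a)+\varphi(b)}{2}$ should hold for random points in the domain. A plain-Python check (test points and tolerance are ad hoc):

```python
import math
import random

def phi(x):
    """phi(x) = log( prod_i 1/(1 - e^{x_i}) - 1 ), defined for x_i < 0."""
    prod = 1.0
    for xi in x:
        prod *= 1.0 / (1.0 - math.exp(xi))
    return math.log(prod - 1.0)

random.seed(0)
d = 4
violations = 0
for _ in range(1000):
    a = [-random.uniform(0.1, 3.0) for _ in range(d)]
    b = [-random.uniform(0.1, 3.0) for _ in range(d)]
    mid = [(ai + bi) / 2 for ai, bi in zip(a, b)]
    # midpoint convexity: phi(mid) <= (phi(a) + phi(b)) / 2
    if phi(mid) > (phi(a) + phi(b)) / 2 + 1e-9:
        violations += 1
print(violations)  # 0 violations expected
```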

I assume that convex functions only appear on the right-hand side of inequalities,

CONCAVE >= CONVEX,

which will be used when introducing new slack variables and inequalities.

After the variable change $ \dfrac{1}{1 - e^{x_i}} = 1 + e^{s_i} $, which can be expressed using the slack inequalities $$ s_i \geq x_i + \log( 1 + e^{s_i}), \quad i = \overline{1, d}, $$ the problem reduces to giving DCP rules for constructing $$ h(s_1, \ldots, s_d) = \log \Big( (1+e^{s_1})\ldots (1 + e^{s_d}) - 1 \Big) \enspace . $$
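A quick plain-Python check that the substitution is exact, i.e. at $s = \log\!\big(\tfrac{1}{1-e^x} - 1\big)$ the slack inequality holds with equality (test points are ad hoc):

```python
import math

def s_of_x(x):
    # exact change of variables: 1/(1 - e^x) = 1 + e^s  =>  s = log(e^x / (1 - e^x))
    return math.log(math.exp(x) / (1.0 - math.exp(x)))

# at the exact value, the slack inequality s >= x + log(1 + e^s) is tight
for x in [-0.5, -1.0, -3.0]:
    s = s_of_x(x)
    rhs = x + math.log(1.0 + math.exp(s))
    assert abs(s - rhs) < 1e-12
print("ok")
```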

The problem can be solved by introducing $ O(d^2) $ slack variables and inequalities. $$ \begin{cases} h &\geq \log(e^{p_{1,1}} + e^{p_{2,1}} + \ldots + e^{p_{d,1}} ),\\ p_{k,j} &\geq \log(e^{q_{k,j}} + e^{p_{k,j+1}}), \quad k=\overline{1,d},\ j \leq d-k, \\ p_{k,d-k+1} &= q_{k,d-k+1}, \quad k=\overline{1,d}, \\ q_{1,j} &= s_j, \quad j=\overline{1,d} ,\\ q_{k+1,j} &= s_j + p_{k,j+1} , \quad k = \overline{1,d-1},\ j \leq d-k \end{cases} $$

The expressions $p_{1,1}, p_{2,1}, \ldots, p_{d,1}$ play the role of (logarithms of) the elementary symmetric polynomials $$ \begin{cases} \sigma_1 := e^{s_1} + e^{s_2} + \ldots + e^{s_d},\\ \sigma_2 := e^{s_1+s_2} + e^{s_1+s_3} + \ldots + e^{s_{d-1} + s_d},\\ \ldots ,\\ \sigma_d := e^{s_1+s_2+\ldots +s_d} \end{cases} $$ They appear after expanding the brackets in $ h(s_1, \ldots, s_d) $: $$ h(s_1, \ldots, s_d) = \log(\sigma_1 + \sigma_2 + \ldots + \sigma_d) $$ The idea is to use the quadratic-size recursion for symmetric polynomials (instead of the exponential-size explicit list of summands). Writing $ y_i := e^{s_i} $, the quantity $ e^{q_{k,j}} $ represents the grouped term $ y_j \cdot (\text{sum of all degree-}(k{-}1)\text{ monomials in } y_{j+1}, \ldots, y_d) $: $$ \begin{array}{rl} e^{q_{1,j}}: & y_1,\ y_2,\ \ldots,\ y_d,\\ e^{q_{2,j}}: & y_1 \cdot (y_2 + \ldots + y_d),\ \ldots,\ y_{d-2} \cdot (y_{d-1} + y_d),\ y_{d-1} \cdot y_d, \\ e^{q_{3,j}}: & y_1 \cdot (y_2 y_3 + \ldots + y_{d-1} y_d),\ \ldots,\ y_{d-2} \cdot y_{d-1} \cdot y_d, \\ &\ldots,\\ e^{q_{d,1}}: & y_1 y_2 \cdots y_d, \end{array} $$ while $ e^{p_{k,j}} $ represents the tail sum of all degree-$k$ monomials in $ y_j, \ldots, y_d $, so that $ e^{p_{k,1}} = \sigma_k $. The arrays $(p_{k,j})_{kj}$ and $(q_{k,j})_{kj}$ are recursively computed through one another.