
I have the following nonlinear optimization problem: $$ \begin{align*} \text{Find } x \text{ that maximizes } & \frac{1}{\|Ax\|} (Ax)^{\top} y \\ \text{Subject to } & \sum_{i=1}^n x_i = 1 \\ & x_i \geq 0 \; \forall \: i \in \{1\dots n\} \\ \text{Where } & A \in \mathbb{R}^{m \, \times \, n} \\ & x \in \mathbb{R}^n \\ & y \in \mathbb{R}^m \\ & A_{i,j} \geq 0 \, \forall \, i \in \{1\dots m\}, j \in \{1\dots n\} \\ & y_i \geq 0 \, \forall \, i \in \{1\dots m\} \\ & \|y\| = 1 \\ & \|a_i\| = 1 \, \forall \text{ column vectors } a_i \text{ in } A \end{align*} $$

Motivation for this optimization problem (feel free to skip): I have $n+1$ normalized, nonnegative vectors of length $m$, $\{a_1, \dots, a_n, y\}$, which represent $n+1$ different topics. In my case, the similarity between two topics is their dot product. I want to create a graph with the $n$ topics arranged as the vertices of a regular $n$-gon, showing where the topic represented by $y$ lies in between them. One simple measure is to take the dot product of $y$ with every vector $a_1, \dots, a_n$ to form $z = A^{\top} y$, divide $z$ by its sum, and use the result as weights to interpolate between the vertices of the $n$-gon. I don't find this approach very meaningful, though, because it doesn't tell you which point in the span of $A$, when normalized, is most similar to $y$, and in practice most points end up close to the center. That's why I restated the problem as finding a nonnegative linear combination of $a_1, \dots, a_n$ that, when normalized, is most similar to $y$.

A sort of guesstimate answer is to take the non-negative least squares solution of $Ax = y$; that works reasonably well in practice, but I can't prove that it's optimal.
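For concreteness, here is a rough sketch of that guesstimate using SciPy's `nnls` (the data here is random and purely illustrative):

```python
import numpy as np
from scipy.optimize import nnls

# Illustrative data: A has nonnegative, unit-norm columns and y is a
# nonnegative, unit-norm vector, as in the problem statement.
rng = np.random.default_rng(0)
m, n = 50, 5
A = rng.random((m, n))
A /= np.linalg.norm(A, axis=0)     # normalize each column
y = rng.random(m)
y /= np.linalg.norm(y)

x_nnls, residual = nnls(A, y)      # non-negative least squares for Ax ~= y
x_guess = x_nnls / x_nnls.sum()    # rescale so sum(x) = 1; the objective is scale invariant

Ax = A @ x_guess
print("objective value:", (Ax @ y) / np.linalg.norm(Ax))
```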

I think I could rewrite the problem in terms of the KKT conditions and solve it using calculus. However, I'm not really familiar with the KKT conditions, so I wouldn't know how to do that.

What algorithm or technique could I use to solve this optimization problem?

I'm willing to accept an answer that just shows how to set this up as a quadratic program.

EDIT: I believe I've reduced this question to solving a system of equations by using the Karush-Kuhn-Tucker conditions that force $x$ to be non-negative. So, any answer that just solves this system of equations should work:

\begin{align*} \text{Solve for } x, \mu \text{ satisfying } & \\ & \frac{1}{\|Ax\|}A^{\top}\Big(y - \frac{x^{\top}A^{\top} y A x}{x^{\top}A^{\top}A x} \Big) + \mu = 0 \\ & \mu_i x_i = 0 \, \forall \, i \in \{1 \dots n\} \\ & \mu_i \geq 0 \, \forall \, i \in \{1 \dots n \} \\ \text{Where } & \\ & A \in \mathbb{R}^{m \, \times \, n} \\ & x \in \mathbb{R}^n \\ & \mu \in \mathbb{R}^n \\ & y \in \mathbb{R}^m \end{align*}
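For reference, here is a small NumPy sketch that checks these conditions numerically for a candidate $x$, recovering $\mu$ from the stationarity equation (illustrative only):

```python
import numpy as np

def kkt_residuals(x, A, y, tol=1e-8):
    """Check the stationarity/complementarity conditions above for a candidate x.

    mu is recovered from the stationarity equation grad + mu = 0, where grad is
    the gradient of (Ax)^T y / ||Ax||.
    """
    Ax = A @ x
    nrm = np.linalg.norm(Ax)
    grad = A.T @ (y - (Ax @ y) / (Ax @ Ax) * Ax) / nrm
    mu = -grad
    return {
        "mu_nonnegative": bool(np.all(mu >= -tol)),            # mu_i >= 0
        "max_complementarity": float(np.max(np.abs(mu * x))),  # mu_i * x_i = 0
    }
```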

Sadly, I think this is just another quadratic program, except that the objective function is $0$. The only quadratic-program solver I've found for Python (my work environment), CVXOPT, is quite slow.
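In case it is useful, here is a rough sketch of attacking the original problem directly with SciPy's general-purpose SLSQP solver instead of a QP solver; since the problem is nonconvex, this only finds a local maximizer in general:

```python
import numpy as np
from scipy.optimize import minimize

def neg_objective(x, A, y):
    """Negative of (Ax)^T y / ||Ax||, so minimizing it maximizes the original objective."""
    Ax = A @ x
    return -(Ax @ y) / np.linalg.norm(Ax)

def solve_direct(A, y):
    n = A.shape[1]
    x0 = np.full(n, 1.0 / n)   # start at the centre of the simplex, where Ax != 0
    res = minimize(
        neg_objective, x0, args=(A, y), method="SLSQP",
        bounds=[(0.0, None)] * n,                                      # x_i >= 0
        constraints=[{"type": "eq", "fun": lambda x: x.sum() - 1.0}],  # sum(x) = 1
    )
    return res.x
```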

DMcMor
  • I'd suggest an interior point method. See the section on interior point methods for nonlinear optimization in Nocedal and Wright's book for the best reference – Nick Alger Apr 01 '17 at 04:00
  • You might need more structure on $A$... Anyway, note that $x^{\top}A^{\top}y \leq \|Ax\|_2\|y\|_2$ by the Hölder inequality, with equality if and only if $Ax = \lambda y$ for some $\lambda \in \Bbb R$. Moreover, the constraint $\sum_i x_i = 1$ can easily be omitted if $x_i \geq 0$, as your objective function is (positively) scale invariant. Now, if $Az = y$, then $z + \ker(A)$ are all solutions of the unconstrained problem. You could check if one of these solutions lies in the nonnegative orthant. – Surb Apr 01 '17 at 09:31
  • @Surb Thank you for the comment; this is really helpful. Can you explain what the nonnegative orthant means here? I've never heard that term. – michaelsnowden Apr 01 '17 at 09:52
  • @michaelsnowden By nonnegative orthant I mean $\Bbb R^n_+ = \{u \in \Bbb R^n : u_i \geq 0 \ \forall i\}$ – Surb Apr 01 '17 at 09:53
  • @Surb Ok, so if I understand correctly, your solution would be: if there's a solution to $Az = y$, then choose any vector in the kernel of $A$ that, when added to $z$ makes it positive. Then, since the objective is scale invariant, divide that vector by its sum. This would only work if there is a solution to $Az = y$, right? Thank you for your help. – michaelsnowden Apr 01 '17 at 10:01
  • @michaelsnowden that is the idea yes. Assumptions are indeed that there exists a nonnegative nonzero $v$ such that $Av = y$. If this is the case, then $v$ is a global maximizer of your problem. – Surb Apr 01 '17 at 10:09
  • @Surb Any idea what I should do if such a $v$ doesn't exist? Would linear regression (minimizing $||Av -y||^2$) essentially give me the same thing? – michaelsnowden Apr 01 '17 at 10:19
  • @michaelsnowden Not at the moment. I don't know; it might be. In any case, I'd first check that, under the mentioned assumption, the method you decide to use is guaranteed to find it (which seems to be the case for a nonnegative linear regression). – Surb Apr 01 '17 at 10:39
  • @Surb So it seems that, for my matrix, a solution $v$ always exists, but $A$ has a trivial null space, so I end up with $v$ vectors that have negative components I can't get rid of. Any way to fix this? – michaelsnowden Apr 01 '17 at 11:57
  • Does $|Ax|$ represent the Euclidian norm, or another norm? – Dominique Apr 04 '17 at 19:58
  • @Dominique Euclidean (l2) norm – michaelsnowden Apr 04 '17 at 20:00
  • As written, the problem is not a quadratic program, due to the division by $|Ax|$. Also, the KKT conditions are no easier to solve directly, especially because of the complementarity condition $\mu_i x_i = 0$. – Dominique Apr 24 '17 at 14:52

1 Answer


I am not sure it's possible to cast your problem as a quadratic program, but here is a formulation with a linear objective and quadratic constraints. Define $z := -Ax / \|Ax\|$. You could rewrite your problem as \begin{align*} \min_{x,z} \quad & y^T z \\ \text{s.t.} \quad & Ax + \|Ax\| z = 0 \\ & \sum_i x_i = 1, \ x \geq 0, \end{align*} where $z$ is now a vector of additional optimization variables. The first equality constraint doesn't look good and is nonsmooth whenever $x$ lies in the nullspace of $A$ (your description doesn't say whether $m > n$ or $m \leq n$, or what the rank of $A$ is).

Note that $\|z\| = 1$ by construction, so we could introduce a new (scalar) optimization variable $\lambda$ that will equal $\|Ax\|$ at optimality, like so: \begin{align*} \min_{x,z,\lambda} \quad & y^T z \\ \text{s.t.} \quad & Ax + \lambda z = 0 \\ & \|z\|^2 = 1 \\ & \sum_i x_i = 1, \ x \geq 0, \ \lambda \geq 0. \end{align*} This is a larger problem, but a friendlier one: the objective is linear and the constraints are linear and quadratic.
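To make this concrete, here is a rough sketch of the lifted formulation using SciPy's trust-constr solver, which I pick here only because it accepts nonlinear equality constraints; since the problem is nonconvex, it will at best return a local solution:

```python
import numpy as np
from scipy.optimize import minimize, Bounds, LinearConstraint, NonlinearConstraint

def solve_lifted(A, y):
    m, n = A.shape

    # Decision vector w = [x (n entries), z (m entries), lam (1 entry)].
    def split(w):
        return w[:n], w[n:n + m], w[-1]

    def objective(w):                      # minimize y^T z
        _, z, _ = split(w)
        return y @ z

    def coupling(w):                       # A x + lam * z = 0
        x, z, lam = split(w)
        return A @ x + lam * z

    def unit_norm(w):                      # ||z||^2 = 1
        _, z, _ = split(w)
        return z @ z

    # sum(x) = 1 as a linear constraint on the stacked variable w.
    e = np.concatenate([np.ones(n), np.zeros(m + 1)])
    constraints = [
        LinearConstraint(e[np.newaxis, :], 1.0, 1.0),
        NonlinearConstraint(coupling, 0.0, 0.0),
        NonlinearConstraint(unit_norm, 1.0, 1.0),
    ]

    # Bounds: x >= 0, z free, lam >= 0.
    lb = np.concatenate([np.zeros(n), np.full(m, -np.inf), [0.0]])
    ub = np.full(n + m + 1, np.inf)

    # Feasible starting point built from the centre of the simplex.
    x0 = np.full(n, 1.0 / n)
    lam0 = np.linalg.norm(A @ x0)
    z0 = -(A @ x0) / lam0
    w0 = np.concatenate([x0, z0, [lam0]])

    res = minimize(objective, w0, method="trust-constr",
                   constraints=constraints, bounds=Bounds(lb, ub))
    x_opt, _, _ = split(res.x)
    return x_opt
```

The lifting adds $m + 1$ variables, so for repeated solves it would pay to supply analytic Jacobians or to hand the same formulation to a dedicated NLP solver, as discussed below.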

Note however that it's a nonconvex problem, but so was your initial formulation. I think you can make the problem a bit more convex by changing $\|z\|^2 = 1$ to $\|z\|_2 \leq 1$ without changing the solution but I didn't spend much time trying to verify this claim.

As suggested in one of the comments, I would go with an interior-point method to solve the problem (e.g., IPOPT if you can give $A$ explicitly). That assumes that you only have one such problem to solve, and not a long sequence of similar problems. Otherwise I would try an active-set method that could benefit from warm starts, such as SNOPT (which is a commercial product).

Dominique
  • I think that one can't change the restriction $\|z\|^2 = 1$ to $\|z\|^2 \leq 1$, because if $\|z\|^2 < 1$ the solution may be identified as a KKT point of the problem $\min_{x} \, -y^T A x$ subject to $\sum_i x_i = 1,\ x \geq 0$, which may have a global solution with $\|Ax\| < 1$. – R. W. Prado Jul 07 '22 at 14:45