
I have the following nonlinear optimization problem: $$ \begin{align*} \text{Find } x \text{ that maximizes } & \frac{1}{\|Ax\|} (Ax)^{\top} y \\ \text{Subject to } & \sum_{i=1}^n x_i = 1 \\ & x_i \geq 0 \; \forall \: i \in \{1\dots n\} \\ \text{Where } & A \in \mathbb{R}^{m \, \times \, n} \\ & x \in \mathbb{R}^n \\ & y \in \mathbb{R}^m \\ & A_{i,j} \geq 0 \, \forall \, i \in \{1\dots m\}, j \in \{1\dots n\} \\ & y_i \geq 0 \, \forall \, i \in \{1\dots m\} \\ & \|y\| = 1 \\ & \|a_i\| = 1 \, \forall \text{ column vectors } a_i \text{ in } A \end{align*} $$

Motivation for this optimization problem (feel free to skip): I have $n+1$ normalized, nonnegative vectors of length $m$, $\{a_1, \dots, a_n, y\}$, which represent $n+1$ different topics. In my case, the similarity between two topics is their dot product. I want to create a graph with the $n$ topics arranged as the vertices of a regular $n$-gon, showing where the topic represented by $y$ lies in between them. One simple measure is to take the dot product of $y$ with every vector $a_1, \dots, a_n$ to form $z = A^{\top} y$, divide $z$ by its sum, and use the result as weights to interpolate between the vertices of the $n$-gon. I don't find this approach very meaningful, though, because it doesn't tell you which point in the span of $A$, when normalized, is most similar to $y$, and in practice most points end up close to the center. That's why I restated the problem as finding a nonnegative linear combination of $a_1, \dots, a_n$ that, when normalized, is most similar to $y$.

A sort of guesstimate answer is to take the non-negative least squares solution of $Ax = y$; that works reasonably well in practice, but I can't prove that it's optimal.
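For concreteness, here is a rough sketch of that guesstimate using SciPy's `nnls` (the data here is random and purely illustrative):

```python
import numpy as np
from scipy.optimize import nnls

# Illustrative data: A has nonnegative, unit-norm columns and y is a
# nonnegative, unit-norm vector, as in the problem statement.
rng = np.random.default_rng(0)
m, n = 50, 5
A = rng.random((m, n))
A /= np.linalg.norm(A, axis=0)     # normalize each column
y = rng.random(m)
y /= np.linalg.norm(y)

x_nnls, residual = nnls(A, y)      # non-negative least squares for Ax ~= y
x_guess = x_nnls / x_nnls.sum()    # rescale so sum(x) = 1; the objective is scale invariant

Ax = A @ x_guess
print("objective value:", (Ax @ y) / np.linalg.norm(Ax))
```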

I think I could rewrite the problem in terms of the KKT conditions and solve it using calculus. However, I'm not really familiar with the KKT conditions, so I wouldn't know how to do that.

What algorithm or technique could I use to solve this optimization problem?

I'm willing to accept an answer that just shows how to set this up as a quadratic program.

EDIT: I believe I've reduced this question to solving a system of equations by using the Karush-Kuhn-Tucker conditions that force $x$ to be non-negative. So, any answer that just solves this system of equations should work:

\begin{align*} \text{Solve for } x, \mu \text{ satisfying } & \\ & \frac{1}{\|Ax\|}A^{\top}\Big(y - \frac{x^{\top}A^{\top} y A x}{x^{\top}A^{\top}A x} \Big) + \mu = 0 \\ & \mu_i x_i = 0 \, \forall \, i \in \{1 \dots n\} \\ & \mu_i \geq 0 \, \forall \, i \in \{1 \dots n \} \\ \text{Where } & \\ & A \in \mathbb{R}^{m \, \times \, n} \\ & x \in \mathbb{R}^n \\ & \mu \in \mathbb{R}^n \\ & y \in \mathbb{R}^m \end{align*}
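For reference, here is a small NumPy sketch that checks these conditions numerically for a candidate $x$, recovering $\mu$ from the stationarity equation (illustrative only):

```python
import numpy as np

def kkt_residuals(x, A, y, tol=1e-8):
    """Check the stationarity/complementarity conditions above for a candidate x.

    mu is recovered from the stationarity equation grad + mu = 0, where grad is
    the gradient of (Ax)^T y / ||Ax||.
    """
    Ax = A @ x
    nrm = np.linalg.norm(Ax)
    grad = A.T @ (y - (Ax @ y) / (Ax @ Ax) * Ax) / nrm
    mu = -grad
    return {
        "mu_nonnegative": bool(np.all(mu >= -tol)),            # mu_i >= 0
        "max_complementarity": float(np.max(np.abs(mu * x))),  # mu_i * x_i = 0
    }
```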

Sadly, I think this is just another quadratic program, except that the objective function is $0$. The only quadratic-program solver I've found for Python (my work environment), CVXOPT, is quite slow.
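In case it is useful, here is a rough sketch of attacking the original problem directly with SciPy's general-purpose SLSQP solver instead of a QP solver; since the problem is nonconvex, this only finds a local maximizer in general:

```python
import numpy as np
from scipy.optimize import minimize

def neg_objective(x, A, y):
    """Negative of (Ax)^T y / ||Ax||, so minimizing it maximizes the original objective."""
    Ax = A @ x
    return -(Ax @ y) / np.linalg.norm(Ax)

def solve_direct(A, y):
    n = A.shape[1]
    x0 = np.full(n, 1.0 / n)   # start at the centre of the simplex, where Ax != 0
    res = minimize(
        neg_objective, x0, args=(A, y), method="SLSQP",
        bounds=[(0.0, None)] * n,                                      # x_i >= 0
        constraints=[{"type": "eq", "fun": lambda x: x.sum() - 1.0}],  # sum(x) = 1
    )
    return res.x
```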

DMcMor
  • I'd suggest an interior point method. See the section on interior point methods for nonlinear optimization in Nocedal and Wright's book for the best reference – Nick Alger Apr 01 '17 at 04:00
  • You might need more structure on $A$... Anyway, note that $x^{\top}A^{\top}y \leq \|Ax\|_2\|y\|_2$ by the Hölder inequality, with equality if and only if $Ax = \lambda y$ for some $\lambda \in \Bbb R$. Moreover, the constraint $\sum_i x_i = 1$ can easily be omitted if $x_i \geq 0$, as your objective function is (positively) scale invariant. Now, if $Az = y$, then $z + \ker(A)$ are all solutions of the unconstrained problem. You could check if one of these solutions lies in the nonnegative orthant. – Surb Apr 01 '17 at 09:31
  • @Surb Thank you for the comment; this is really helpful. Can you explain what the nonnegative orthant means here? I've never heard that term. – michaelsnowden Apr 01 '17 at 09:52
  • @michaelsnowden By nonnegative orthant I mean $\Bbb R^n_+ = \{u \in \Bbb R^n : u_i \geq 0 \ \forall i\}$ – Surb Apr 01 '17 at 09:53
  • @Surb Ok, so if I understand correctly, your solution would be: if there's a solution to $Az = y$, then choose any vector in the kernel of $A$ that, when added to $z$ makes it positive. Then, since the objective is scale invariant, divide that vector by its sum. This would only work if there is a solution to $Az = y$, right? Thank you for your help. – michaelsnowden Apr 01 '17 at 10:01
  • @michaelsnowden that is the idea yes. Assumptions are indeed that there exists a nonnegative nonzero $v$ such that $Av = y$. If this is the case, then $v$ is a global maximizer of your problem. – Surb Apr 01 '17 at 10:09
  • @Surb Any idea what I should do if such a $v$ doesn't exist? Would linear regression (minimizing $||Av -y||^2$) essentially give me the same thing? – michaelsnowden Apr 01 '17 at 10:19
  • @michaelsnowden Not at the moment. I don't know; it might be. In any case, I'd first check that, under the mentioned assumption, the method you decide to use is guaranteed to find it (which seems to be the case for a nonnegative linear regression). – Surb Apr 01 '17 at 10:39
  • @Surb So it seems that, for my matrix, a solution $v$ always exists, but $A$ has a trivial null space, so I end up with $v$ vectors that have negative components I can't get rid of. Any way to fix this? – michaelsnowden Apr 01 '17 at 11:57
  • Does $|Ax|$ represent the Euclidian norm, or another norm? – Dominique Apr 04 '17 at 19:58
  • @Dominique Euclidean (l2) norm – michaelsnowden Apr 04 '17 at 20:00
  • As written, the problem is not a quadratic program, due to the division by $|Ax|$. Also, the KKT conditions are no easier to solve directly, especially because of the complementarity condition $\mu_i x_i = 0$. – Dominique Apr 24 '17 at 14:52

1 Answer


I am not sure it's possible to cast your problem as a quadratic program, but here is a formulation with a linear objective and quadratic constraints. Define $z := -Ax / \|Ax\|$. You could rewrite your problem as \begin{align*} \min_{x,z} \quad & y^T z \\ \text{s.t.} \quad & Ax + \|Ax\| z = 0 \\ & \sum_i x_i = 1, \ x \geq 0, \end{align*} where $z$ is now a vector of additional optimization variables. The first equality constraint doesn't look good and is nonsmooth whenever $x$ lies in the nullspace of $A$ (your description doesn't say whether $m > n$ or $m \leq n$, or what the rank of $A$ is).

Note that $\|z\| = 1$ by construction, so we could introduce a new (scalar) optimization variable $\lambda$ that will equal $\|Ax\|$ at optimality, like so: \begin{align*} \min_{x,z,\lambda} \quad & y^T z \\ \text{s.t.} \quad & Ax + \lambda z = 0 \\ & \|z\|^2 = 1 \\ & \sum_i x_i = 1, \ x \geq 0, \ \lambda \geq 0. \end{align*} This is a larger problem, but a friendlier one: the objective is linear and the constraints are linear and quadratic.
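To make this concrete, here is a rough sketch of the lifted formulation using SciPy's trust-constr solver, which I pick here only because it accepts nonlinear equality constraints; since the problem is nonconvex, it will at best return a local solution:

```python
import numpy as np
from scipy.optimize import minimize, Bounds, LinearConstraint, NonlinearConstraint

def solve_lifted(A, y):
    m, n = A.shape

    # Decision vector w = [x (n entries), z (m entries), lam (1 entry)].
    def split(w):
        return w[:n], w[n:n + m], w[-1]

    def objective(w):                      # minimize y^T z
        _, z, _ = split(w)
        return y @ z

    def coupling(w):                       # A x + lam * z = 0
        x, z, lam = split(w)
        return A @ x + lam * z

    def unit_norm(w):                      # ||z||^2 = 1
        _, z, _ = split(w)
        return z @ z

    # sum(x) = 1 as a linear constraint on the stacked variable w.
    e = np.concatenate([np.ones(n), np.zeros(m + 1)])
    constraints = [
        LinearConstraint(e[np.newaxis, :], 1.0, 1.0),
        NonlinearConstraint(coupling, 0.0, 0.0),
        NonlinearConstraint(unit_norm, 1.0, 1.0),
    ]

    # Bounds: x >= 0, z free, lam >= 0.
    lb = np.concatenate([np.zeros(n), np.full(m, -np.inf), [0.0]])
    ub = np.full(n + m + 1, np.inf)

    # Feasible starting point built from the centre of the simplex.
    x0 = np.full(n, 1.0 / n)
    lam0 = np.linalg.norm(A @ x0)
    z0 = -(A @ x0) / lam0
    w0 = np.concatenate([x0, z0, [lam0]])

    res = minimize(objective, w0, method="trust-constr",
                   constraints=constraints, bounds=Bounds(lb, ub))
    x_opt, _, _ = split(res.x)
    return x_opt
```

The lifting adds $m + 1$ variables, so for repeated solves it would pay to supply analytic Jacobians or to hand the same formulation to a dedicated NLP solver, as discussed below.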

Note however that it's a nonconvex problem, but so was your initial formulation. I think you can make the problem a bit more convex by changing $\|z\|^2 = 1$ to $\|z\|_2 \leq 1$ without changing the solution but I didn't spend much time trying to verify this claim.

As suggested in one of the comments, I would go with an interior-point method to solve the problem (e.g., IPOPT if you can give $A$ explicitly). That assumes that you only have one such problem to solve, and not a long sequence of similar problems. Otherwise I would try an active-set method that could benefit from warm starts, such as SNOPT (which is a commercial product).

Dominique
  • I think that one can't change the restriction $\|z\|^2 = 1$ to $\|z\|^2 \leq 1$, because if $\|z\|^2 < 1$ the solution may be identified as a KKT point of the problem $\min_{x} \, -y^T A x$ subject to $\sum_i x_i = 1,\ x \geq 0$, which may have a global solution with $\|Ax\| < 1$. – R. W. Prado Jul 07 '22 at 14:45