I have the following nonlinear optimization problem: $$ \begin{align*} \text{Find } x \text{ that maximizes } & \frac{1}{\|Ax\|} (Ax)^{\top} y \\ \text{Subject to } & \sum_{i=1}^n x_i = 1 \\ & x_i \geq 0 \; \forall \: i \in \{1\dots n\} \\ \text{Where } & A \in \mathbb{R}^{m \, \times \, n} \\ & x \in \mathbb{R}^n \\ & y \in \mathbb{R}^m \\ & A_{i,j} \geq 0 \, \forall \, i \in \{1\dots m\}, j \in \{1\dots n\} \\ & y_i \geq 0 \, \forall \, i \in \{1\dots m\} \\ & \|y\| = 1 \\ & \|a_i\| = 1 \, \forall \text{ column vectors } a_i \text{ in } A \end{align*} $$
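Concretely, since $\|y\| = 1$, the quantity being maximized is the cosine similarity between $Ax$ and $y$. A minimal NumPy sketch, purely to fix notation (the helper name is mine):

```python
import numpy as np

def objective(A, x, y):
    """The quantity being maximized: (Ax . y) / ||Ax||.
    Since ||y|| = 1, this is the cosine similarity between Ax and y."""
    Ax = A @ x
    return (Ax @ y) / np.linalg.norm(Ax)
```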
Motivation for this optimization problem (feel free to skip): I have $n+1$ normalized, nonnegative vectors of length $m$, $\{a_1, \dots, a_n, y\}$, which represent $n+1$ different topics. In my case, the similarity between two topics is determined by their dot product. I want to create a graph with the $n$ topics $a_1, \dots, a_n$ arranged as the vertices of a regular $n$-gon, showing where the topic represented by the vector $y$ lies in between them. One simple measure is to take the dot product of $y$ with every vector $a_i$ to create a vector $z = A^{\top}y$, then divide that vector by its sum and use the result as weights to interpolate between the vertices of the $n$-gon (sketched below). However, I don't think this approach is very meaningful: it doesn't tell you the point in the span of $A$ which, when normalized, is most similar to $y$, and in practice most points just end up pretty close to the center. That's why I restated this problem as finding a linear combination of the vectors $a_{1:n}$ which, when normalized, is most similar to $y$.
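For concreteness, here is a minimal sketch of that baseline (all names are my own, not from any library):

```python
import numpy as np

def baseline_position(A, y):
    """Naive baseline: dot-product weights, normalized to sum to 1,
    used as barycentric coordinates over a regular n-gon."""
    n = A.shape[1]
    z = A.T @ y                           # similarity of y to each topic
    w = z / z.sum()                       # weights summing to 1
    theta = 2 * np.pi * np.arange(n) / n  # n-gon vertex angles
    vertices = np.column_stack([np.cos(theta), np.sin(theta)])
    return w @ vertices                   # interpolated 2-D position
```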
A sort of guesstimate answer would be to find the non-negative least-squares solution of $Ax = y$, i.e. $\min_{x \geq 0} \|Ax - y\|$, and that works reasonably well in practice, but I can't prove that it's optimal.
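A sketch of that heuristic with SciPy's NNLS solver; rescaling the solution onto the simplex is harmless because the objective is invariant under positive scaling of $x$:

```python
import numpy as np
from scipy.optimize import nnls

def nnls_guess(A, y):
    """Heuristic: solve min ||Ax - y|| subject to x >= 0, then rescale
    so that sum(x) = 1 (the objective is scale-invariant, so the
    rescaling does not change its value)."""
    x, _ = nnls(A, y)
    # x.sum() > 0 unless y is orthogonal to every column of A,
    # which cannot happen for nonnegative A and y except degenerately.
    return x / x.sum()
```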
I think I could rewrite the problem in terms of the KKT conditions and solve it using calculus. However, I'm not really familiar with KKT conditions, so I wouldn't know how to do that.
What algorithm or technique could I use to solve this optimization problem?
I'm willing to accept an answer that just shows how to set this up as a quadratic program.
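My best guess at such a setup, which I haven't verified, is a Charnes–Cooper-style substitution: because the objective is invariant under positive scaling of $x$, one can fix the numerator instead of the simplex constraint, assuming the optimal value is strictly positive (which holds here unless $y$ is orthogonal to every column of $A$):
$$ \begin{align*} \text{minimize } & w^{\top} A^{\top} A w \\ \text{subject to } & y^{\top} A w = 1 \\ & w_i \geq 0 \; \forall \, i \in \{1 \dots n\}, \end{align*} $$
and then recover $x = w / \sum_i w_i$.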
EDIT: I believe I've reduced this question to solving a system of equations by using the Karush-Kuhn-Tucker conditions that force $x$ to be non-negative. (The multiplier for the equality constraint $\sum_i x_i = 1$ drops out: the objective is homogeneous of degree zero in $x$, so its gradient is orthogonal to $x$.) So, any answer that just solves this system of equations should work:
\begin{align*} \text{Solve for } x, \mu \text{ satisfying } & \\ & \frac{1}{\|Ax\|}A^{\top}\Big(y - \frac{x^{\top}A^{\top} y}{x^{\top}A^{\top}A x} \, A x \Big) + \mu = 0 \\ & \mu_i x_i = 0 \; \forall \, i \in \{1 \dots n\} \\ & \mu_i \geq 0 \; \forall \, i \in \{1 \dots n\} \\ \text{Where } & \\ & A \in \mathbb{R}^{m \, \times \, n} \\ & x \in \mathbb{R}^n \\ & \mu \in \mathbb{R}^n \\ & y \in \mathbb{R}^m \end{align*}
Sadly, I think this is just another quadratic program, except that the objective function is $0$ and the complementarity conditions $\mu_i x_i = 0$ remain. The only quadratic-program solver I've seen for Python (which is my work environment), CVXOPT, is quite slow.
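In the meantime, here is a minimal sketch of attacking the original problem directly with SciPy's SLSQP (a general constrained solver, not a QP solver; everything apart from the SciPy API is my own naming):

```python
import numpy as np
from scipy.optimize import minimize

def solve_direct(A, y):
    """Maximize (Ax . y)/||Ax|| over the simplex {x >= 0, sum(x) = 1}
    by minimizing the negated objective with SLSQP."""
    n = A.shape[1]
    x0 = np.full(n, 1.0 / n)  # start at the simplex barycenter

    def neg_objective(x):
        Ax = A @ x
        return -(Ax @ y) / np.linalg.norm(Ax)

    res = minimize(
        neg_objective,
        x0,
        method="SLSQP",
        bounds=[(0.0, None)] * n,                             # x_i >= 0
        constraints=[{"type": "eq", "fun": lambda x: x.sum() - 1.0}],
    )
    return res.x, -res.fun
```

SLSQP only guarantees a local optimum, but I believe the objective is quasiconcave wherever $y^{\top}Ax \geq 0$ (its superlevel sets are preimages of convex cones under $x \mapsto Ax$), so a local maximum found this way should also be global.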