Gradient Descent with spherical and simplex constraint

Question

I have been working on a problem with a spherical constraint and another normalization constraint.

To be precise I have a function $\mathrm{H}(X_{i})$, and the $X_{i}$ are the variables that I wish to optimise. The constraints are

\begin{equation} \sum_{i=1}^{N} X_{i}^2 = 1 \qquad \sum_{i=1}^{N} X_{i} = m \end{equation}

I tried the method of Lagrange multipliers to construct the function $\mathrm{H}'$ in the following form:

\begin{equation} \mathrm{H}' = \mathrm{H} + \nu_{1}\left(\sum_{i=1}^{N} X_{i}^2 - 1\right) + \nu_{2} \left(\sum_{i=1}^{N} X_{i} - m\right). \end{equation}

However, for some reason that I haven't been able to figure out, this doesn't seem to work numerically. (I can compute $\nu_{1}$ and $\nu_{2}$ analytically, and these are the values I injected into the final gradient descent routine.)

Following the response here, I was wondering whether a similar thing could be done for the normalization constraint above. I tried the following procedure:

Compute the gradient of the function $\mathrm{H}$ i.e. without the constraints. Then follow the link above. This fixes the spherical constraint.
Then "reproject" the new variable $X_{i}(t+\Delta t)$ onto the plane defined by $\sum_{i=1}^{N} X_{i} = m$.

For the last step, I chose a vector $((1-m)/N, ... (1-m)/N)$ and then performed standard linear algebra operations for projecting $X_{i}(t+\Delta t)$.

This however doesn't seem to work too well in practice: The gradient decreases and so does the function $\mathrm{H}$. The spherical constraint is satisfied as well. However the normalization constraint isn't.

Any suggestions/ideas/references for such a problem? I have scoured the web for problems of such type. The spherical constraint seems a pretty standard one but the other one doesn't seem to occur in many places. I haven't seen any references that treat the two together. Thanks!

user7530 · Accepted Answer · 2020-12-21T09:59:00.140

3

First of all, the general "brute force" solution of doing projected gradient descent should work:

1. Compute the unconstrained gradient $\nabla H$.
2. Project the gradient onto the tangent space of the constraints (optional, but it can reduce the numerical difficulty of the last step). In other words, solve the subproblem $$\min_{v} \|v-\nabla H\|^2 \quad \mathrm{s.t.}\quad 2X \cdot v = 0,\ \mathbf{1}\cdot v = 0.$$
3. Take a descent step $X \leftarrow X - \eta v$ with step size $\eta > 0$.
4. Project back onto the constraint surface: solve $$\min_{\tilde X} \|X-\tilde X\|^2 \quad \mathrm{s.t.} \quad \|\tilde X\|^2=1,\ \tilde X\cdot \mathbf{1} = m$$ using e.g. Newton's method.
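The four steps above can be sketched in NumPy as follows. This is only a minimal illustration under stated assumptions: the function names, the fixed step size `step`, and the iteration count are hypothetical choices, not part of the answer. It also swaps the Newton solve in the last step for a closed-form projection, which is available for this particular pair of constraints when $m^2 < N$: remove the component of $X$ along $\mathbf{1}$, rescale what is left to length $\sqrt{1 - m^2/N}$, and add back $\frac{m}{N}\mathbf{1}$.

```python
import numpy as np

def project_to_tangent(X, g):
    """Project g onto the subspace orthogonal to both X and the all-ones vector."""
    N = X.size
    A = np.stack([X, np.ones(N)], axis=1)      # columns span the constraint normals
    coeffs, *_ = np.linalg.lstsq(A, g, rcond=None)
    return g - A @ coeffs

def project_to_constraints(X, m):
    """Nearest point with sum(X) = m and ||X||^2 = 1.

    Assumes m**2 < N and that X is not a constant vector
    (otherwise the rescaling below divides by zero).
    """
    N = X.size
    w = X - X.mean()                           # part of X orthogonal to the ones vector
    r = np.sqrt(1.0 - m**2 / N)                # radius of the feasible sphere
    return m / N + r * w / np.linalg.norm(w)

def projected_gradient_descent(grad_H, X0, m, step=0.05, iters=500):
    """Hypothetical driver: fixed step size, fixed number of iterations."""
    X = project_to_constraints(np.asarray(X0, dtype=float), m)
    for _ in range(iters):
        v = project_to_tangent(X, grad_H(X))   # steps 1 and 2
        X = project_to_constraints(X - step * v, m)  # steps 3 and 4
    return X
```

For example, with a linear objective $H(X) = c \cdot X$ one would pass `grad_H = lambda X: c`; both constraints then hold after every iteration up to rounding error.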

In the question that you linked, they have used the fact that the sphere has a closed-form exponential map to substantially simplify step 4: they compute a position $\tilde X$ directly on the sphere without needing to step + project.

Generally speaking, the constraint manifold is too complex to allow such tricks. However, in your case, notice that the constraint manifold is the intersection of a sphere and a plane, i.e. it is a sphere of dimension $N-2$, and therefore it is possible to write down a reduced representation of the set of all feasible points. In particular, let $v_1,\ldots,v_{N-1}$ be vectors that, together with $\mathbf{1}/\sqrt{N}$, form an orthonormal basis of $\mathbb{R}^N$. Then all points satisfying your constraints are of the form $$X = \frac{m}{N}\mathbf{1} + \sum_{i=1}^{N-1} \alpha_iv_i$$ with $\|\alpha\|^2 = 1-\frac{m^2}{N}$, since $\left\|\frac{m}{N}\mathbf{1}\right\|^2 = \frac{m^2}{N}$.
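One possible way to build such a basis numerically is a QR factorization, sketched below. The helper names `complement_basis` and `X_from_alpha` are hypothetical. Because $\|\frac{m}{N}\mathbf{1}\|^2 = m^2/N$, the sketch takes the squared radius of the $\alpha$-sphere to be $1 - m^2/N$, which requires $m^2 < N$.

```python
import numpy as np

def complement_basis(N):
    """Return an N x (N-1) matrix with orthonormal columns, all orthogonal to the ones vector."""
    # The first column is the ones vector, so the first column of Q is +-1/sqrt(N)
    # and the remaining columns of Q complete it to an orthonormal basis of R^N.
    M = np.column_stack([np.ones(N), np.eye(N)[:, : N - 1]])
    Q, _ = np.linalg.qr(M)
    return Q[:, 1:]

def X_from_alpha(alpha, V, m):
    """Map reduced coordinates alpha to X = (m/N) * 1 + V @ alpha."""
    N = V.shape[0]
    return m / N + V @ alpha

# Usage: any alpha with ||alpha||^2 = 1 - m^2/N gives a feasible X.
N, m = 6, 1.5
V = complement_basis(N)
alpha = np.random.default_rng(0).normal(size=N - 1)
alpha *= np.sqrt(1.0 - m**2 / N) / np.linalg.norm(alpha)
X = X_from_alpha(alpha, V, m)
```

Here `X.sum()` equals `m` and `X @ X` equals 1 up to rounding, whichever orthonormal completion the QR routine happens to return.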

You can thus do gradient descent directly on $h(\alpha) = H(X(\alpha))$ using the technique from the linked post: the only constraint left is that $\alpha$ lies on a sphere, so the plane constraint never needs to be handled separately.
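A minimal sketch of descent in the reduced coordinates is given below, assuming the normalization update quoted in the comments, $x_{k+1} = (x_k - \alpha_k g_k)/|x_k - \alpha_k g_k|$, rescaled to radius $\sqrt{1 - m^2/N}$ (which follows from $\|\frac{m}{N}\mathbf{1}\|^2 = m^2/N$ and requires $m^2 < N$). The function names, the fixed step size, and the QR-based basis are illustrative assumptions rather than anything prescribed by the answer.

```python
import numpy as np

def complement_basis(N):
    """N x (N-1) matrix with orthonormal columns orthogonal to the ones vector."""
    M = np.column_stack([np.ones(N), np.eye(N)[:, : N - 1]])
    Q, _ = np.linalg.qr(M)
    return Q[:, 1:]

def reduced_gradient_descent(grad_H, N, m, alpha0, step=0.05, iters=500):
    """Gradient descent on h(alpha) = H(X(alpha)) with alpha kept on a sphere."""
    V = complement_basis(N)
    r = np.sqrt(1.0 - m**2 / N)                 # radius of the alpha-sphere
    alpha = r * np.asarray(alpha0, dtype=float) / np.linalg.norm(alpha0)
    for _ in range(iters):
        X = m / N + V @ alpha
        g = V.T @ grad_H(X)                     # chain rule: gradient of h with respect to alpha
        g -= (g @ alpha) / r**2 * alpha         # optional: drop the radial component
        alpha = alpha - step * g                # unconstrained step
        alpha *= r / np.linalg.norm(alpha)      # renormalize back onto the sphere
    return m / N + V @ alpha
```

The returned $X$ satisfies both constraints by construction, because feasibility only depends on the norm of `alpha`.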

edited Dec 21 '20 at 09:59

answered Oct 09 '17 at 21:07


What I seem to be having a problem with is step 2. $x_{k+1}=\frac{x_k-\alpha_kg_k}{|x_k-\alpha_kg_k|}$ is what was mentioned in the answer that I cited. However, when I project the variables $X_i$ onto the plane, the constraints don't seem to be satisfied.
I apologise if the questions are slightly stupid. This is a completely new field for me.
– bfg Oct 10 '17 at 12:14
I also don't get why there should be a one-dimensional parametrization of the problem. $X$ is an $N$-dimensional vector. To fix it completely, I need at least $N$ variables. – bfg Oct 10 '17 at 12:19
@Dhruv you are absolutely right, I for some reason had in mind that $N=3$. Instead of a circle you have a sphere of dimension $N-2$. I will update my answer later. – user7530 Oct 10 '17 at 15:13
Thanks for the edit. Just a final question: what justifies such a decomposition? The decomposition of $\alpha$ as a vector in $\mathbb{R}^{N-1}$ makes sense. How do we know that this decomposition is unique? – bfg Oct 16 '17 at 09:18
@Dhruv the decomposition is not unique, but that doesn't matter. Think of it this way: if you transform the coordinates of any optimization problem, you obviously do not change the problem's solution. The decomposition above is a change of coordinates that makes the constraints very easy to satisfy (they enforce that the coordinates representing motion away from the hypersphere must be zero). – user7530 Oct 16 '17 at 09:42
The sphere is not a Lie group, except for $S^1$ and $S^3$! – John Jiang Dec 20 '20 at 19:03
@JohnJiang Yes, agreed. – user7530 Dec 21 '20 at 09:59
