I am new to optimization methods and am trying to understand them. I am familiar with two main methods:
1. Gradient descent, which, as I understand it, means we iteratively compute the next point, moving closer to the minimum/maximum: $$x^{t}=x^{t-1}-\lambda\frac{\partial}{\partial{x}}f(x^{t-1})$$ where $\lambda$ acts as the step size or learning rate (a minimal numeric sketch is given after this list).
2. Lagrange multipliers, which is a method of finding the optimal point of a function, subject to a constraint given by another function, that minimizes/maximizes it. We form $$\mathcal{L}(x,\lambda)=f(x)-\lambda g(x)$$ and require $$\frac{\partial}{\partial{x}}f(x)-\lambda\frac{\partial}{\partial{x}}g(x)=0$$ together with the constraint $$g(x)=0.$$ Here $g(x)=0$ is the constraint and $\lambda$ is a scalar that rescales the gradient (normal) of $g(x)$ so that it matches the gradient of $f(x)$. Solving both equations yields the specific $x$ and $\lambda$ that minimize $f(x)$ subject to the constraint (a small worked example follows this list).
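To check that I am reading the gradient descent update correctly, here is a minimal sketch in Python of how I picture the iteration; the function $f(x)=(x-3)^2$, the starting point, and the step size are just my own illustrative choices:

```python
# Minimal sketch of the update x^t = x^(t-1) - lambda * f'(x^(t-1)).
# f(x) = (x - 3)^2, the starting point, and the step size are illustrative choices.

def grad_f(x):
    """Derivative of f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)

x = 0.0      # starting point
lam = 0.1    # step size / learning rate (the lambda in the update rule)
for _ in range(100):
    x = x - lam * grad_f(x)

print(x)  # converges toward the minimizer x = 3
```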
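And to show how I currently picture the Lagrange conditions, here is a small example I worked through, minimizing $f(x,y)=x^2+y^2$ subject to $g(x,y)=x+y-1=0$ (the choice of functions is just for illustration):

$$\mathcal{L}(x,y,\lambda)=x^2+y^2-\lambda(x+y-1)$$
$$\frac{\partial\mathcal{L}}{\partial x}=2x-\lambda=0,\qquad\frac{\partial\mathcal{L}}{\partial y}=2y-\lambda=0,\qquad x+y-1=0$$

which gives $x=y=\tfrac{1}{2}$ and $\lambda=1$ in one shot, with no iterations, and that is exactly what confuses me below.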
I don't understand some basic points here:
1. When we use Lagrange multipliers, we end up directly with the specific $x$ of the minimum. Where is the iterative step that leads to the optimal result?
2. If we end up with the optimal $x$ directly, how does $\lambda$ affect the final result?
3. Is there a connection between gradient descent and Lagrange multipliers? If so, in what case are they equivalent? (Is gradient descent a special case of Lagrange multipliers in which $g(x)$ is a particular function? Would both $\lambda$'s have the same meaning in that case?)
Thank you.