I am new to optimization methods and am trying to understand them. I am familiar with two main methods:
1. Gradient descent, which, as I understand it, means we iteratively compute the next point, moving closer to the minimum/maximum: $$x^{t}=x^{t-1}-\lambda\frac{\partial}{\partial{x}}f(x^{t-1})$$ where $\lambda$ acts as the step size or learning rate (a minimal numeric sketch is given after this list).
2. Lagrange multipliers, which is a method of finding the optimal point of a function, subject to a constraint given by another function, that minimizes/maximizes it. We form $$\mathcal{L}(x,\lambda)=f(x)-\lambda g(x)$$ and require $$\frac{\partial}{\partial{x}}f(x)-\lambda\frac{\partial}{\partial{x}}g(x)=0$$ together with the constraint $$g(x)=0.$$ Here $g(x)=0$ is the constraint and $\lambda$ is a scalar that rescales the gradient (normal) of $g(x)$ so that it matches the gradient of $f(x)$. Solving both equations yields the specific $x$ and $\lambda$ that minimize $f(x)$ subject to the constraint (a small worked example follows this list).
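To check that I am reading the gradient descent update correctly, here is a minimal sketch in Python of how I picture the iteration; the function $f(x)=(x-3)^2$, the starting point, and the step size are just my own illustrative choices:

```python
# Minimal sketch of the update x^t = x^(t-1) - lambda * f'(x^(t-1)).
# f(x) = (x - 3)^2, the starting point, and the step size are illustrative choices.

def grad_f(x):
    """Derivative of f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)

x = 0.0      # starting point
lam = 0.1    # step size / learning rate (the lambda in the update rule)
for _ in range(100):
    x = x - lam * grad_f(x)

print(x)  # converges toward the minimizer x = 3
```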
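And to show how I currently picture the Lagrange conditions, here is a small example I worked through, minimizing $f(x,y)=x^2+y^2$ subject to $g(x,y)=x+y-1=0$ (the choice of functions is just for illustration):

$$\mathcal{L}(x,y,\lambda)=x^2+y^2-\lambda(x+y-1)$$
$$\frac{\partial\mathcal{L}}{\partial x}=2x-\lambda=0,\qquad\frac{\partial\mathcal{L}}{\partial y}=2y-\lambda=0,\qquad x+y-1=0$$

which gives $x=y=\tfrac{1}{2}$ and $\lambda=1$ in one shot, with no iterations, and that is exactly what confuses me below.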
I don't understand some basic points here:
1. When we use Lagrange multipliers, we end up directly with the specific $x$ of the minimum. Where is the iterative step that leads to the optimal result?
2. If we end up with the optimal $x$ directly, how does $\lambda$ affect the final result?
3. Is there a connection between gradient descent and Lagrange multipliers? If so, in what case are they equivalent? (Is gradient descent a special case of Lagrange multipliers in which $g(x)$ is a particular function? Would both $\lambda$'s have the same meaning in that case?)
Thank you.