No, they're not all the same and it's important to understand the differences between them.
Start with a simple optimization problem
$\min f(x)$
subject to
$g(x) = 0$
where we can assume for simplicity that $f$ and $g$ are smooth (at least twice continuously differentiable).
The Lagrangian function is
$L(x,\lambda)=f(x)+\lambda g(x)$
Note that $L$ is a function of $x$ and $\lambda$. The first-order necessary condition for a point $x^{*}$ to be a minimizer is that there is a $\lambda^{*}$ such that $(x^{*},\lambda^{*})$ is a stationary point of $L$. In the method of multipliers, we try to solve the nonlinear system of equations
$\nabla_{x,\lambda} L(x,\lambda)=0$
This is typically done by alternately minimizing with respect to $x$ and updating $\lambda$. Given a Lagrange multiplier estimate $\lambda^{(k)}$, we minimize $L(x,\lambda^{(k)})$ with respect to $x$ to get $x^{(k)}$. Then we update $\lambda$ with
$\lambda^{(k+1)}=\lambda^{(k)} +\alpha_{k} g(x^{(k)})$
where $\alpha_{k}$ is a step size parameter that can be set in various ways.
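To make this concrete, here's a minimal sketch of that iteration in Python, using `scipy.optimize.minimize` for the inner minimization over $x$. The example problem, the fixed step size, and the iteration count are just illustrative choices, not anything from the discussion above.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative problem (my own choice, not from the discussion above):
#   minimize x1^2 + 2*x2^2   subject to   g(x) = x1 + x2 - 1 = 0
f = lambda x: x[0]**2 + 2.0 * x[1]**2
g = lambda x: x[0] + x[1] - 1.0

lam = 0.0        # Lagrange multiplier estimate
alpha = 0.5      # fixed step size, picked by hand for this example
x = np.zeros(2)
for k in range(50):
    # inner step: minimize L(x, lam) = f(x) + lam*g(x) over x,
    # warm-starting from the previous iterate
    x = minimize(lambda x: f(x) + lam * g(x), x).x
    # multiplier update: lam <- lam + alpha * g(x)
    lam = lam + alpha * g(x)

print(x, lam)    # approaches (2/3, 1/3) with lam close to -4/3
```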
A penalty function for our problem is a function that is $0$ if $g(x)=0$ and greater than $0$ when $g(x) \neq 0$. A commonly used penalty function is the quadratic penalty function
$\phi(g(x))=g(x)^{2}$
In the penalty function method, we solve an unconstrained problem of the form
$\min_{x} f(x)+\rho \phi(g(x))$
where $\rho$ is a penalty parameter that is increased until the solution of the penalized problem is close to satisfying $g(x)=0$. Note that $\rho$ is not a Lagrange multiplier in this case.
For problems with inequality constraints of the form $g(x) \leq 0$, a commonly used penalty function is
$\phi(g(x))=\max(g(x),0)^{2}$.
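Here's the same kind of sketch for the quadratic penalty method, using the same made-up equality-constrained problem; the factor-of-ten schedule for increasing $\rho$ and the stopping tolerance are arbitrary illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

# Same illustrative equality-constrained problem as before:
#   minimize x1^2 + 2*x2^2   subject to   g(x) = x1 + x2 - 1 = 0
f      = lambda x: x[0]**2 + 2.0 * x[1]**2
grad_f = lambda x: np.array([2.0 * x[0], 4.0 * x[1]])
g      = lambda x: x[0] + x[1] - 1.0
grad_g = lambda x: np.array([1.0, 1.0])

rho = 1.0
x = np.zeros(2)
for k in range(8):
    # minimize the penalized objective f(x) + rho*g(x)^2 for the current rho;
    # the exact gradient is supplied so the subproblem is solved accurately
    x = minimize(lambda x: f(x) + rho * g(x)**2, x,
                 jac=lambda x: grad_f(x) + 2.0 * rho * g(x) * grad_g(x)).x
    if abs(g(x)) < 1e-6:   # stop once the constraint is nearly satisfied
        break
    rho *= 10.0            # otherwise increase the penalty parameter

print(x, g(x), rho)        # x approaches (2/3, 1/3); g(x) shrinks like 1/rho
```

Note that the iterates only satisfy the constraint approximately, and the penalized subproblems become increasingly ill-conditioned as $\rho$ grows; that's a large part of why the augmented Lagrangian approach below tends to work better.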
An augmented Lagrangian function combines the penalty function idea with the Lagrangian:
$\hat{L}(x,\lambda; \rho)=f(x)+\lambda g(x) + \rho \phi(g(x))$
Augmented Lagrangian methods minimize $\hat{L}$ with respect to $x$, update the
Lagrange multiplier estimate $\lambda$ and then (if necessary) update the penalty parameter $\rho$ in each iteration. In practice, augmented Lagrangian methods outperform simple penalty methods and the method of multipliers.
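A minimal sketch of that loop on the same toy problem follows. The multiplier update $\lambda \leftarrow \lambda + 2\rho\, g(x)$ is the first-order update that matches the $\rho\, g(x)^{2}$ form of the penalty term used above, and the rule for increasing $\rho$ (only when the constraint violation isn't shrinking fast enough) is just one simple choice among many.

```python
import numpy as np
from scipy.optimize import minimize

# Same illustrative problem as before:
#   minimize x1^2 + 2*x2^2   subject to   g(x) = x1 + x2 - 1 = 0
f = lambda x: x[0]**2 + 2.0 * x[1]**2
g = lambda x: x[0] + x[1] - 1.0

lam, rho = 0.0, 1.0
x = np.zeros(2)
viol_prev = np.inf
for k in range(20):
    # minimize the augmented Lagrangian f + lam*g + rho*g^2 over x
    x = minimize(lambda x: f(x) + lam * g(x) + rho * g(x)**2, x).x
    # first-order multiplier update (the factor 2 matches the rho*g^2 penalty term)
    lam = lam + 2.0 * rho * g(x)
    viol = abs(g(x))
    if viol < 1e-6:
        break
    if viol > 0.25 * viol_prev:   # feasibility not improving fast enough,
        rho *= 10.0               # so increase the penalty parameter
    viol_prev = viol

print(x, lam, rho)   # approaches (2/3, 1/3) with lam close to -4/3
```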
Merit functions are used in a variety of nonlinear programming algorithms. You'll most commonly see them used in sequential quadratic programming (SQP) methods. In these methods, a search direction, $d^{(k)}$, is computed at each iteration. The step is from $x^{(k)}$ to
$x^{(k+1)}=x^{(k)}+\alpha_{k} d^{(k)}$
where the step size parameter $\alpha_{k}$ is determined by minimizing a
merit function
$\min_{\alpha} M(x^{(k)}+\alpha d^{(k)})$
The merit function is typically something like a penalized objective function or an augmented Lagrangian, but there's a great deal of freedom in the form of the merit function.
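As a rough sketch of how this looks, here's an $\ell_{1}$ penalty merit function minimized along a direction with `scipy.optimize.minimize_scalar`. The current iterate, the search direction, and the penalty weight $\mu$ are all made up for illustration; in a real SQP code $d^{(k)}$ would come from a quadratic programming subproblem, and the line search would typically only require sufficient decrease in the merit function rather than an exact minimum.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Same illustrative problem; x_k and d_k are made-up stand-ins for a current
# SQP iterate and the search direction its QP subproblem might produce.
f = lambda x: x[0]**2 + 2.0 * x[1]**2
g = lambda x: x[0] + x[1] - 1.0
mu = 10.0                                  # penalty weight in the merit function

def merit(x):
    # an l1-penalty merit function: objective plus weighted constraint violation
    return f(x) + mu * abs(g(x))

x_k = np.array([0.0, 0.0])
d_k = np.array([2.0, 1.0])

# choose the step size by minimizing the merit function along the direction
res = minimize_scalar(lambda a: merit(x_k + a * d_k),
                      bounds=(0.0, 1.0), method='bounded')
x_next = x_k + res.x * d_k
print(res.x, x_next)   # for this made-up direction the step lands near (2/3, 1/3)
```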
These functions and the associated methods are described in many textbooks on nonlinear optimization. A good discussion can be found in *Numerical Optimization* by Nocedal and Wright.