6

Consider a proximal operator,

$$ \operatorname{Prox}_{ \lambda f } \left( \mu x \right) := \arg \min_{u} \lambda f \left( u \right) + \frac{1}{2} {\left\| u - \mu x \right\|}_{2}^{2}.$$

What is the partial derivative of the proximal operator w.r.t. $\lambda$ and $\mu$, i.e.

$$\frac{\partial\operatorname{Prox}_{ \lambda f } \left( \mu x \right)}{\partial\lambda}, \quad \frac{\partial\operatorname{Prox}_{ \lambda f} \left( \mu x \right)}{\partial\mu}?$$

If the general case is not solvable, is it possible to compute the derivative if we restrict $f$ to be an $L_p$ norm?

ViktorStein
  • 5,024
w382903
  • 195
  • I don't think that it is differentiable w.r.t. $\mu$. In case that $f$ is an indicator function, the prox is just the projection. Projections are, in general, not differentiable w.r.t. $x$. – gerw Aug 30 '19 at 09:15
  • Notice that the prox can be seen as the gradient of the Moreau envelope of the convex conjugate function. There is a relationship between the derivative of the Moreau envelope with respect to the smoothing parameter and its gradient with respect to the optimization variable (this is actually the Hamilton-Jacobi equation). So if you assume enough differentiability to interchange the partial derivatives with respect to the smoothing parameter and the optimization variable, you can obtain an explicit formula. – Jürgen Sukumaran Feb 03 '23 at 15:59

2 Answers

3

For the restricted case where $f$ is twice differentiable, one can derive a solution. First, the derivative w.r.t. $\lambda$ is

$$\frac{\partial\operatorname{Prox}_{ \lambda f } \left( \mu x \right)}{\partial\lambda} = \lim_{\epsilon\to 0}\frac{1}{\epsilon}\left[\operatorname{Prox}_{ (\lambda + \epsilon) f } \left( \mu x \right) - \operatorname{Prox}_{ \lambda f } \left( \mu x \right)\right]$$

The solution $\operatorname{Prox}_{ (\lambda + \epsilon) f } \left( \mu x \right)$ can be computed from a first-order Taylor expansion. In particular, any solution $u$ has to fulfill the stationarity condition

$$(\lambda + \epsilon) \nabla f(u) + (u - \mu x) = 0$$ $$\Leftrightarrow (\lambda + \epsilon) \nabla f(u^{*} + du) + u^{*} + du - \mu x = 0$$

where $u^{*} = \operatorname{Prox}_{ \lambda f } \left( \mu x \right)$ and $u = u^{*} + du$. Then, with $H_f(u^{*})$ being the Hessian of $f$, expanding $\nabla f$ to first order in $du$ gives

$$(\lambda + \epsilon) (\nabla f(u^{*}) + H_f(u^{*}) du) + u^{*} + du - \mu x = 0$$

Since $u^{*}$ itself satisfies $\lambda \nabla f(u^{*}) + u^{*} - \mu x = 0$, this reduces to

$$\epsilon \nabla f(u^{*}) + (\lambda + \epsilon) H_f(u^{*}) du + du = 0$$

Hence,

$$du = -\epsilon\left[(\lambda + \epsilon)H_f(u^{*}) + I\right]^{-1}\nabla f(u^{*})$$

$$\Rightarrow \frac{\partial\operatorname{Prox}_{ \lambda f } \left( \mu x \right)}{\partial\lambda} = -\left[\lambda H_f(u^{*}) + I\right]^{-1}\nabla f(u^{*})$$

In a very similar way, perturbing $\mu$ instead of $\lambda$, we can find

$$\frac{\partial\operatorname{Prox}_{ \lambda f } \left( \mu x \right)}{\partial\mu} = \left[\lambda H_f(u^{*}) + I\right]^{-1} x$$
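The two formulas above can be sanity-checked numerically. The following is a minimal sketch, not part of the original answer: it assumes the smooth convex test function $f(u) = \sum_i \log\cosh(u_i)$ (chosen here only because its gradient and Hessian are simple), computes the prox with a plain Newton iteration on the stationarity condition, and compares the closed-form derivatives with central finite differences. The names `prox`, `f_grad`, `f_hess` and the test values are hypothetical.

```python
import numpy as np

# Assumed test function: f(u) = sum(log(cosh(u))), smooth and convex.
def f_grad(u):
    return np.tanh(u)

def f_hess(u):
    return np.diag(1.0 - np.tanh(u) ** 2)

def prox(lam, mu, x, iters=50):
    """Solve lam * grad f(u) + u - mu * x = 0 by Newton's method."""
    u = mu * x
    for _ in range(iters):
        residual = lam * f_grad(u) + u - mu * x
        jacobian = lam * f_hess(u) + np.eye(len(x))
        u = u - np.linalg.solve(jacobian, residual)
    return u

# Arbitrary test values (assumptions for illustration).
lam, mu = 0.7, 1.3
x = np.array([0.5, -1.2, 2.0])

u_star = prox(lam, mu, x)
A = lam * f_hess(u_star) + np.eye(len(x))

# Closed-form derivatives from the derivation above.
d_lam_formula = -np.linalg.solve(A, f_grad(u_star))
d_mu_formula = np.linalg.solve(A, x)

# Central finite differences for comparison.
eps = 1e-6
d_lam_fd = (prox(lam + eps, mu, x) - prox(lam - eps, mu, x)) / (2 * eps)
d_mu_fd = (prox(lam, mu + eps, x) - prox(lam, mu - eps, x)) / (2 * eps)

print("d/d lambda:", d_lam_formula, d_lam_fd)
print("d/d mu:    ", d_mu_formula, d_mu_fd)
```

If the derivation holds, each printed pair should agree up to finite-difference error. This only tests one smooth $f$; it says nothing about the nonsmooth case discussed in the comments.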

w382903
  • 195
  • 1
    Do you know any book or article about the differentiability of the proximal operator with respect to $\lambda$? – ViktorStein Sep 07 '23 at 15:42
  • @ViktorStein I am also looking for a reference. Did you happen to find something about it? – rod Jul 05 '24 at 13:01
  • 1
    @rod the only thing I found is that the Moreau envelope satisfies a Hamilton-Jacobi-Bellman equation in the parameters $x$ and $\lambda$. – ViktorStein Jul 05 '24 at 18:58
  • @ViktorStein thank you very much for your answer Viktor – rod Jul 06 '24 at 23:25
2

The prox operator maps a point (vector) to a subset of your vector space; for a general $f$ this subset may be empty, a singleton, or contain several points. In that generality the prox operator is set-valued, so it is not differentiable in the usual sense.

The following example is from the book by Beck. For a constant $c > 0$, consider the following functions: \begin{align} g_1(x) &=0, \\ g_2(x)&=\begin{cases} 0 & x \neq 0,\\ - c & x=0, \end{cases}\\ g_3(x)&=\begin{cases} 0 & x \neq 0,\\ c & x=0. \end{cases} \end{align} The prox of each of these functions is:

\begin{align} \text{prox}_{g_1}(x)&=\{x\},\\ \text{prox}_{g_2}(x)&=\begin{cases} \{0\}, & |x| < \sqrt{2c},\\ \{x\}, & |x| > \sqrt{2c}, \\ \{0,x\}, & |x| = \sqrt{2c}, \end{cases}\\ \text{prox}_{g_3}(x)&=\begin{cases} \{x\}, & x \neq 0,\\ \emptyset, & x=0. \end{cases} \end{align}

On the other hand, the Moreau envelope, defined as $$M^{\mu}_f(x) = \inf_{y}\bigg\{f(y)+\frac{1}{2\mu} \|x-y\|^2 \bigg\},$$ is continuously differentiable whenever $f$ is proper, closed, and convex (in fact $\mu$ is called the smoothing parameter), so it makes sense to talk about its derivative. The gradient of the Moreau envelope is $$\nabla M^{\mu}_f(x) = \frac{1}{\mu}(x - \text{prox}_{\mu f}(x)).$$
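As an illustration of the gradient formula for the Moreau envelope, here is a small sketch for the scalar case $f = |\cdot|$, whose prox is soft thresholding and whose envelope is the Huber function. The function names and the test points are assumptions made for this example; the check compares the formula with a central finite difference of the envelope.

```python
import numpy as np

def soft_threshold(x, t):
    """prox of t * |.| evaluated at x (soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def moreau_env_abs(x, mu):
    """Moreau envelope of f = |.|, evaluated at the minimizer prox_{mu f}(x)."""
    p = soft_threshold(x, mu)
    return np.abs(p) + (x - p) ** 2 / (2.0 * mu)

def moreau_grad_abs(x, mu):
    """Gradient via the formula (x - prox_{mu f}(x)) / mu."""
    return (x - soft_threshold(x, mu)) / mu

mu = 0.5
xs = np.array([-2.0, -0.3, 0.1, 0.4, 1.7])  # arbitrary test points

# Central finite difference of the envelope itself.
h = 1e-6
grad_fd = (moreau_env_abs(xs + h, mu) - moreau_env_abs(xs - h, mu)) / (2 * h)
grad_formula = moreau_grad_abs(xs, mu)

print(grad_formula)
print(grad_fd)
```

The envelope is differentiable everywhere here even though $|\cdot|$ and its prox both have kinks, which is the contrast the answer is drawing.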

You can read more in the excellent books by Beck (Ch. 6) and Bauschke & Combettes (Ch. 12).

ConEd
  • 124
  • 2
    But if we assume that $f:\mathbb R^n \to \mathbb R$ is closed and convex then $\text{prox}_{tf}(x)$ is guaranteed to be a singleton, so the proximal operator of $f$ can be viewed as a function from $\mathbb R^n$ to $\mathbb R^n$. I bet OP would be willing to make this assumption. – littleO Aug 31 '19 at 20:48
  • 1
    That's a great answer to the general case! As @littleO is saying, I'd be willing to assume that $f$ is closed and convex (e.g. if it is an Lp norm). – w382903 Sep 02 '19 at 13:00
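
For the closed convex case raised in these comments, the simplest $L_p$ instance can be worked out by hand. The following is a sketch for the scalar $L_1$ case $f(u) = |u|$ with $\lambda > 0$ (an illustration added here, not taken from either answer). The prox is soft thresholding,

$$\operatorname{Prox}_{ \lambda |\cdot| } \left( \mu x \right) = \operatorname{sign}(\mu x)\,\max\left( |\mu x| - \lambda,\ 0 \right),$$

so away from the kink $|\mu x| = \lambda$ the partial derivatives are

$$\frac{\partial\operatorname{Prox}_{ \lambda |\cdot| } \left( \mu x \right)}{\partial\lambda} = \begin{cases} -\operatorname{sign}(\mu x), & |\mu x| > \lambda,\\ 0, & |\mu x| < \lambda, \end{cases} \qquad \frac{\partial\operatorname{Prox}_{ \lambda |\cdot| } \left( \mu x \right)}{\partial\mu} = \begin{cases} x, & |\mu x| > \lambda,\\ 0, & |\mu x| < \lambda. \end{cases}$$

On the region $|\mu x| > \lambda$ this agrees with the formulas in the first answer with $H_f = 0$ and $\nabla f(u^{*}) = \operatorname{sign}(u^{*})$. At $|\mu x| = \lambda$ only one-sided derivatives exist. For the $L_1$ norm on $\mathbb R^n$ the same expressions apply componentwise.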