
I want to solve the following problem:

$$ \arg\min_x |x|_\mu + \frac{1}{2\sigma} |x-x^k|^2 $$

Where the Huber Loss Function is given by:

$$|x|_\mu = \begin{cases} \frac{|x|^2}{2}, & |x| \leq \mu \\ \mu \left( |x|-\frac \mu 2 \right) & |x| > \mu \end{cases}. $$

Then, the optimality condition is $$ 0 \in \partial(|\hat x|_\mu) + \frac 1\sigma (\hat x - x^k) $$

However, at this time, I don't know how to proceed.

dohmatob
  • 9,753
jakeoung
  • 1,341

1 Answer


To simplify the exposition, consider the standard case $\mu = 1$ (see the justification below). In any dimension (your case is 1-dimensional), the Huber function is the infimal convolution of the $\ell_1$-norm $x \mapsto \|x\|_1$ and the half-squared $\ell_2$-norm $x \mapsto \frac{1}{2}\|x\|_2^2$, i.e. $h = \|\cdot\|_1 \Box \frac{1}{2}\|\cdot\|_2^2$ (prove this as an exercise, or ask for specific details / help in the comment section...). Since the convex conjugate of an infimal convolution is the sum of the conjugates, $h^* = \|\cdot\|_1^* + (\frac{1}{2}\|\cdot\|_2^2)^* = i_{\mathbb B_\infty} + \frac{1}{2}\|\cdot\|_2^2$, where $\mathbb B_\infty$ is the unit ball for the $\ell_\infty$-norm in $\mathbb R^n$. Now, by Moreau's proximal decomposition, one computes $$ \begin{split} \frac{y-\mathrm{prox}_{\sigma h}(y)}{\sigma} = \mathrm{prox}_{\frac{1}{\sigma}h^*}\left(\frac{1}{\sigma}y\right) &= \arg\min_{x \in \mathbb B_\infty}\frac{1}{2}\left\|x-\frac{y}{\sigma}\right\|_2^2 + \frac{1}{\sigma}\frac{1}{2}\|x\|_2^2\\ &= \arg\min_{{x \in \mathbb B_\infty}}\frac{\sigma + 1}{2\sigma}\left\|x - \frac{y}{\sigma + 1}\right\|_2^2 + \text{ const.}\\ & = P_{\mathbb B_\infty}\left(\frac{y}{\sigma + 1}\right) = (v_1, v_2, \ldots, v_n), \end{split} $$ where $v_j = \frac{y_j}{\max(|y_j|, \sigma + 1)}$. Thus the $j$th component of $\mathrm{prox}_{\sigma h}(y)$ is given by $$(\mathrm{prox}_{\sigma h}(y))_j = y_j - \frac{\sigma y_j}{\max(|y_j|, \sigma + 1)}. $$
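As a sanity check, the $\max$ can be unpacked into an equivalent piecewise form (still with $\mu = 1$; this is just a rewriting of the componentwise formula, not a new result):

$$ (\mathrm{prox}_{\sigma h}(y))_j = \begin{cases} \dfrac{y_j}{1+\sigma}, & |y_j| \leq 1+\sigma \\ y_j - \sigma\,\mathrm{sign}(y_j), & |y_j| > 1+\sigma \end{cases} $$

Both branches agree with the one-dimensional optimality condition $0 = h'(\hat x) + \frac{1}{\sigma}(\hat x - y)$. In the first branch, $\hat x = y/(1+\sigma)$ has $|\hat x| \leq 1$, so $h'(\hat x) = \hat x$ and $\hat x + (\hat x - y)/\sigma = 0$. In the second, $|\hat x| = |y| - \sigma > 1$, so $h'(\hat x) = \mathrm{sign}(\hat x) = \mathrm{sign}(y)$ and $\mathrm{sign}(y) + (\hat x - y)/\sigma = 0$.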

Watch out for computational errors!
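One way to guard against such errors is a quick numerical check. The sketch below (NumPy only; the function names `huber`, `prox_huber_unit` and `prox_brute_force` are made up for this illustration) compares the closed form for $\mu = 1$ with a brute-force grid minimization of $h(x) + \frac{1}{2\sigma}(x - y)^2$ in one dimension. The grid width and resolution are arbitrary choices, so the comparison is only accurate to about the grid spacing.

```python
import numpy as np

def huber(x, mu=1.0):
    """Huber function |x|_mu, applied elementwise."""
    ax = np.abs(x)
    return np.where(ax <= mu, 0.5 * ax**2, mu * (ax - 0.5 * mu))

def prox_huber_unit(y, sigma):
    """Closed form for mu = 1: y_j - sigma * y_j / max(|y_j|, sigma + 1)."""
    y = np.asarray(y, dtype=float)
    return y - sigma * y / np.maximum(np.abs(y), sigma + 1.0)

def prox_brute_force(y, sigma, mu=1.0, half_width=10.0, n=200001):
    """Minimize |x|_mu + (x - y)^2 / (2 sigma) over a dense 1-D grid."""
    grid = np.linspace(-half_width, half_width, n)
    objective = huber(grid, mu) + (grid - y) ** 2 / (2.0 * sigma)
    return grid[np.argmin(objective)]

sigma = 0.7
for y in (-3.0, -1.2, 0.0, 0.5, 1.7, 4.0):
    closed = float(prox_huber_unit(y, sigma))
    brute = prox_brute_force(y, sigma)
    # The grid spacing is 1e-4, so agreement to 1e-3 is a reasonable tolerance.
    assert abs(closed - brute) < 1e-3, (y, closed, brute)
```

The test points are picked on both sides of the threshold $|y| = \sigma + 1$, so both regimes of the formula are exercised.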


Justification of only considering the case $\mu=1$: Indeed, for general $\mu > 0$, if $\phi_\mu(x) := |x|_\mu$, then it is easy to check that $\phi_\mu(x) = \mu^2\phi_1(x/\mu)=\mu^2 h(x/\mu)$, where $h := \phi_1$. Thus, $$ \text{prox}_{\sigma\phi_\mu}(y) = \arg\min_{x}\frac{1}{2}\|x-y\|^2 + \sigma \phi_\mu(x) = \arg\min_{x}\frac{1}{2}\|x-y\|^2 + \mu^2\sigma h(x/\mu) = \mu \hat z, $$ where, substituting $x = \mu z$ and dividing the objective by $\mu^2$, $\hat z = \arg\min_{z}\frac{1}{2}\|\mu z - y\|^2 + \mu^2\sigma h(z) = \arg\min_{z}\frac{1}{2}\|z - y/\mu\|^2 + \sigma h(z) = \text{prox}_{\sigma h}(y/\mu). $

$\therefore \text{prox}_{\sigma\phi_\mu}(y) = \mu \text{prox}_{\sigma h}(y/\mu)$.
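A minimal sketch of this rescaling in NumPy, assuming $\mu > 0$ and $\sigma > 0$ (the function names are invented for this illustration). It checks the scaling identity $\phi_\mu(x) = \mu^2\phi_1(x/\mu)$ numerically and compares $\mu\,\text{prox}_{\sigma h}(y/\mu)$ with the piecewise form obtained by writing the stationarity condition out directly: $y/(1+\sigma)$ when $|y| \leq \mu(1+\sigma)$, and $y - \sigma\mu\,\mathrm{sign}(y)$ otherwise.

```python
import numpy as np

def huber(x, mu=1.0):
    """Huber function |x|_mu, applied elementwise."""
    ax = np.abs(x)
    return np.where(ax <= mu, 0.5 * ax**2, mu * (ax - 0.5 * mu))

def prox_huber(y, sigma, mu=1.0):
    """prox of sigma * |.|_mu, computed as mu * prox_{sigma h}(y / mu)."""
    y = np.asarray(y, dtype=float)
    u = y / mu
    prox_unit = u - sigma * u / np.maximum(np.abs(u), sigma + 1.0)
    return mu * prox_unit

def prox_huber_direct(y, sigma, mu=1.0):
    """Same operator, written out piecewise from the stationarity condition."""
    y = np.asarray(y, dtype=float)
    inside = np.abs(y) <= mu * (1.0 + sigma)
    return np.where(inside, y / (1.0 + sigma), y - sigma * mu * np.sign(y))

ys = np.linspace(-10.0, 10.0, 401)
for mu in (0.5, 1.0, 3.0):
    # Scaling identity phi_mu(x) = mu^2 * phi_1(x / mu).
    assert np.allclose(huber(ys, mu), mu**2 * huber(ys / mu, 1.0))
    for sigma in (0.1, 1.0, 2.5):
        assert np.allclose(prox_huber(ys, sigma, mu),
                           prox_huber_direct(ys, sigma, mu))
```

Note that the quadratic branch does not depend on $\mu$ at all; only the threshold $\mu(1+\sigma)$ and the size of the shift $\sigma\mu$ do.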

dohmatob
  • 9,753
  • Many thanks for very clear answer. Could you see my new question about Huber of affine: http://math.stackexchange.com/questions/1876597/proximal-operator-to-huber-affine-function . – jakeoung Jul 31 '16 at 06:50
  • You're welcome. OK, I've dropped a note. Lemme know if it answers your question. – dohmatob Aug 01 '16 at 01:21
  • Is this correct? From the expression you get it seems that the prox of the Huber function splits down to the single components, which would suggest the Huber function itself is separable, but it isn't. The derivation seems correct to me, don't get me wrong, it's just counterintuitive that you can compute the prox of the Huber function with no "global" information on y, such as ||y|| – Lorenzo Stella Jan 27 '17 at 19:38
  • For example in this paper http://web.stanford.edu/~boyd/papers/pdf/oper_splt_ctrl.pdf (bottom of page 2438) the expression of the prox involves indeed the norm of the point – Lorenzo Stella Jan 27 '17 at 20:40
  • Yes, it is correct. If you replace the absolute value in the problem I'm solving above by the two-norm $\|\cdot\|_2$ (as in the paper you refer to), then you should arrive at an analogous formula to the one I gave above, with the absolute value replaced by $\|\cdot\|_2$. If it helps, note that $1 / \max(a, b) = \min(1/a,1/b)$ for $a,b > 0$. – dohmatob Jan 28 '17 at 12:06
  • I think the difference lies in the definition of "Huber loss": you are considering the infimal convolution of the l1-norm and the half-squared l2-norm, which is indeed separable: in fact it is the "separable-sum" of one-dimensional Huber losses. On the other hand the "circulant" Huber loss, in the paper I referred to, should be the infimal convolution between the l2-norm and the half-squared l2-norm. They are two different functions of course, one separable and the other not. – Lorenzo Stella Feb 07 '17 at 19:29
  • @dohmatob a well-versed parietaler like you should use \cdot inside $||\cdot||$ rather than a simple '.' – P. Camilleri Apr 04 '18 at 21:25
  • @dohmatob, I verified the result numerically and analytically. It would be great if you extend the solution for the case of any $ \mu $. Also the link to the exercise doesn't work (Better write what you link for as well). – Royi Mar 20 '20 at 23:29
  • @Royi Thanks for double-checking. Well, let $\phi_\mu(x) := |x|_\mu$. Then $\phi_\mu(x) = \phi_1(x/\mu)$, and we know how proximal operators transform under rescalings (can be derived via a change of variable, for ex., see http://proximity-operator.net/proximityoperator.html). If there are specific questions about the derivation, I can help. Disclaimer: Mine is not meant to be an "off-the-shelf" solution, but an invitation to the OP to learn proximal calculus (which only involves very basic math and one or two important theorems), and solve their problem by the same token :). – dohmatob Mar 21 '20 at 08:36
  • I don't think your scaling property of the Huber Loss holds. At least not for the Huber Loss as it is defined above (Or Wikipedia). HuberLoss(6, 3) = 13.5 and HuberLoss(6 / 3, 1) = 1.5. – Royi Mar 21 '20 at 09:08
  • Good catch! Should be $\phi_\mu(x) = \mu^2\phi_1(x/\mu)$. This shouldn't change anything in what I said before though. BTW, the Huber formula given in the question looks wrong: it's discontinuous at $x=\pm \mu$. The Wikipedia version is correct, namely $|x|_\mu = \begin{cases}x^2/2,&\mbox{ if }|x| \le \mu,\\ \mu(|x| - \mu/2),&\mbox{ else.}\end{cases}$ – dohmatob Mar 21 '20 at 09:42
  • @dohmatob Shouldn't the final answer have $\mu^2$ instead of $\mu$? – shani Jan 17 '22 at 07:45
  • @dohmatob hello, can you help me with the proof the Huber function is the infimal convolution of $l1$ and $l2$? https://math.stackexchange.com/questions/4804910/inf-convolution-of-norm-1-and-norm-2-square – Pipnap Nov 13 '23 at 01:38