8

I am trying to characterize the sub-gradient of $\ell_0$ "norm"

$$f(x) := \|x\|_0 := \sum_{i=1}^n 1\{{x_i \neq 0}\}$$

At first, since it satisfies the triangle inequality, I thought that the $\ell_0$ "norm" is convex and non-smooth. Then, I tried to separate the two cases 1) $x=0$ and 2) $x \neq 0$ and set the sub-gradient as the first derivative with respect to $x$ for the latter case, which is $0$ due to constant. However, applying this sub-gradient would lead to a counter-example of definition

$$f(0) < f(x) + 0(0-x), \qquad x \neq 0$$

which confused me a lot. Later, I figured our $\ell_0$ "norm" is a non-convex function since it does not satisfy the property of positive homogeneity. Hence, I am clueless now. Please kindly advise. I am not really experienced in optimization.

mr noname
  • 167

2 Answers2

14

I have long suspected that this practice of calling the cardinality function the "$\ell_0$ norm" would cause problems, and this post is evidence of that.

The so-called "$\ell_0$ norm" is not a norm, and it is not convex.

If Wikipedia is to be believed, the term "$\ell_0$ norm" was coined by David Donoho, in his work on using the $\ell_1$ norm (a true norm, and therefore convex) as a proxy for cardinality in convex statistical regressions. It was placed in quotes to make it clear that it is an abuse of notation. Alas, people have adopted this term without the quotes and quite often without any indication that it is incorrect terminology.

I frankly think that the usage of the term and the equally faulty "$\|\cdot\|_0$" notation needs to be resisted everywhere, including peer review and less formal forums like this. I believe we should insist that writers use terms like "cardinality function" and $\mathop{\mathrm{card}}(\cdot)$ instead.

To return to your specific question: because the cardinality function is not a norm, and because it is not convex, convex optimization methods simply cannot be used with it. Recall that the subdifferential $\partial f(x)$ of a function $f:\mathbb{R}^n\rightarrow\mathbb{R}$ at a point $x\in\mathbb{R}^n$ is the set of subgradients $$\partial f(x) \triangleq \{ v\in\mathbb{R}^n~|~f(x) + \langle v, y - x \rangle \leq f(y) ~ \forall y\in\mathbb{R}^n \}$$ Unlike a convex function---which has at least one subgradient everywhere on its domain---a nonconvex function often has no subgradient, even if one were to employ an alternate definition that allows for a "local" subgradient.

EDIT: So just what is the subdifferential of the cardinality function? That is, what is $$\partial \mathop{\mathrm{card}}(x) \triangleq \{ v\in\mathbb{R}^n~|~\mathop{\mathrm{card}}(x) + \langle v, y - x \rangle \leq \mathop{\mathrm{card}}(y) ~ \forall y\in\mathbb{R}^n \}$$It has a subgradient at exactly one point: the origin, where $\partial \mathop{\mathrm{card}}(0) = \{0\}.$ Everywhere else, $\partial \mathop{\mathrm{card}}(x) = \emptyset.$ It's not difficult to see why:

  • Since $\mathop{\mathrm{card}}(x)\leq n$, no non-zero vector $v$ can be a subgradient at any $x\in\mathbb{R}^n$, including the origin, because the secant inequality will fail: that is, $\mathop{\mathrm{card}}(x) + \langle v, y - x \rangle \not \leq n$ for at least some $y$, e.g. $y=x+(n+1)\|v\|_2^{-1}v.$
  • If $v=0$, then the secant inequality then becomes $\mathop{\mathrm{card}}(x) \leq \mathop{\mathrm{card}}(y)~\forall y\in\mathbb{R}^n$, which only holds at the origin.
Michael Grant
  • 20,110
  • 1
    Well said! But there is also the notion of subgradient for general functions (from Rockafellar/Wets Variational Analysis) which leads to some more subgradient - but these are usually of no use either... – Dirk Feb 15 '15 at 21:39
  • Good point, yes. I tried to allude to this rather sloppily above when I talked about relaxing the definition to include a local subgradient. It's sloppy because I frankly didn't know what the best generalization was. Thanks for the reference, I will go check it out and possibly re-edit. – Michael Grant Feb 15 '15 at 21:40
1

Old post, and still a relevant question. Just to be complete, the analytical form of the generalized subdifferential of the cardinality function is known for several concepts of the generalized subdifferential (Fréchet, limiting, proximal, ...). The gist of it is that in finite dimensions, $$ ∂\text{card}(x) = \{v \mid v_i = 0 \text{ if } x_i \neq 0\}, $$ for any $x$.

Proving it is not an exercise! Details can be found in the paper https://doi.org/10.1007/s11590-012-0456-x.

Dominique
  • 3,224