Let $f$ be convex and $C^1$ (continuously differentiable), and let $g(x) = \|x\|_1$.
Note that $g$ is convex but not differentiable at any point where some component $x(i)=0$.
Let (*) be the penalty problem:
For fixed $\lambda \geq 0$,
$$\text{arg}\min_x f(x) + \lambda g(x) \tag{*}$$
Let (**) be the constrained optimization problem:
For fixed $r>0$,
$$\text{arg}\min_x f(x) \quad \text{subject to } g(x) \leq r \tag{**}$$
- Now the forward direction follows from LE TRAN Duc Kinh's answer (1). I'll repeat it here in my notation:
Suppose $x_0$ is a solution to (*). Set $r = \|x_0\|_1$, so that $g(x)\leq r$ is simply the condition that $x$ lies in the $\ell_1$ ball of radius $r$, and let $x_1$ be a solution to the constrained problem (**) for this $r$.
Since $\|x_0\|_1 = r$, $x_0$ is feasible for (**), so $f(x_1) \leq f(x_0)$. On the other hand, $f(x_0) + \lambda g(x_0) \leq f(x_1) + \lambda g(x_1)$ (since $x_0$ solves (*)), and $g(x_1) \leq r = g(x_0)$ gives $\lambda(g(x_1) - g(x_0)) \leq 0$, so $f(x_0) \leq f(x_1)$. Hence $f(x_0) = f(x_1)$, and we see $x_0$ is a solution to (**)!
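As a hedged numerical sanity check of this forward direction (not part of the original answer; it assumes the cvxpy library and a least-squares choice $f(x) = \tfrac12\|Ax-b\|_2^2$), the sketch below solves (*) for a fixed $\lambda$, sets $r = \|x_0\|_1$, solves (**), and compares the minimizers:

```python
# Sketch only: assumes cvxpy is installed and f(x) = 0.5 * ||Ax - b||^2.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
lam = 0.5

# Penalized problem (*): minimize f(x) + lam * ||x||_1.
x = cp.Variable(5)
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(A @ x - b) + lam * cp.norm1(x))).solve()
x0 = x.value

# Constrained problem (**) with r = ||x0||_1.
r = np.abs(x0).sum()
y = cp.Variable(5)
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(A @ y - b)), [cp.norm1(y) <= r]).solve()
x1 = y.value

# Since this f is strictly convex, the minimizers are unique and should agree
# up to solver tolerance.
print(np.max(np.abs(x0 - x1)))
```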
- For the reverse direction, we need the machinery of subgradients. I am following notes here and book here (Sec. 5.2.2).
Definition: $z$ is a subgradient of a function $h$ at $x$ iff for all $y$ we have
$$ h(y) - h(x) \geq \langle z, y-x \rangle$$
Let $\partial h(x) = \{z: z \text{ is a subgradient of } h \text{ at } x\}$
It can be shown that if $h$ is differentiable at $x$, then $\partial h(x)$ is the singleton set $\{\nabla h(x)\}.$
Note that it follows immediately from the definition that $z \in \partial h(x)$ means $h$ does not decrease in the direction of $z$: for $\eta \geq 0$, $h(x+\eta z) - h(x) \geq \langle z, \eta z\rangle = \eta \|z\|_2^2 \geq 0.$
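For concreteness (an illustrative sketch added here, not from the original answer): for $g(x) = \|x\|_1$, any $z$ with $z_i = \operatorname{sign}(x_i)$ where $x_i \neq 0$ and $z_i \in [-1,1]$ where $x_i = 0$ is a subgradient, and the defining inequality can be checked numerically:

```python
import numpy as np

def l1_subgradient(x):
    # One element of the subdifferential of ||.||_1 at x:
    # sign(x_i) where x_i != 0, and 0 (any value in [-1, 1] works) where x_i == 0.
    return np.sign(x)

x = np.array([1.5, 0.0, -2.0])
z = l1_subgradient(x)

rng = np.random.default_rng(1)
for _ in range(1000):
    y = 3.0 * rng.standard_normal(x.shape)
    # Subgradient inequality: g(y) - g(x) >= <z, y - x>.
    assert np.abs(y).sum() - np.abs(x).sum() >= z @ (y - x) - 1e-12
print("subgradient inequality held on all samples")
```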
Theorem: If $f$ is differentiable at $x$ and $g$ is convex, then the "obvious" generalization of linearity of differentiation applies: $\partial(f+g)(x) = \{\nabla f(x) + z: z\in \partial g(x) \} = \nabla f(x) + \partial g(x)$.
Theorem (optimization of a convex, non-differentiable $h$): $x_0 = \text{arg}\min_x h(x)$ iff $h(y) - h(x_0) \geq 0$ for all $y$. Using the definition above, this holds iff $ 0 \in \partial h(x_0).$
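As a standard worked example of this theorem together with the sum rule above (added for illustration; not part of the original answer): in one dimension take $f(x) = \tfrac{1}{2}(x-a)^2$ and $g(x) = |x|$. Then

$$0 \in \partial(f + \lambda g)(x_0) = \{\, x_0 - a + \lambda s : s \in \partial|x_0| \,\} \quad\Longleftrightarrow\quad x_0 = \begin{cases} a - \lambda, & a > \lambda \\ 0, & |a| \leq \lambda \\ a + \lambda, & a < -\lambda \end{cases}$$

which is the familiar soft-thresholding formula; the $x_0 = 0$ case uses $\partial|0| = [-1, 1]$.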
Theorem (constrained optimization of a convex, non-differentiable $h$): $x_0 = \text{arg}\min_{x \in X} h(x)$ for a closed convex set $X$
iff $h(y) - h(x_0) \geq 0$ for all $y\in X$. This holds iff there exists some $z \in \partial h(x_0)$ so that $\langle z, y-x_0\rangle \geq 0 $ for all $y\in X$ (the "if" direction is immediate from the definition; the converse is the standard first-order optimality condition for convex problems).
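A quick one-dimensional illustration of this condition (added here; not from the original answer): minimize $h(x) = (x-2)^2$ over $X = [-1, 1]$. The minimizer is $x_0 = 1$, where $\partial h(1) = \{\nabla h(1)\} = \{-2\}$, and indeed

$$\langle -2,\, y - 1 \rangle = -2(y - 1) \geq 0 \quad \text{for all } y \in [-1, 1],$$

since $y - 1 \leq 0$ on $X$; note the inner product is strictly positive for $y < 1$, which is why the condition is an inequality rather than an equality.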
We are now prepared to show the reverse direction.
It is stated in reference (2) that the Lagrangian/KKT conditions hold for convex (non-differentiable) functions with the gradient replaced by a subgradient. We show how this works (because we need it). Draw a picture of the $\ell_1$ ball (a sublevel set of $g$) and some level sets of $f$ to see the geometry.
Fix $r>0$ and let $x_1$ be a solution to (**).
If $\|x_1\|_1 <r $, then $x_1$ is an unconstrained local (hence, by convexity, global) minimum of $f$, so $\nabla f(x_1) = 0$; set $\lambda_1 = 0$.
If $\|x_1\|_1 = r$, then $\nabla f(x_1)$ must point into $\{ g \leq r \}$ (the gradient points directly uphill, so if it pointed into $\{ g > r \}$, stepping backwards along the gradient from $x_1$ would stay in the constraint region and give a lower value of $f$). This means $\nabla f(x_1)$ is parallel and opposite in direction to some $z_1 \in \partial g(x_1)$ (remember that, by the definition of the subgradient, $z_1$ points in a direction along which $g$ is not going downhill: it goes at least somewhat uphill or stays flat). Hence there is some $\lambda_1 \geq 0$ (strictly positive when $\nabla f(x_1) \neq 0$) so that $\nabla f(x_1) = -\lambda_1 z_1$.
We can now see that the pair $(x_1, \lambda_1)$ satisfies the KKT conditions for the Lagrangian $L(x, \lambda) = f(x) + \lambda (g(x)-r)$ with $\lambda \geq 0$: stationarity in $x$, primal feasibility $g(x_1) \leq r$, and complementary slackness $\lambda_1 (g(x_1)-r) = 0$.
Specifically, stationarity in $x$ follows from the fact that $\nabla f(x_1) + \lambda_1 z_1 = 0 \in \partial_x L(x_1, \lambda_1)$.
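To make this construction concrete (a small worked example added for illustration): take $f(x) = (x-2)^2$, $g(x) = |x|$, and $r = 1$. The solution to (**) is $x_1 = 1$, on the boundary; $\nabla f(1) = -2$ points into $\{|x| \leq 1\}$, and $\partial g(1) = \{1\}$, so $\nabla f(x_1) = -\lambda_1 z_1$ forces $z_1 = 1$ and $\lambda_1 = 2$:

$$\nabla f(x_1) + \lambda_1 z_1 = -2 + 2 \cdot 1 = 0.$$

Indeed, $x_1 = 1$ also minimizes $(x-2)^2 + 2|x|$, which previews the final step below.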
We have now shown that any solution $x_1$ of the constrained problem $(**)$ yields a KKT point $(x_1, \lambda_1)$ of the Lagrangian with $\lambda_1 \geq 0$.
To finish, we now move to (*) with $\lambda = \lambda_1$. It follows immediately that $0 = \nabla f(x_1) +\lambda_1 z_1 \in \partial (f + \lambda_1 g)(x_1)$, so by the optimality theorem above, $x_1$ is a solution to (*) with $\lambda = \lambda_1$.
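A hedged numerical check of this reverse direction (again assuming cvxpy and a least-squares $f$; not part of the original argument): solve (**), read off the multiplier $\lambda_1$ as the dual value of the constraint, and confirm that solving (*) with $\lambda = \lambda_1$ recovers the same point.

```python
# Sketch only: assumes cvxpy and f(x) = 0.5 * ||Ax - b||^2.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 8))
x_true = np.array([2.0, -1.0, 0.0, 0.0, 1.5, 0.0, 0.0, -0.5])
b = A @ x_true + 0.1 * rng.standard_normal(30)
r = 1.0   # well below ||x_true||_1, so the constraint should be active (lam1 > 0)

# Constrained problem (**).
x = cp.Variable(8)
constraint = cp.norm1(x) <= r
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(A @ x - b)), [constraint]).solve()
x1 = x.value
lam1 = constraint.dual_value          # Lagrange multiplier lambda_1 >= 0

# Penalized problem (*) with lambda = lam1.
y = cp.Variable(8)
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(A @ y - b) + lam1 * cp.norm1(y))).solve()
print(np.max(np.abs(x1 - y.value)))   # ~0 up to solver tolerance

# Subgradient stationarity: -grad f(x1) / lam1 should lie in the subdifferential
# of ||.||_1 at x1, so its entries should all have magnitude <= 1 (approximately).
grad = A.T @ (A @ x1 - b)
print(np.max(np.abs(grad / lam1)))
```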