
I want to show that if $f:\mathbb{E}\rightarrow\mathbb{R}$ is convex and differentiable at $x$, then $\partial f(x) = \{\nabla f(x)\}$.

I understand that for a convex function, we have the following:

$$f(y) \ge f(x)+\nabla f(x)^T(y-x) \quad \forall\, x,y \in \operatorname{dom} f$$
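(As a quick one-dimensional sanity check of this inequality: with $f(t) = t^2$ it reads $y^2 \ge x^2 + 2x(y-x)$, which rearranges to $(y-x)^2 \ge 0$.)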

and I know that the definition of the subdifferential of $f$ at $x$ is:

$$ \partial f(x) = \{\, z \mid f(y) \ge f(x)+\langle z, y-x\rangle \quad \forall\, y \in \operatorname{dom} f \,\}$$

So, I'm able to say that if $f$ is convex, then certainly $\nabla f(x) \in\partial f(x)$. But I do not see how to show the reverse inclusion, i.e. that $\partial f(x) = \{\nabla f(x)\}$.
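For contrast, I do see why differentiability is needed: for the convex function $f(x) = |x|$ on $\mathbb{R}$, the subgradient inequality at $x = 0$ reads
$$|y| \ge z\,y \quad \forall\, y \in \mathbb{R},$$
which holds precisely when $|z| \le 1$, so $\partial f(0) = [-1,1]$ is not a singleton.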

(As a bit of a side note, I do not understand how we know that $f'(x)=\nabla f(x)$. I know that $f$ is differentiable at $x$, but does that necessarily mean that the gradient exists? A function can have directional derivatives in every direction without being differentiable, and we did not assume that $f$ is continuous.)
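(The example I have in mind is the standard one: $f(x,y) = \dfrac{x^2 y}{x^2 + y^2}$ for $(x,y) \neq (0,0)$ and $f(0,0) = 0$. Along any unit direction $(a,b)$,
$$\lim_{t \to 0} \frac{f(ta, tb) - f(0,0)}{t} = a^2 b,$$
so every directional derivative at the origin exists, yet $a^2 b$ is not linear in $(a,b)$, so $f$ is not differentiable there.)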

ViktorStein
makansij
  • related: https://math.stackexchange.com/questions/2697569/why-subgradient-of-convex-function-is-unique?rq=1 – makansij Apr 29 '18 at 22:03

2 Answers


Define the function $$\phi(t):=f(x+t(y-x))$$ for $t\in [0,1]$. Since $f$ is convex, $$\phi(t)\leqslant (1-t)f(x)+tf(y)=(1-t)\phi(0)+t\phi(1),$$ so $\phi$ is a convex function on $[0,1]$. Moreover, differentiability of $f$ at $x$ implies that $\phi$ is differentiable at $t=0$, with $$\phi'(0)=\langle\nabla f(x),y-x\rangle.$$ Convexity of $\phi$ yields $$\phi'(0)\leqslant \phi(1)-\phi(0)=f(y)-f(x)\quad\Rightarrow\quad\langle \nabla f(x),y-x\rangle\leqslant f(y)-f(x).$$ Therefore, by definition, $\nabla f(x)\in\partial f(x)$.

Now suppose that $z\in \partial f(x)$. Then $$f(y)\geqslant f(x)+\langle z,y-x\rangle$$ for all $y$. Let $y:=x+tw$ and define the function $$\psi(t):=f(y)-f(x)-\langle z,y-x\rangle=f(x+tw)-f(x)-\langle z,tw\rangle.$$ Differentiating at $t=0$ gives $$\psi'(0)=\langle\nabla f(x), w\rangle-\langle z,w\rangle.$$ Since $z\in\partial f(x)$, we have $\psi(t)\geqslant 0$ for all $t$, and $\psi(0)=0$, so $\psi$ attains a minimum at $t=0$. This implies $\psi'(0)=0$, which is equivalent to saying $$\langle\nabla f(x), w\rangle=\langle z,w\rangle.$$ Since this holds for arbitrary $w$, we get $\nabla f(x)=z$. This proves that in fact $$\partial f(x)=\{\nabla f(x)\}$$
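For concreteness, here is the $\psi$ computation instantiated in a toy example with $f(t) = t^2$ on $\mathbb{R}$: for a candidate subgradient $z$ at $x$,
$$\psi(t) = (x+tw)^2 - x^2 - z\,tw = tw\,(2x - z) + t^2 w^2,$$
so $\psi'(0) = w\,(2x - z)$, and requiring $\psi'(0) = 0$ for every $w$ forces $z = 2x = f'(x)$, i.e. $\partial f(x) = \{2x\}$.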

Arian
  • Thanks. It seems like the beginning part of the proof is the same as mine in the question, up until where you introduce $\psi(t)$, right? – makansij Apr 29 '18 at 22:31
  • Two questions: First, I am not seeing how $\psi(t) \ge 0$? Secondly, I understand that $\psi'(0) = 0$, but how do we know it holds for all $t$ and not just $t = 0$? Is it because $\psi'(t)$ doesn't depend on $t$? – makansij Apr 29 '18 at 22:32
  • It seems like your secret to showing uniqueness is that $y$ can be any point in the entire domain because $y=x+tw$? – makansij Apr 29 '18 at 22:39
  • First question: the first part is almost the same, although you need to justify how convexity interplays there. Second question: $\psi(t)\geqslant 0$ for all $t$ since we are letting $z$ be a subgradient of the convex function. Third question: since $\psi(0)=0$ and $\psi$ is identically nonnegative, $t=0$ is a minimum; then simply apply the first-order condition for a minimum. And $y$ being arbitrary means you can take arbitrary $w$. – Arian Apr 30 '18 at 13:20
  • Thanks @Arian. It's making more sense. When you say "justify how convexity interplays there": is saying that for a convex function we have $f(y) \ge f(x)+\nabla f(x)^T(y-x)\ \forall\, x,y \in \operatorname{dom} f$ not enough? If not, why not? Last thing: there must be some conditions on $t$ and $w$ in $y=x+tw$, right? $x+tw$ must still be in the domain of $f$, right? – makansij May 02 '18 at 23:53

The first part of the proof is simple, so I will prove the second part. Suppose $u$ is a subgradient of $f$ at $x$ which is not of the form $c\nabla f(x)$ for any constant $c$ (in that case the proof is similar and simple), and let $d$ denote a vector of norm $1$.

By Taylor's theorem (here, just the first-order expansion that defines differentiability at $x$), we have:

$$\begin{aligned}
f(x + \epsilon d) &= f(x) + \epsilon \langle\nabla f(x) , d\rangle + o(\epsilon) \\
&= f(x) + \epsilon \langle\nabla f(x) , d\rangle + \epsilon \langle u , d\rangle - \epsilon \langle u , d\rangle + o(\epsilon) \\
&= f(x) + \langle u,\epsilon d\rangle + \epsilon\left(\langle\nabla f(x) , d\rangle - \langle u,d\rangle + \frac{o(\epsilon)}{\epsilon}\right).
\end{aligned}$$

Since $u$ is not a scalar multiple of $\nabla f(x)$, the vectors $u$ and $\nabla f(x)$ are linearly independent whenever $\nabla f(x) \neq 0$, so we can find a unit vector $d$ such that $\langle\nabla f(x) , d\rangle < 0$ and $\langle u,d\rangle > 0$. (If $\nabla f(x) = 0$, take $d = u/\lVert u\rVert$, so that $\langle\nabla f(x), d\rangle = 0 < \langle u, d\rangle$; all the next step needs is $\langle\nabla f(x), d\rangle < \langle u, d\rangle$.)

Now, choosing $\epsilon > 0$ small enough that the $\frac{o(\epsilon)}{\epsilon}$ term cannot offset the negative constant $\langle\nabla f(x), d\rangle - \langle u, d\rangle$, we get $f(x + \epsilon d) < f(x) + \langle u , \epsilon d\rangle$, which contradicts $u \in \partial f(x)$.
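To illustrate the choice of $d$ with a toy example: in $\mathbb{R}^2$ with $\nabla f(x) = (1, 0)$ and $u = (0, 1)$, the unit vector $d = \tfrac{1}{\sqrt{2}}(-1, 1)$ gives
$$\langle \nabla f(x), d \rangle = -\tfrac{1}{\sqrt{2}} < 0 < \tfrac{1}{\sqrt{2}} = \langle u, d \rangle,$$
so the coefficient of $\epsilon$ in the expansion above is negative once $\epsilon$ is small.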