
Let $f : \mathbb{R} \rightarrow \mathbb{R}$ be a differentiable function. Given the following two definitions of convexity of $f$, prove that (i) implies (ii):

(i) $\forall x, y \in \mathbb{R} : f(x) \ge f(y) + f'(y)(x - y)$

(ii) $\forall x, y \in \mathbb{R}, \forall \lambda \in [0, 1] : f(\lambda x + (1 - \lambda)y) \le \lambda f(x) + (1 - \lambda)f(y)$


First I saw that, for $x > y$, (i) can be rewritten as $$f'(y) \leq \frac{f(x)-f(y)}{x-y} \,\,\, (*)$$ So the slope of the secant from $y$ to a greater point $x$ is always at least the slope of the tangent at $y$.
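(For $x < y$, dividing by the negative $x - y$ flips the inequality, so there one gets $$f'(y) \geq \frac{f(x)-f(y)}{x-y}$$ instead.)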

I have tried writing $f(\lambda x + (1 - \lambda)y) = f(y+\lambda(x-y))$ as $f(y)$ plus, so to speak, the sum of all $f'(y + \epsilon \cdot n) \cdot \epsilon$, where $\epsilon \rightarrow 0$ and $n$ needs to be defined correctly, of course. So just the "starting point" $f(y)$ and then every point with its slope until we reach $f(y+\lambda(x-y))$. That slope I would estimate using $(*)$ and get an inequality. But this doesn't work, since I would need to apply $(*)$ to the actual points $x$ and $y$, and that of course doesn't work.
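(Made precise, and under the extra assumption that $f'$ is continuous so the fundamental theorem of calculus applies, this idea would be $$f(y+\lambda(x-y)) = f(y) + \int_0^{\lambda} f'\big(y + t(x-y)\big)\,(x-y)\,dt,$$ and I would then try to bound the integrand using $(*)$.)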


I can't come up with a different attempt, and the other questions on the internet don't have a derivative in them. They all use different definitions of convexity than the ones here.

  • in (i) set $y:= \lambda \hat x+ (1-\lambda) \hat y$ and $x:=\hat x$, $x:=\hat y$ in turn, where $\hat x$, $\hat y$ refer to the points in property (ii) – daw May 27 '24 at 15:08
  • honestly, this was certainly asked before on this site. I could not find a duplicate, though. – daw May 27 '24 at 15:08
  • @daw I don't understand. Set $x$ to $\hat x$? Or to $\hat y$? Or both? I tried both and I just get a long inequality which does not lead me to anything useful – mathematics-and-caffeine May 27 '24 at 15:39

1 Answer


We want to show that (i) implies (ii). Assume that (ii) does not hold. Then there exist two points $a, b$ and a scalar $\mu \in (0, 1)$ such that $$ f(x_\mu) > \mu f(a) +(1 -\mu)f(b) $$ with $x_\mu = \mu a +(1 -\mu) b$. We want to show that this is impossible. We first apply (i) to $a$ and $x_\mu$ and then to $b$ and $x_\mu$. We have $$ f(a) \geq f(x_\mu) +f'(x_\mu)(a -x_\mu) = f(x_\mu) +(1 -\mu)f'(x_\mu)(a -b) \tag{$\ast$} $$ and $$ f(b) \geq f(x_\mu) +f'(x_\mu)(b -x_\mu) = f(x_\mu) -\mu f'(x_\mu)(a -b). \tag{$\ast\ast$} $$ We multiply $(\ast)$ by $\mu$ and $(\ast\ast)$ by $1 -\mu$ and then take their sum. We obtain $$ \mu f(a) +(1 -\mu)f(b) \geq f(x_\mu), $$ which contradicts the initial assumption.
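Spelling out that last step, the derivative terms cancel: $$ \mu f(a) +(1 -\mu)f(b) \;\geq\; f(x_\mu) + \big[\mu(1-\mu) - (1-\mu)\mu\big]\, f'(x_\mu)(a-b) \;=\; f(x_\mu). $$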

Daniel N
  • I don't understand how we obtain $(\ast\ast)$. Since $b-x_{\mu}$ is negative, we actually would subtract something. By replacing it with the derivative term, we subtract something smaller than before, so the result would be bigger. In $(\ast)$ we have everything $\geq 0$, so that works, but in $(\ast\ast)$ I don't see it working! – mathematics-and-caffeine May 28 '24 at 10:16
  • Being positive or negative does not matter here; it's a computation. You have $b -x_\mu = b -\mu a -(1 -\mu)b = \mu (b -a)$. – Daniel N May 28 '24 at 18:43