Remark: This is the result of a joint discussion with @Ze-NanLi.
The conditions for the Fixed Point Iteration $ {x}^{k + 1} = g \left( {x}^{k} \right) $ to converge are:
- $ g $ maps its domain into itself, namely $ g : \mathcal{D} \to \mathcal{D} $ for a closed domain $ \mathcal{D} \subseteq \mathbb{R}^{n} $, which (together with (2)) guarantees $ g $ has a fixed point in $ \mathcal{D} $. In particular, in case of $ \mathcal{D} = \mathbb{R}^{n} $ there must be no $ {x}_{0} \in \mathbb{R}^{n} $ such that $ g \left( {x}_{0} \right) = \infty $.
- The Jacobian of $ g $ must obey $ {\nabla g \left( x \right)}^{T} \nabla g \left( x \right) \preceq \rho I $ for all $ x \in \mathcal{D} $, where $ \rho < 1 $ (equivalently, $ {\left\| \nabla g \left( x \right) \right\|}_{2} \leq \sqrt{\rho} < 1 $ for $ x \in \mathcal{D} $).
The motivation for (2) is:
$$ \left\| \nabla g \left( x \right) \right\| = \sup_{ \left\| y \right\| = 1} \left\| \nabla g \left( x \right) y \right\| = \sup_{ \left\| y \right\| = 1} \sqrt{ {y}^{T} {\nabla g \left( x \right)}^{T} \nabla g \left( x \right) y} < \sup_{ \left\| y \right\| = 1} \sqrt{ {y}^{T} y } = 1 $$
This is the contraction condition which guarantees a unique Fixed Point in $ \mathcal{D} $.
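As a sanity check of the 2 conditions, a minimal numerical sketch (using an affine map $ g $ picked purely for illustration, not taken from the question) which verifies the contraction through the Jacobian norm and runs the iteration:

```python
import numpy as np

# Hand picked affine map g(x) = A x + b with ||A||_2 < 1 (an assumption for the demo).
A = np.array([[0.5, 0.2],
              [0.1, 0.4]])
b = np.array([1.0, -2.0])
g = lambda x: A @ x + b

# Condition (2): the Jacobian of g (here simply A) must have 2-norm strictly below 1.
jac_norm = np.linalg.norm(A, 2)
assert jac_norm < 1, "g is not a contraction, the iteration may diverge"

# Fixed Point Iteration: x^{k + 1} = g(x^k).
x = np.zeros(2)
for _ in range(200):
    x = g(x)

# The unique fixed point solves x = A x + b, namely (I - A) x = b.
x_star = np.linalg.solve(np.eye(2) - A, b)
print(np.allclose(x, x_star))  # True
```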
Now, 2 conditions which together satisfy (2) (since $ \nabla g \left( x \right) $ is symmetric in our case, its 2-norm is its largest absolute eigenvalue) are:
- $ \nabla g \left( x \right) \prec I $.
- $ \nabla g \left( x \right) \succ -I $.
Since $ f \left( x \right) $ is convex, its Hessian obeys $ {H}_{f} \left( x \right) \succeq 0 $, where $ {H}_{f} $ is the Hessian Matrix of $ f \left( \cdot \right) $. Since $ g \left( x \right) = x - \nabla f \left( x \right) $:
$$ {H}_{f} \left( x \right) = I - \nabla g \left( x \right) \succeq 0 \Rightarrow \nabla g \left( x \right) \preceq I $$
Hence, for the strict inequality $ \nabla g \left( x \right) \prec I $, one must demand $ f \left( \cdot \right) $ to be strictly convex.
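For instance, with an illustrative strictly convex quadratic $ f \left( x \right) = \frac{1}{2} {x}^{T} A x $, $ A \succ 0 $ (a choice made only for demonstration), the map is $ g \left( x \right) = \left( I - A \right) x $ and $ \nabla g \left( x \right) = I - A \prec I $:

```python
import numpy as np

# Strictly convex quadratic: f(x) = 0.5 * x^T A x with A > 0 (illustrative choice).
A = np.array([[1.2, 0.3],
              [0.3, 0.8]])

# g(x) = x - grad f(x) = (I - A) x, hence the Jacobian of g is I - A.
grad_g = np.eye(2) - A

# Strict convexity (A > 0) gives grad_g < I: the eigenvalues of I - A are 1 - lambda_i(A) < 1.
eigvals = np.linalg.eigvalsh(grad_g)
print(eigvals)               # all entries are strictly below 1
print(np.all(eigvals < 1))   # True
```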
So, in order to guarantee the Fixed Point Iteration on $ g \left( x \right) $ converges to a fixed point, one must demand the following (a numerical illustration follows the list):
- The function $ f \left( x \right) $ must be strictly convex.
- The function $ g \left( x \right) $ must obey $ \nabla g \left( x \right) \succ -I $.
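To see that strict convexity alone is not enough, a small sketch (with diagonal quadratics chosen only for illustration) where $ f $ is strictly convex yet $ \nabla g \left( x \right) \succ -I $ fails and the iteration diverges:

```python
import numpy as np

def fixed_point_iteration(A, iters=50):
    """Run x^{k + 1} = g(x^k) = x^k - A x^k for f(x) = 0.5 * x^T A x."""
    x = np.ones(A.shape[0])
    for _ in range(iters):
        x = x - A @ x
    return np.linalg.norm(x)

# Strictly convex, and grad_g = I - A has all eigenvalues in (-1, 1) -> converges.
A_good = np.diag([0.5, 1.5])
# Strictly convex as well, but one eigenvalue of A exceeds 2, so grad_g = I - A
# has an eigenvalue below -1 (the condition grad_g > -I fails) -> diverges.
A_bad = np.diag([0.5, 3.0])

print(fixed_point_iteration(A_good))  # ~0 (converges to the fixed point x = 0)
print(fixed_point_iteration(A_bad))   # huge (diverges)
```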
Alternative Derivation of the Condition $ \nabla g \left( x \right) \succ -I $
Similarly to the analysis of the Newton Method as a Fixed Point Iteration, one could write:
$$ {x}^{k + 1} = {x}^{k} - \left( {x}^{k} - g \left( {x}^{k} \right) \right) $$
Yet by definition $ \nabla f \left( {x}^{k} \right) = {x}^{k} - g \left( {x}^{k} \right) $, so this is Gradient Descent with a step size of $ 1 $.
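Note that $ x $ is a fixed point of $ g $ exactly when $ \nabla f \left( x \right) = 0 $, so the iteration targets the minimizer of $ f $. A quick check (on an illustrative quadratic chosen only for this demo) that the two update rules generate identical iterates and land on the minimizer:

```python
import numpy as np

# Illustrative strictly convex quadratic: f(x) = 0.5 * x^T A x - b^T x.
A = np.array([[1.5, 0.2],
              [0.2, 0.9]])
b = np.array([1.0, 2.0])

grad_f = lambda x: A @ x - b
g = lambda x: x - grad_f(x)               # Fixed point map
gd_step = lambda x: x - 1.0 * grad_f(x)   # Gradient Descent with step size 1

x_fp = x_gd = np.zeros(2)
for _ in range(100):
    x_fp, x_gd = g(x_fp), gd_step(x_gd)

x_min = np.linalg.solve(A, b)     # Minimizer of f, i.e. the point where grad f(x) = 0
print(np.allclose(x_fp, x_gd))    # True: identical iterates
print(np.allclose(x_fp, x_min))   # True: the fixed point is the minimizer
```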
For a function $ f \left( x \right) $ with an $ L $ Lipschitz Continuous Gradient the following hold:
- The Hessian of the function obeys $ {H}_{f} \left( x \right) \preceq L I $.
- The gradient descent will converge for step size $ \alpha < \frac{2}{L} $.
From the above we have $ 1 < \frac{2}{L} \Rightarrow L < 2 $ and $ {H}_{f} \left( x \right) = I - \nabla g \left( x \right) \preceq L I \prec 2 I \Rightarrow \nabla g \left( x \right) \succ -I $ as needed.
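The threshold can be verified numerically; a sketch (using the one dimensional quadratic $ f \left( x \right) = \frac{L}{2} {x}^{2} $, chosen for illustration) of Gradient Descent with step size $ 1 $ around $ L = 2 $:

```python
import numpy as np

def gd_unit_step(L, iters=60):
    """Gradient Descent with step size 1 on f(x) = 0.5 * L * x^2 (so f' is L Lipschitz)."""
    x = 1.0
    for _ in range(iters):
        x = x - 1.0 * (L * x)   # x - alpha * f'(x) with alpha = 1
    return abs(x)

for L in (1.9, 2.0, 2.1):
    print(L, gd_unit_step(L))
# L = 1.9 -> |x| shrinks toward 0 (converges)
# L = 2.0 -> |x| stays 1 (oscillates between +1 and -1, the boundary case)
# L = 2.1 -> |x| blows up (diverges)
```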