
Given a convex and smooth function $ f \left( x \right) : \mathbb{R}^{n} \to \mathbb{R} $ with its gradient having the form:

$$ \nabla f \left( x \right) = x - g \left( x \right) $$

where $ g \left( x \right) : \mathbb{R}^{n} \to \mathbb{R}^{n} $ is also smooth (in each component).

One way to find a stationary point of $ f \left( x \right) $ is to find a Fixed Point of $ g \left( x \right) $, since at such a point the gradient vanishes and hence it must be a stationary point.
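For concreteness, here is a minimal numerical sketch of the iteration I have in mind (the quadratic $ f \left( x \right) = \frac{1}{2} {x}^{T} A x $ and the specific matrix $ A $ are assumptions purely for illustration):

```python
import numpy as np

# Illustrative assumption: f(x) = 0.5 * x^T A x with A symmetric positive definite,
# so grad f(x) = A x = x - g(x), i.e. g(x) = x - A x.
A = np.array([[1.5, 0.2],
              [0.2, 0.8]])
g = lambda x: x - A @ x

x = np.array([5.0, -3.0])        # arbitrary starting point
for _ in range(100):
    x = g(x)                     # fixed point iteration x_{k+1} = g(x_k)

print(x)                         # approaches the stationary point x* = 0 for this A
print(np.linalg.norm(A @ x))     # gradient norm at the final iterate, ~0
```

For this particular $ A $ the iteration does converge, but that is just one example.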

Does the case above guarantee the convergence of the Fixed Point Iteration of $ g \left( x \right) $? If not, what constraints should be imposed to guarantee convergence?
I thought it would, since the Newton Method is guaranteed to work on $ f \left( x \right) $, yet I am not sure about it.

I read about the convergence of Fixed Point Iteration in Elementary Numerical Analysis MATH:3800/CS:3700 Notes, Section 3.4. Yet those results are for general functions. I wonder whether there is an analysis for the case above.

Royi
  • 10,050

2 Answers


We need another condition, $\nabla g(x) \succ -I$, together with the function $f$ being strictly convex.

Or the condition $\nabla g(x)^T \nabla g(x) \preceq \rho I$, where $\rho < 1$.

From strict convexity, we know that \begin{equation} \nabla g(x) \prec I \end{equation} holds for any $x$, where $I$ is the identity matrix. Combined with $\nabla g(x) \succ -I$, and since $\nabla g(x) = I - \nabla^2 f(x)$ is symmetric, we obtain \begin{equation} \nabla g(x)^T \nabla g(x) \prec I. \end{equation}
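As a quick numerical sanity check of this implication (the eigenvalues below are chosen arbitrarily for illustration; recall that $\nabla g(x) = I - \nabla^2 f(x)$ is symmetric here):

```python
import numpy as np

# Build a symmetric Jacobian of g with eigenvalues strictly inside (-1, 1),
# i.e. -I < grad g < I (the eigenvalues 0.9, -0.7, 0.2 are an arbitrary choice).
Q, _ = np.linalg.qr(np.random.randn(3, 3))     # random orthonormal basis
Jg = Q @ np.diag([0.9, -0.7, 0.2]) @ Q.T

print(np.linalg.eigvalsh(Jg))                  # all eigenvalues in (-1, 1)
print(np.linalg.eigvalsh(Jg.T @ Jg))           # all in [0, 1), i.e. grad g^T grad g < I
```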

According to the Contraction Mapping Theorem, what we need is only $\|\nabla g(x)\| < 1$. Based on the definition of the matrix norm, we have \begin{equation} \|\nabla g(x)\| = \sup_{\|y\| = 1} \|\nabla g(x) y\| = \sup_{\|y\|=1} \sqrt{y^T\nabla g(x)^T \nabla g(x) y} < \sup_{\|y\|=1} \sqrt{y^Ty}=1. \end{equation}
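For a concrete check of this norm condition, take for example an affine $g(x) = Bx + b$ (an assumption just for illustration), whose Jacobian is the constant matrix $B$:

```python
import numpy as np

# Assumed affine example: g(x) = B x + b, so grad g(x) = B for every x.
B = np.array([[ 0.3, -0.4],
              [-0.4,  0.5]])                   # symmetric, as grad g = I - hessian f would be

spectral_norm = np.linalg.norm(B, 2)           # ||grad g||_2 = largest singular value
print(spectral_norm, spectral_norm < 1)        # contraction iff this is < 1

# Equivalent check: all eigenvalues of grad g^T grad g below 1.
print(np.max(np.linalg.eigvalsh(B.T @ B)) < 1)
```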

By the way, we can actually consider the sequence $x, g(x), g(g(x)), \dots$ as a special case of gradient descent. Let the step size of GD be $t=1$. Then we have \begin{equation} x^{k+1} = x^k - (x^k - g(x^k))=g(x^k). \end{equation} To guarantee convergence, we need the step size to satisfy $t < 2/L$, where $\nabla^2 f(x) \preceq L I$. Thus $L < 2$, i.e., \begin{equation} \nabla^2 f(x) = I - \nabla g(x) \prec 2I. \end{equation} Thus $\nabla g(x) \succ -I$.
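To see this equivalence numerically (the quadratic $f$ and matrix $A$ below are assumptions for illustration only):

```python
import numpy as np

# Assumed quadratic f(x) = 0.5 * x^T A x, so grad f(x) = A x and L = lambda_max(A).
A = np.array([[1.5, 0.2],
              [0.2, 0.8]])
grad_f = lambda x: A @ x
g      = lambda x: x - grad_f(x)

x_fp = x_gd = np.array([5.0, -3.0])
for _ in range(20):
    x_fp = g(x_fp)                             # fixed point iteration
    x_gd = x_gd - 1.0 * grad_f(x_gd)           # gradient descent with step size t = 1

print(np.allclose(x_fp, x_gd))                 # True: the two sequences coincide
print(np.linalg.eigvalsh(A).max() < 2)         # True: L < 2 here, so both converge
```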

Zenan Li
  • 1,434
  • It seems that one can directly obtain $\|\nabla g(x)\| \leq 1$ based on $\|\nabla g(x)\| = \sigma_{max} (\nabla g(x))$... – Zenan Li Mar 07 '20 at 02:59
  • Yes, I think so. – Zenan Li Mar 07 '20 at 06:01
  • Since the function $f$ is smooth and convex, its Hessian exists and is positive semi-definite. – Zenan Li Mar 07 '20 at 06:56
  • I did not understand what you mean. The hessian matrix is $\nabla^2 f(x) = I-\nabla g(x)$, right? More exact, here $\nabla g(x)$ is Jacobian matrix of $g$. – Zenan Li Mar 07 '20 at 07:12
  • This is similar to what I got. The problem is I seem to have an example which contradicts it. Yet I'm missing why. – Royi Mar 07 '20 at 07:26
  • Hi @Royi, I have edited my answer, and I think it is correct this time. – Zenan Li Mar 07 '20 at 07:48
  • If we don't have $\nabla g(x) \succeq -I$, we cannot guarantee that $\nabla g(x)^T \nabla g(x) \preceq I$. For example, we can choose $g$ with $\nabla g(x) = -2I$ (i.e., $g(x) = -2x$), a constant Jacobian. – Zenan Li Mar 07 '20 at 08:01
  • @Zen-NanLi, Have you seen my message in our chat? – Royi Mar 08 '20 at 12:44
  • @Royi, Hi, I guess that you are working with this problem [Proximal / Prox Operator for the Logistic Function], right? In this problem, the gradient of $g(x)$ is $\nabla g(x) = t(x) \cdot cc^T$, where $t(x)$ is a scalar function in terms of $x$. And obviously $cc^T$ is PSD and its rank is $1$. In addition, the unique non-zero eigenvalue is $c^Tc$, thus I think the answer is easy? – Zenan Li Mar 08 '20 at 16:49
  • See what I wrote in the chat. By the way $ t \left( x \right) = \lambda \frac{ -{e}^{ {c}^{T} x } }{ {\left( 1 + {e}^{ {c}^{T} x } \right)}^{2} } $. – Royi Mar 08 '20 at 18:10
  • About the Eigen Value, it is given by $ t \left( x \right) {c}^{T} c $. – Royi Mar 08 '20 at 18:15

Remark: This is the result of a joint discussion with @Ze-NanLi.

The conditions for the Fixed Point Iteration of $ g \left( x \right) $ to converge are:

  1. $ g $ maps a closed domain $ \mathcal{D} \subseteq \mathbb{R}^{n} $ into itself, namely $ g : \mathcal{D} \to \mathcal{D} $ (together with (2) this guarantees the existence of a Fixed Point in $ \mathcal{D} $). In the case $ \mathcal{D} = \mathbb{R}^{n} $ this means there must be no $ {x}_{0} \in \mathbb{R}^{n} $ such that $ g \left( {x}_{0} \right) = \infty $.
  2. The Jacobian of $ g $ must obey $ {\nabla g \left( x \right)}^{T} \nabla g \left( x \right) \preceq \rho I $ where $ \rho < 1 $ (equivalently, $ {\left\| \nabla g \left( x \right) \right\|}_{2} \leq \sqrt{\rho} < 1 $ for $ x \in \mathcal{D} $).

The motivation for (2) is:

$$ \left\| \nabla g \left( x \right) \right\| = \sup_{ \left\| y \right\| = 1} \left\| \nabla g \left( x \right) y \right\| = \sup_{ \left\| y \right\| = 1} \sqrt{ {y}^{T} {\nabla g \left( x \right)}^{T} \nabla g \left( x \right) y} < \sup_{ \left\| y \right\| = 1} \sqrt{ {y}^{T} y } = 1 $$

Which is the condition for a unique Fixed Point in the domain $ \mathcal{D} $.

Now, two conditions which yield (2) are:

  1. $ \nabla g \left( x \right) \prec I $.
  2. $ \nabla g \left( x \right) \succ -I $.

Since $ f \left( x \right) $ is convex it follows that $ {H}_{ f \left( \cdot \right) } \left( x \right) \succeq 0 $ where $ {H}_{ f \left( \cdot \right) } $ is the Hessian Matrix of $ f \left( \cdot \right) $. Hence:

$$ {H}_{ f \left( \cdot \right) } \left( x \right) = I - \nabla g \left( x \right) \succeq 0 \Rightarrow \nabla g \left( x \right) \preceq I $$

So if one wants strict inequality, one must demand $ f \left( \cdot \right) $ to be strictly convex.
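A small numerical illustration of this point (the Hessian below is an arbitrary positive definite assumption, chosen so that its largest eigenvalue exceeds $ 2 $):

```python
import numpy as np

# Assumed strictly convex quadratic: f(x) = 0.5 * x^T H x with H > 0, hence grad g(x) = I - H.
H = np.array([[2.5, 0.3],
              [0.3, 0.4]])
Jg = np.eye(2) - H

print(np.linalg.eigvalsh(H))                   # all > 0 (strict convexity)
print(np.linalg.eigvalsh(Jg))                  # equal 1 - eig(H): all < 1, yet one is below -1,
                                               # so grad g < I holds but grad g > -I fails here
```

This shows strict convexity alone gives only the upper bound; the lower bound $ \nabla g \left( x \right) \succ -I $ has to be imposed separately.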

So, in order to guarantee the Fixed Point Iteration of $ g \left( x \right) $ will converge to a Fixed Point, one must demand the following:

  1. The function $ f \left( x \right) $ must be strictly convex.
  2. The function $ g \left( x \right) $ must obey $ \nabla g \left( x \right) \succ -I $.

Alternative Derivation of the Condition $ \nabla g \left( x \right) \succ -I $

Similarly to the analysis of the Newton Method as a Fixed Point Iteration, one could write:

$$ {x}^{k + 1} = {x}^{k} - \left( {x}^{k} - g \left( {x}^{k} \right) \right) $$

Yet by definition $ \nabla f \left( {x}^{k} \right) = {x}^{k} - g \left( {x}^{k} \right) $ so we have Gradient Descent with Step Size of 1.

For a function $ f \left( x \right) $ with an $ L $ Lipschitz Continuous Gradient this means:

  1. The Hessian of the function obeys $ {H}_{f} \left( x \right) \preceq L I $.
  2. The gradient descent will converge for step size $ \alpha < \frac{2}{L} $.

From the above we have $ 1 < \frac{2}{L} \Rightarrow L < 2 $ and $ I - \nabla g \left( x \right) \preceq L I \prec 2 I \Rightarrow \nabla g \left( x \right) \succ -I $ as needed.
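As a sanity check of this bound, a one dimensional experiment (the quadratics $ f \left( x \right) = \frac{1}{2} L {x}^{2} $ below are assumptions for illustration), comparing $ L < 2 $ with $ L > 2 $:

```python
# Assumed 1D quadratic f(x) = 0.5 * L * x^2, so grad f(x) = L * x and g(x) = (1 - L) * x.
def iterate_g(L, iters=50, x0=1.0):
    x = x0
    for _ in range(iters):
        x = (1.0 - L) * x                      # x_{k+1} = g(x_k), i.e. GD with step size 1
    return x

print(iterate_g(L=1.5))   # |1 - L| = 0.5 < 1  -> converges to 0
print(iterate_g(L=2.5))   # |1 - L| = 1.5 > 1  -> diverges, since grad g = -1.5 violates grad g > -I
```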

Royi
  • 10,050