Lipschitz Hessian implies Lipschitz Hessian diagonal?

Question

I am working with an optimization problem where the Hessian is assumed Lipschitz, i.e.,

$$\| \nabla^2 f(x) - \nabla^2 f(y)\| \leq L\|x-y \|\tag{1}$$ and positive semi-definite, i.e., $\nabla^2 f(x) \succeq 0$. Let $\text{diag}\:\nabla^2 f(x)$ be the diagonal matrix whose elements are the diagonal elements of the Hessian $\nabla^2 f(x)$, and let $\widehat{\nabla^2 f(x)}$ denote the remaining off-diagonal part. That is,

$$\nabla^2 f(x) = \widehat{\nabla^2 f(x)} + \text{diag}\:\nabla^2 f(x).\tag{2}$$

My goal is to prove that $\text{diag}\:\nabla^2 f(x)$ is also Lipschitz continuous.

To do so, I started from this post to get

$$\begin{aligned}z^T \nabla^2 f z \leq L z^Tz &\Leftrightarrow\\ z^T\widehat{\nabla^2 f} z + z^T\text{diag}\:\nabla^2 f z \leq L z^Tz &\Leftrightarrow \\ z^T\widehat{\nabla^2 f} z \leq z^T (L\:I -\text{diag}\:\nabla^2 f)z& \end{aligned}\tag{3}$$

Then, by this post and $\nabla^2 f\succeq 0$, we have $\text{diag}\:\nabla^2 f \succeq 0$. Combining the above, we have

$$z^T\widehat{\nabla^2 f} z \leq z^T (L\:I -\text{diag}\:\nabla^2 f)z \leq L z^Tz\tag{4}$$

which, by setting $z=x-y$ and using this post again, gives

$$\| \widehat{\nabla^2 f(x)} - \widehat{\nabla^2 f(y)}\| \leq L\|x-y \|\tag{5}$$

Finally, using the triangle inequality in $(1)$ and applying $(5)$, we get

$$\begin{aligned} -\|\widehat{\nabla^2 f(x)} -\widehat{\nabla^2 f(y)} \| + \|\text{diag}\:\nabla^2 f(x) - \text{diag}\:\nabla^2 f(y) \|\leq L \|x -y \| &\Leftrightarrow \\ \|\text{diag}\:\nabla^2 f(x) - \text{diag}\:\nabla^2 f(y) \| \leq L \|x-y \| + \|\widehat{\nabla^2 f(x)} -\widehat{\nabla^2 f(y)} \| & \Leftrightarrow \\ \|\text{diag}\:\nabla^2 f(x) - \text{diag}\:\nabla^2 f(y) \| \leq 2 L \|x-y \|. \end{aligned}\tag{6}$$

Could someone please verify whether this procedure is correct? If it is, can the bound be made tighter? In general, are we interested in a smaller Lipschitz constant here, or in something that maximizes the upper bound? Any help is highly appreciated.
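
One possible tightening, written as a sketch under the assumption that $\|\cdot\|$ in $(1)$ is the spectral norm (the symbols $A$, $a_{ii}$ and the standard basis vectors $e_i$ are notation introduced here for illustration):

$$A := \nabla^2 f(x) - \nabla^2 f(y), \qquad \|\text{diag}\:A\| = \max_i |a_{ii}| = \max_i |e_i^T A e_i| \leq \|A\| \leq L\|x-y\|.$$

Under that assumption the diagonal part would be Lipschitz with the same constant $L$ rather than $2L$, and positive semi-definiteness is not used in this chain.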

If you identify $M_n(\Bbb R)\cong\Bbb R^{n^2},$ doesn't this simply follow from the fact that $f:U\subseteq\Bbb R^n\to\Bbb R^m$ is Lipschitz continuous iff $f_i:U\subseteq\Bbb R^n\to\Bbb R,\forall i\in\{1,\ldots,m\}$ is Lipschitz continuous? — Matcha Latte, Jun 28 '22 at 13:01
I also think you don't need the other post as a reference for the positive semi-definiteness because the diagonal entries of a positive semi-definite matrix are non-negative: $$a_{ii}=e_i^TAe_i\ge 0.$$ — Matcha Latte, Jun 28 '22 at 13:08
@Invisible is $2L$ a good Lipschitz constant, or can we find something smaller? In general, are we interested in a smaller Lipschitz constant here, or in something that maximizes the upper bound? Can you please give some advice? — Thoth, Jun 28 '22 at 13:16
@Invisible I am looking at your first comment and trying to relate it to my problem. My assumption is $\| \nabla^2 f(x) - \nabla^2 f(y)\| \leq L\|x-y \|$. How can I relate this to the $f_i$ in order to draw a conclusion about $f$, which I suppose is the diagonal of the Hessian? — Thoth, Jun 28 '22 at 13:39
If you want to prove that each component $f_i:\Bbb R^n\to\Bbb R$ is Lipschitz $\forall i\in\{1,\ldots,m\}\implies$ the function $f:\Bbb R^n\to\Bbb R^m$ is Lipschitz, you would take the maximum of the $m$ constants $L_i$ and use the $\|\cdot\|_\infty$ norm. If you wanted to prove that $f:\Bbb R^n\to\Bbb R^m$ is Lipschitz $\implies f_i:\Bbb R^n\to\Bbb R$ is Lipschitz $\forall i\in\{1,\ldots,m\},$ you would use the fact that $$\begin{aligned}|f_j(x)-f_j(y)|&=\sqrt{(f_j(x)-f_j(y))^2}\\&\le\sqrt{\sum_{i=1}^m(f_i(x)-f_i(y))^2}\\&=\|f(x)-f(y)\|_2\end{aligned}$$ — Matcha Latte, Jun 28 '22 at 13:44
In your case, $\nabla^2f:\Bbb R^n\to\Bbb R^{n^2}$ is Lipschitz $\iff (\nabla^2 f)_{ij}:\Bbb R^n\to\Bbb R$ is Lipschitz $\forall i,j\in\{1,\ldots,n\}.$ — Matcha Latte, Jun 28 '22 at 13:51
@Invisible thanks for the response! I am thinking about how your comment with $|f_j(x) - f_j(y)| \leq \| f(x) -f(y)\|_2$ can be adapted to the matrix case. If we write
$$| [\nabla^2 f (x)]_{ij} -[\nabla^2 f (y)]_{ij} | \leq \sqrt{ \sum_{kl} ([\nabla^2 f (x)]_{kl} -[\nabla^2 f (y)]_{kl})^2} = \| \nabla^2 f(x) - \nabla^2 f(y) \|_F$$ how can we use $$ \| \nabla^2 f(x) - \nabla^2 f(y) \|_2 \leq L \|x-y\|$$ to get $$| [\nabla^2 f (x)]_{ij} -[\nabla^2 f (y)]_{ij} | \leq \| x-y\|$$ as $\|A\|_2 \leq \|A\|_F$? — Thoth, Jun 28 '22 at 14:07
Well, think of $F(x):=(\nabla^2f_{11}(x),\ldots,\nabla^2f_{1n}(x),\ldots,\nabla^2f_{i1}(x),\ldots,\nabla^2f_{in}(x),\ldots,\nabla^2f_{n1}(x),\ldots,\nabla^2f_{nn}(x)),$ i.e., put all the entries of the Hessian into one row/column. Then $\|F(x)\|_2=\|\nabla^2f(x)\|_F.$ And work with $F:\Bbb R^n\to\Bbb R^{n^2}$ instead. — Matcha Latte, Jun 28 '22 at 14:17
But again, how will we use $\| \nabla^2 f(x) - \nabla^2 f(y) \|_2 \leq L \|x-y\|$, which refers to the spectral norm and not to the $\ell_2$ vector norm? Could you please provide some additional information? — Thoth, Jun 28 '22 at 14:28
Why do you insist on that norm? On finite dimensional space, all norms are equivalent. Choose the one that is most convenient. The constant $L$ might just be different, but that's irrelevant. — Matcha Latte, Jun 28 '22 at 14:37
Sorry, I have not used this trick before.
So, to be sure: instead of using $\| \nabla^2 f(x) - \nabla^2 f(y) \|_2 \leq L \|x-y\|$, we can use $\|F(x) -F(y)\|_2 \leq L' \|x-y\|_2$ to prove $|F_i(x) -F_i(y)| \leq L' \|x-y\|_2$, which, restricted to the diagonal elements of $\nabla^2f$, proves $\|\text{diag}\:\nabla^2 f(x) - \text{diag}\:\nabla^2 f(y) \| \leq L' \|x-y \|$ by using your comment:

$f_i:\Bbb R^n\to\Bbb R$ is Lipschitz $\forall i\in\{1,\ldots,m\}\implies$ $f:\Bbb R^n\to\Bbb R^m$ is Lipschitz? — Thoth, Jun 28 '22 at 15:18
@Invisible Hi, I am looking at your comments again after a year and I have collected them into a post below. I also have a question at the end of the post related to the isometry property. Could you please evaluate the post and provide some more details about my question? Any help is highly appreciated. — Thoth, Aug 24 '23 at 08:28
You don't map something from $M_n(\Bbb R)$ to $\Bbb R^{n\times n}.$ I just mentioned that so that you would use another, more suitable norm. Again, on finite dimensional spaces, all norms are equivalent. Why do you keep insisting on using something that just makes life more complicated? (=: — Matcha Latte, Aug 25 '23 at 18:14
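
The norm comparisons discussed in the comments can be checked numerically. The following is only a sketch, assuming NumPy is available; the helper names `diag_part` and `check_norm_chain` are hypothetical and introduced here for illustration. It draws random symmetric matrices (standing in for a difference of two Hessians) and tests the chain $\|\text{diag}\:A\|_2 \le \|A\|_2 \le \|A\|_F \le \sqrt{n}\,\|A\|_2$, together with the non-negativity of the diagonal of a positive semi-definite matrix.

```python
import numpy as np


def diag_part(a):
    """Return the diagonal matrix holding the diagonal entries of a."""
    return np.diag(np.diag(a))


def check_norm_chain(n=5, trials=200, seed=0):
    """Check ||diag A||_2 <= ||A||_2 <= ||A||_F <= sqrt(n) ||A||_2 on random samples.

    Returns True if every sampled matrix satisfies the chain (up to a small
    floating-point tolerance) and every sampled PSD matrix has a
    non-negative diagonal.
    """
    rng = np.random.default_rng(seed)
    tol = 1e-9
    for _ in range(trials):
        b = rng.standard_normal((n, n))
        a = (b + b.T) / 2  # symmetric, like a difference of two Hessians
        spec = np.linalg.norm(a, 2)        # spectral norm (largest singular value)
        frob = np.linalg.norm(a, "fro")    # Frobenius norm
        diag_spec = np.linalg.norm(diag_part(a), 2)
        if diag_spec > spec + tol:
            return False
        if spec > frob + tol:
            return False
        if frob > np.sqrt(n) * spec + tol:
            return False
        p = b @ b.T  # positive semi-definite by construction
        if np.min(np.diag(p)) < -tol:
            return False
    return True
```

Calling `check_norm_chain()` returns a boolean; a random test of this kind can only fail to find a counterexample, it does not replace the proof.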

Thoth · Answer 1 · 2023-08-24T08:31:49.523

I am looking at your comments again after a year. Let me collect them in this post, hoping to understand better what you have written in your comments, and ask a question at the end.

Here are the facts. The first comment fact is

If $f:\Bbb R^n\to\Bbb R^m$ is Lipschitz continuous then $f_i:\Bbb R^n\to\Bbb R$, for $i=1,2,\dots, m$, is also Lipschitz continuous.$\tag{A}$

The second comment fact is

If $f_i:\Bbb R^n\to\Bbb R$, for $i=1,2,\dots, m$, is Lipschitz continuous then $f:\Bbb R^n\to\Bbb R^m$ is also Lipschitz continuous. $\tag{B}$

The third comment fact is

$F(x):=(\nabla^2f_{11}(x),\ldots,\nabla^2f_{1n}(x),\ldots,\nabla^2f_{i1}(x),\ldots,\nabla^2f_{in}(x),\ldots,\nabla^2f_{n1}(x),\ldots,\nabla^2f_{nn}(x))$ $\tag{C}$

The fourth comment fact is

$M_n(\Bbb R)\cong\Bbb R^{n^2} \tag{D}$

This takes us from the matrix space to the vector space and back.

The fifth fact is

$$\| \nabla^2 f(x) - \nabla^2 f(y)\| \leq L\|x-y \| \tag{E}$$

As far as I understand from your comments, using (E) and (D), we can state that $F(x)$ is Lipschitz continuous. Then, using (A), we can state that

$$||\nabla^2f_{ij}(x) - \nabla^2f_{ij}(y)|| \leq L_{ij}||x-y|| \tag{F}$$

Then, restricting to the case $i=j$, we have

$$||\nabla^2f_{ii}(x) - \nabla^2f_{ii}(y)|| \leq L_{ii}||x-y|| \tag{G}$$

and using (B), we can go to

$$||\text{diag} \nabla^2f(x) - \text{diag} \nabla^2f(y)|| \leq L'||x-y|| \tag{H}$$

where $\text{diag} \nabla^2f(x)$ is the $n \times n$ diagonal matrix whose elements are the diagonal elements of $\nabla^2f(x)$.

My question:

I gathered all the information together to enhance my understanding of the process, particularly focusing on point (D). As I comprehend it, (D) serves as a bridging element that facilitates the transition between matrix space and vector space, while preserving the Lipschitz continuity characteristic. I'm uncertain about how (D) ensures the transfer of the Lipschitz continuity property between these spaces. Could you provide further insight on how (D) maintains Lipschitz continuity during the transition?
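
A possible way to make the role of (D) concrete, as a sketch assuming the norm in (E) is the spectral norm $\|\cdot\|_2$ (the notation $\operatorname{vec}$ for stacking the entries of a matrix into a vector, as in (C), is introduced here for illustration). For any $A \in M_n(\Bbb R)$,

$$\|\operatorname{vec}(A)\|_2 = \|A\|_F, \qquad \|A\|_2 \leq \|A\|_F \leq \sqrt{n}\,\|A\|_2.$$

Applying this to $A = \nabla^2 f(x) - \nabla^2 f(y)$ and using (E) would give

$$\|F(x) - F(y)\|_2 = \|\nabla^2 f(x) - \nabla^2 f(y)\|_F \leq \sqrt{n}\,L\,\|x-y\|,$$

so in this reading the identification (D) keeps Lipschitz continuity and only changes the constant, here by a factor of at most $\sqrt{n}$.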
