10

Let $\operatorname{Psym}_n$ be the cone of symmetric positive-definite matrices of size $n \times n$.

How can one prove that the positive square root function $\sqrt{\cdot}:\operatorname{Psym}_n \to \operatorname{Psym}_n$ is uniformly continuous?

I am quite sure this is true, since on any compact ball this clearly holds, and far enough from the origin, I think the rate of change should decrease (analogous to the one-dimensional case where $(\sqrt{x})'=\frac{1}{2\sqrt{x}}$ tends to zero when $x \to \infty$).

A naive approach is to try to use the mean value inequality:

For that we need to show the norm $\|d(\sqrt{\cdot})_A\|$ is bounded for $\|A\|$ large enough. We know the derivative satisfies:

$$d(\sqrt{\cdot})_A(B) \cdot \sqrt{A} + \sqrt{A} \cdot d(\sqrt{\cdot})_A(B)=B$$ for every $B \in \operatorname{sym}_n$.

Thus,

$$ \|B\| \le \| d(\sqrt{\cdot})_A(B) \cdot \sqrt{A}\| +\| \sqrt{A} \cdot d(\sqrt{\cdot})_A(B) \|\le 2 \| \sqrt{A}\| \| d(\sqrt{\cdot})_A(B)\|,$$

so we only get a bound from below: $$\|d(\sqrt{\cdot})_A\|_{op} \ge \frac{1}{2\|\sqrt{A}\|}$$
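For what it's worth, the derivative identity above can be sanity-checked numerically. The sketch below is not part of any proof; the matrices $A,B$ are arbitrary choices. It uses the closed form $\sqrt{M}=(M+\sqrt{\det M}\,I)/\sqrt{\operatorname{tr}M+2\sqrt{\det M}}$, valid for symmetric positive-definite $2\times 2$ matrices by Cayley–Hamilton, and approximates $d(\sqrt{\cdot})_A(B)$ by a finite difference:

```python
import math

def sqrt2x2(M):
    # Square root of a symmetric positive-definite 2x2 matrix via
    # Cayley-Hamilton: sqrt(M) = (M + s*I) / sqrt(tr(M) + 2s), s = sqrt(det M).
    (a, b), (_, c) = M
    s = math.sqrt(a * c - b * b)
    t = math.sqrt(a + c + 2 * s)
    return [[(a + s) / t, b / t], [b / t, (c + s) / t]]

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[4.0, 1.0], [1.0, 3.0]]   # arbitrary positive-definite matrix
B = [[1.0, 0.5], [0.5, 2.0]]   # arbitrary symmetric direction
h = 1e-6

# Finite-difference approximation of the directional derivative d(sqrt)_A(B).
Ah = [[A[i][j] + h * B[i][j] for j in range(2)] for i in range(2)]
rA, rAh = sqrt2x2(A), sqrt2x2(Ah)
D = [[(rAh[i][j] - rA[i][j]) / h for j in range(2)] for i in range(2)]

# The Sylvester identity: D*sqrt(A) + sqrt(A)*D should equal B up to O(h).
S1, S2 = mul(D, rA), mul(rA, D)
residual = max(abs(S1[i][j] + S2[i][j] - B[i][j])
               for i in range(2) for j in range(2))
print(residual)  # small, of order h
```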

Asaf Shachar
  • 25,967

3 Answers

10

The cheapest way is to write some representation of the square root in terms of functions whose continuity is obvious. Note that the square root is homogeneous of degree $1/2$, so it suffices to show that if $\|A-B\|\le 1$, then $\|A^{1/2}-B^{1/2}\|\le C$. Now consider the function $$ f(x)=\int_0^1\left[1-\frac1{1+tx}\right]t^{-3/2}\,dt\,. $$ Making the obvious change of variable $tx=s$, we get $$ f(x)=x^{1/2}\int_0^x\frac{s}{1+s}s^{-3/2}\,ds=Kx^{1/2}-x^{1/2}\int_x^\infty\frac{s}{1+s}s^{-3/2}\,ds=Kx^{1/2}+g(x)\,. $$ Note that $|g(x)|\le 2$ for all $x>0$. Thus, $$ \|KA^{1/2}-f(A)\|\le 2 $$ for an arbitrary positive definite self-adjoint $A$. Now it will suffice to show that $f$ is "operator Lipschitz", but that is obvious since $$ f(A)-f(B)=\int_0^1(1+tB)^{-1}(A-B)(1+tA)^{-1}t^{-1/2}\,dt $$ and $\|(1+tX)^{-1}\|\le 1$ for any positive definite self-adjoint $X$ and $t\ge 0$. (The resolvent identity $X^{-1}-Y^{-1}=X^{-1}(Y-X)Y^{-1}$ has been used here, of course.)
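As a numerical illustration of the bound $\|KA^{1/2}-f(A)\|\le 2$ (a sketch, not part of the proof: it relies on the scalar closed form $f(x)=2\sqrt{x}\arctan\sqrt{x}$, obtained by substituting $t=u^2$ in the integral, on $K=\pi$, and on arbitrarily chosen $2\times 2$ test matrices):

```python
import math

def eigs(M):
    # Eigenvalues of a symmetric 2x2 matrix.
    (a, b), (_, c) = M
    m, r = (a + c) / 2, math.hypot((a - c) / 2, b)
    return m + r, m - r

def f(x):
    # Scalar closed form of f (substitute t = u^2 in the defining integral).
    return 2 * math.sqrt(x) * math.atan(math.sqrt(x))

K = math.pi  # K = integral of s^{-1/2}/(1+s) over (0, inf)

# For symmetric positive-definite A, K*A^{1/2} - f(A) has eigenvalues
# K*sqrt(l) - f(l), so its operator norm is the max of |K*sqrt(l) - f(l)|
# over the spectrum of A.
tests = [[[4.0, 1.0], [1.0, 3.0]],
         [[100.0, 0.0], [0.0, 0.01]],
         [[2.0, -1.0], [-1.0, 2.0]]]
norms = [max(abs(K * math.sqrt(l) - f(l)) for l in eigs(A)) for A in tests]
print(norms)  # each entry is at most 2
```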

In fact, we can say much more: every $\alpha$-Hölder continuous function $F$ ($0<\alpha<1$) is operator Hölder continuous on the space of self-adjoint matrices. The Lipschitz case is more subtle, however, and is not fully resolved yet. I know neither how curious you are about all this stuff, nor how much you know already, so I'm stopping here.

fedja
  • 19,348
  • But $1+tA$ or $1+tB$ might be non-invertible? What then? – Ewan Delanoy Sep 25 '16 at 07:16
  • 1
    @EwanDelanoy Since $A,B$ are symmetric positive definite, $1+tA$ is symmetric positive definite for every $t>0$, hence invertible. – Asaf Shachar Sep 25 '16 at 13:12
  • @fedja: Thanks. Nice to meet you again:) (1) I am very interested in this stuff. I know too little about "functional calculus of operators" and I would like to know more. Do you have recommended references (for the general theory and specifics on Lipschitz)? (In a quick google search for specific results about Lipschitz I have found this: http://arxiv.org/abs/1602.07994, unfortunately I do not read Russian). (2) I guess the extensions of functions to operators you are implicitly relying on - ... – Asaf Shachar Sep 25 '16 at 13:22
  • @fedja is via defining first on diagonal matrices (elementwise) and then extending to all symmetric matrices by requiring invariance under orthogonal diagonalization. (3) Do you have an easy argument for why $|g(x)| \le 2$? – Asaf Shachar Sep 25 '16 at 13:23
  • @AsafShachar (1) You can find many more articles by the same authors (Alexandrov, Peller) on arXiv, most of them in English. How readable they might be for you depends on your background. (2) Yes, of course. That is the standard way to define a function of a self-adjoint operator. (3) Just drop the factor $\frac s{s+1}$, which is less than $1$ anyway, and integrate $s^{-3/2}$. – fedja Sep 25 '16 at 13:55
  • Thanks, by the way, how did you come up with this "magic" function $f$? Is there some more general technique behind this construction? – Asaf Shachar Sep 25 '16 at 15:11
  • @fedja Your solution is very nice, thanks. By the way I am still wondering whether or not $\|d(\sqrt{\cdot})_A\|_{op} \le \frac{1}{2\|\sqrt{A}\|}$ (or at least $\|d(\sqrt{\cdot})_A\|_{op} \le C \frac{1}{2\|\sqrt{A}\|}$ for some constant $C$). This will enable us to use my "naive" approach (which is based on the mean value inequality). – Asaf Shachar Sep 26 '16 at 15:23
  • @AsafShachar Not with the norm in the denominator. You cannot have it better than for the restriction to a subspace, so you can get $\frac{1}{2}\|A^{-1/2}\|$ at best, but that is useless. – fedja Sep 26 '16 at 23:25
  • On which subspace did you think? (For the obvious try, I took $B=A$, and got $d(\sqrt{\cdot})_A(A)=\frac{1}{2}\sqrt{A}$). – Asaf Shachar Sep 27 '16 at 07:49
  • @AsafShachar The one corresponding to the eigenvector with the minimal eigenvalue. – fedja Sep 27 '16 at 12:23
  • Can you please say what is the relevant subspace? I do not see why the minimal eigenvalue is $\frac{1}{2} \| A^{-\frac{1}{2}} \|$... – Asaf Shachar Oct 02 '16 at 12:48
6

If vector $x$ with $\|x\|=1$ is an eigenvector of $\sqrt A - \sqrt B$ with eigenvalue $\mu$ then \begin{align*} x^T(A-B)x &= x^T(\sqrt A - \sqrt B)\sqrt A x + x^T\sqrt B (\sqrt A - \sqrt B) x \\&= \mu x^T(\sqrt A + \sqrt B)x. \end{align*} Now for any $\epsilon>0$, if $\|A-B\|_{op}\le \delta=\epsilon^2$, choose $\mu=\pm\|\sqrt A - \sqrt B\|_{op}$ and as in the scalar case, \begin{align*} \|\sqrt A - \sqrt B\|_{op}^2 & = (x^T(\sqrt A - \sqrt B)x)^2 \\& \le |x^T(\sqrt A - \sqrt B)x| ~x^T(\sqrt A + \sqrt B)x \\&=|x^T(A-B)x| \\&\le \delta = \epsilon^2. \end{align*} The same argument also leads to a simple proof that the matrix square root is Lipschitz: https://math.stackexchange.com/a/3968118/484640
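A quick numerical check of the resulting inequality $\|\sqrt A-\sqrt B\|_{op}^2\le\|A-B\|_{op}$ (a sketch with arbitrarily chosen positive-definite pairs; the $2\times 2$ square root uses the Cayley–Hamilton closed form $\sqrt{M}=(M+\sqrt{\det M}\,I)/\sqrt{\operatorname{tr}M+2\sqrt{\det M}}$):

```python
import math

def sqrt2x2(M):
    # sqrt of a symmetric positive-definite 2x2 M via Cayley-Hamilton.
    (a, b), (_, c) = M
    s = math.sqrt(a * c - b * b)
    t = math.sqrt(a + c + 2 * s)
    return [[(a + s) / t, b / t], [b / t, (c + s) / t]]

def opnorm(M):
    # Operator norm of a symmetric 2x2 matrix = largest |eigenvalue|.
    (a, b), (_, c) = M
    m, r = (a + c) / 2, math.hypot((a - c) / 2, b)
    return max(abs(m + r), abs(m - r))

def diff(X, Y):
    return [[X[i][j] - Y[i][j] for j in range(2)] for i in range(2)]

# Arbitrary positive-definite test pairs, including ill-conditioned ones.
pairs = [
    ([[4.0, 1.0], [1.0, 3.0]], [[1.0, 0.5], [0.5, 2.0]]),
    ([[100.0, 0.0], [0.0, 1.0]], [[99.0, 0.3], [0.3, 1.5]]),
    ([[2.0, -1.0], [-1.0, 2.0]], [[0.01, 0.0], [0.0, 0.01]]),
]
ok = all(opnorm(diff(sqrt2x2(A), sqrt2x2(B))) ** 2
         <= opnorm(diff(A, B)) + 1e-12
         for A, B in pairs)
print(ok)
```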

jlewk
  • 2,257
  • 2
    Surprised that no one is noticing this gem. Simple & elementary proof. I got stuck on this problem for a few weeks and there is another post on a related (if not the same) question: https://math.stackexchange.com/questions/3360914/concavity-inequality-for-the-matrix-square-root. All complicated proofs that are hard to parse, with unknown dimension-dependent constants. How come we all missed this... – Nan Jiang Jan 05 '24 at 16:06
  • 1
    I later learned that this trick was already known and generalized extensively in Ando and J. L. van Hemmen, "An inequality for trace ideals," Comm. Math. Phys. 76 (2), 143–148, 1980. https://projecteuclid.org/journals/communications-in-mathematical-physics/volume-76/issue-2/An-inequality-for-trace-ideals/cmp/1103908255.full?tab=ArticleLink – jlewk Jan 06 '24 at 02:16
  • So NICE!! Thank you so much. – Jie Wei Sep 13 '24 at 09:23
4

$\newcommand{\id}{\operatorname{Id}}$

This is merely a more detailed version of fedja's answer:

Lemma 1:

Let $f$ be a real function defined on the positive reals. Assume $|f(x)| \le C$ for every $x >0$. Then $\|f(A)\|_{op} \le C$ for every $A \in \operatorname{Psym}_n$.

Proof:

First, we note that $f$ extends to the cone of symmetric positive-definite matrices via the functional calculus, since their eigenvalues are strictly positive. Moreover, since $f(OAO^T)=Of(A)O^T$ for every orthogonal $O$ and the operator norm is invariant under orthogonal conjugation, it is enough to prove the statement for diagonal positive-definite matrices:

Let $A=\operatorname{diag}(\sigma_1,...,\sigma_n)$. Then:

$$ f(A)=\operatorname{diag}(f(\sigma_1),...,f(\sigma_n)),$$ thus

$$ \|f(A)\|_{op} = \max_i |f(\sigma_i)| \le C. $$
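Here is a small numerical illustration of Lemma 1 (a sketch; the test function $f(x)=x/(1+x)$, bounded by $1$, and the matrix $A$ are arbitrary choices, and the functional calculus is implemented via spectral projectors for $2\times 2$ symmetric matrices with distinct eigenvalues):

```python
import math

def eig_sym2x2(M):
    # Eigenvalues of a symmetric 2x2 matrix.
    (a, b), (_, c) = M
    m, r = (a + c) / 2, math.hypot((a - c) / 2, b)
    return m + r, m - r

def apply_fn(M, f):
    # Functional calculus for symmetric 2x2 M with distinct eigenvalues:
    # f(M) = f(l1) P1 + f(l2) P2, with spectral projectors
    # P1 = (M - l2 I)/(l1 - l2) and P2 = I - P1.
    l1, l2 = eig_sym2x2(M)
    I = [[1.0, 0.0], [0.0, 1.0]]
    P1 = [[(M[i][j] - l2 * I[i][j]) / (l1 - l2) for j in range(2)]
          for i in range(2)]
    return [[f(l1) * P1[i][j] + f(l2) * (I[i][j] - P1[i][j])
             for j in range(2)] for i in range(2)]

def opnorm(M):
    (a, b), (_, c) = M
    m, r = (a + c) / 2, math.hypot((a - c) / 2, b)
    return max(abs(m + r), abs(m - r))

f = lambda x: x / (1 + x)         # |f| <= 1 on the positive reals
A = [[4.0, 1.0], [1.0, 3.0]]      # positive definite, non-diagonal
fA = apply_fn(A, f)
l1, l2 = eig_sym2x2(A)
# The operator norm of f(A) equals the max of |f| over the spectrum of A,
# and in particular is bounded by 1.
print(opnorm(fA), max(abs(f(l1)), abs(f(l2))))
```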

Lemma 2

It is enough to prove that $\|A-B\|_{op} \le 1 \Rightarrow \|A^{1/2}-B^{1/2}\|_{op}\le C$.

Proof:

Indeed, let $A,B \in \operatorname{Psym}_n$ with $A \ne B$ (otherwise there is nothing to prove). Define $\lambda=\|A-B\|$, and let $\tilde A = \frac{1}{\lambda}A, \tilde B= \frac{1}{\lambda}B$. Note that $\sqrt{\tilde A}=\frac{1}{\sqrt \lambda} \sqrt{A},\ \sqrt{\tilde B}=\frac{1}{\sqrt \lambda} \sqrt{B}$, and that $\|\tilde A- \tilde B\|=1$. Thus, by our assumption,

$$ \frac{1}{\sqrt{\lambda}}\|A^{1/2}-B^{1/2}\| = \|\tilde A^{1/2}-\tilde B^{1/2}\|\le C$$

Thus, $$ \|A^{1/2}-B^{1/2}\| \le C \|A-B\|^{\frac{1}{2}}$$

So, the matrix square root is $\frac{1}{2}$-Hölder on $\operatorname{Psym}_n$, and in particular uniformly continuous.


It remains to prove the hypothesis of Lemma 2. Consider the function $$ f(x)=\int_0^1\left[1-\frac1{1+tx}\right]t^{-3/2}\,dt. $$ Making the change of variable $tx=s$, we get $$ f(x)=x^{1/2}\int_0^x\frac{s}{1+s}s^{-3/2}\,ds=x^{1/2}\left(\int_0^\infty\frac{s}{1+s}s^{-3/2}\,ds-\int_x^\infty\frac{s}{1+s}s^{-3/2}\,ds\right)=Kx^{1/2}+g(x)\,, $$

where $K= \int_0^\infty\frac{s}{1+s}s^{-3/2}\,ds=\int_0^1\frac{s}{1+s}s^{-3/2}\,ds+\int_1^\infty\frac{s}{1+s}s^{-3/2}\,ds.$

The first integral is finite since its integrand equals $\frac{s^{-1/2}}{1+s} \le s^{-1/2}$, which is integrable on $(0,1)$; the second is no greater than $\int_1^\infty s^{-3/2}\,ds < \infty$. Thus, $K < \infty$.
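In fact, in the scalar case everything is explicit: substituting $t=u^2$ gives $f(x)=\int_0^1\frac{2x}{1+u^2x}\,du=2\sqrt{x}\arctan\sqrt{x}$, and $K=\int_0^\infty\frac{s^{-1/2}}{1+s}\,ds=\pi$. The sketch below checks this numerically and also observes that $|f(x)-K\sqrt{x}|\le 2$ at a few sample points (the quadrature grid and sample points are arbitrary choices):

```python
import math

def f_quad(x, n=20000):
    # Midpoint rule for f(x) = integral over (0,1) of [1 - 1/(1+tx)] t^{-3/2},
    # after the substitution t = u^2, which removes the singularity:
    # f(x) = integral over (0,1) of 2x/(1 + u^2 x) du.
    h = 1.0 / n
    return sum(2 * x / (1 + ((i + 0.5) * h) ** 2 * x) for i in range(n)) * h

K = math.pi  # K = integral of s^{-1/2}/(1+s) over (0, inf)

results = []
for x in [0.01, 0.5, 1.0, 10.0, 1000.0]:
    fx = f_quad(x)
    closed = 2 * math.sqrt(x) * math.atan(math.sqrt(x))  # closed form of f
    g = fx - K * math.sqrt(x)                            # g(x) = f(x) - K sqrt(x)
    results.append((abs(fx - closed), abs(g)))

print(results)  # quadrature matches the closed form, and |g(x)| stays <= 2
```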

Since $g(x)=-x^\frac{1}{2}\int_x^\infty\frac{s}{1+s}s^{-3/2}\,ds$, we have $$|g(x)|\le x^\frac{1}{2}\int_x^\infty s^{-3/2}\,ds =2$$ for all $x>0$. Thus, by Lemma 1, $$ (**) \, \, \|KA^{1/2}-f(A)\|_{op}=\|g(A)\|_{op}\le 2 $$ for an arbitrary positive definite self-adjoint $A$. Now it will suffice to show that $f$ is "operator Lipschitz", i.e. that $\|f(A)-f(B)\|_{op} \le \tilde C\|A-B\|_{op}$.

Indeed, this would imply

$$ \|A^{\frac{1}{2}}-B^{\frac{1}{2}}\|_{op}\le \|A^{\frac{1}{2}}-\frac{1}{K}f(A)\|_{op} + \|\frac{1}{K}f(A)-\frac{1}{K}f(B)\|_{op} + \|\frac{1}{K}f(B)-B^{\frac{1}{2}}\|_{op} $$ $$ \le \frac{4}{K}+\frac{\tilde C}{K} \|A-B\|_{op}.$$

The last inequality holds for any $A,B \in \operatorname{Psym}_n$. Assuming $\|A-B\|_{op} \le 1$, it becomes:

$$ \|A^{\frac{1}{2}}-B^{\frac{1}{2}}\|_{op}\le \frac{4}{K}+\frac{\tilde C}{K}=:C. $$

This finishes the proof, according to Lemma 2.

We now turn to prove that $f$ is operator Lipschitz:

First, note that the functional calculus commutes with the integral defining $f$. Thus,

$$ f(A)=\int_0^1 \left[\id-(\id+tA)^{-1}\right]t^{-3/2}\,dt,$$ so $$ f(A)-f(B)=\int_0^1 \left[(\id+tB)^{-1}-(\id+tA)^{-1}\right]t^{-3/2}\,dt=\int_0^1(\id+tB)^{-1}(A-B)(\id+tA)^{-1}t^{-1/2}\,dt $$

(where in the last passage we have used the resolvent identity $X^{-1}-Y^{-1}=X^{-1}(Y-X)Y^{-1}$).

Finally, we get

$$ \|f(A)-f(B)\|_{op} =\| \int_0^1(\id+tB)^{-1}(A-B)(\id+tA)^{-1}t^{-1/2}\,dt \|_{op} $$ $$\le \int_0^1 \|(\id+tB)^{-1}(A-B)(\id+tA)^{-1}t^{-1/2}\|_{op}\,dt$$ $$ \le \int_0^1 \|(\id+tB)^{-1}\|_{op}\|A-B\|_{op}\|(\id+tA)^{-1}\|_{op}t^{-1/2}\,dt \le \|A-B\|_{op} \int_0^1 t^{-1/2} \, dt =2\|A-B\|_{op} $$

(since $\|(\id+tX)^{-1}\|_{op}\le 1$ for any positive definite self-adjoint $X$ and $t\ge 0$; this can be proved easily for diagonal matrices, and then extended to all positive-definite matrices using orthogonal diagonalization).
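As a scalar ($n=1$) sanity check of the Lipschitz constant $2$ (a sketch; it uses the closed form $f(x)=2\sqrt{x}\arctan\sqrt{x}$, obtained by substituting $t=u^2$ in the defining integral, and an arbitrary grid of sample points):

```python
import math

def f(x):
    # Scalar version of f: substituting t = u^2 in
    # f(x) = integral over (0,1) of [1 - 1/(1+tx)] t^{-3/2} dt
    # gives 2*sqrt(x)*atan(sqrt(x)).
    return 2 * math.sqrt(x) * math.atan(math.sqrt(x))

# Difference quotients |f(a)-f(b)|/|a-b| on a positive grid never exceed 2;
# they approach 2 only near 0, since f'(0+) = 2.
grid = [10 ** (k / 4) for k in range(-20, 21)]
ratios = [abs(f(a) - f(b)) / abs(a - b)
          for a in grid for b in grid if a != b]
print(max(ratios))  # strictly below 2
```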

Asaf Shachar
  • 25,967