This is an old posting but I think the following simple (probabilistic) proof presents a more general inequality from which the one in the OP follows.
Theorem: Let $(\Omega,\mathscr{F},\mathbb{P})$ be a probability space and $Z$ be a random variable taking values in an interval $[a,b]$ with $0<a<b<\infty$. If $f:[a,b]\rightarrow\mathbb{R}$ is a convex continuous function with $f(a)>f(b)$, then
$$\begin{align}
\mathbb{E}[Z]f\big(\mathbb{E}[Z]\big)&\leq \mathbb{E}[Z]\mathbb{E}[f(Z)] \\
&\leq \max\Big(\frac{(f(a)b-f(b)a)^2}{4(f(a)-f(b))(b-a)},af(a),bf(b)\Big)\tag{0}\label{kan}
\end{align}$$
Equality on the right-hand side is attained by a random variable $Z$ supported on $\{a,b\}$ (possibly degenerate).
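Before turning to particular cases, here is a quick Monte Carlo sanity check of \eqref{kan} (the choices $f(x)=e^{-x}$, the interval $[a,b]$, and the uniform distribution below are arbitrary, subject only to the hypotheses of the theorem):

```python
import math
import random

random.seed(0)
a, b = 0.5, 3.0
f = lambda x: math.exp(-x)   # convex, continuous on [a, b], with f(a) > f(b)
fa, fb = f(a), f(b)

# Monte Carlo estimates of E[Z] and E[f(Z)] for Z uniform on [a, b];
# the theorem applies exactly to the empirical distribution of the sample.
z = [random.uniform(a, b) for _ in range(100_000)]
Ez = sum(z) / len(z)
Efz = sum(f(t) for t in z) / len(z)

bound = max((fa * b - fb * a) ** 2 / (4 * (fa - fb) * (b - a)), a * fa, b * fb)
assert Ez * f(Ez) <= Ez * Efz <= bound
```

The first inequality is Jensen's inequality applied to the empirical distribution, so both assertions hold exactly, not just up to sampling error.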
- A particular case of this inequality is when $f(x)=\frac1x$, which gives
$$\begin{align}
1\leq \mathbb{E}[Z]\mathbb{E}[Z^{-1}]\leq\frac{(a+b)^2}{4ab}
\tag{1}\label{one}\end{align}$$
- Another particular case of interest is for random variables $X,Y$ such that $0< a\leq X\leq b$ and $0< A\leq Y\leq B$. Then
$$\begin{align}
\frac{\mathbb{E}[X^2]\mathbb{E}[Y^2]}{\mathbb{E}[XY]^2}\leq\frac{(aA+bB)^2}{4ABab}\tag{2}\label{two}
\end{align}$$
To see this, set $U=X/\sqrt{\mathbb{E}[XY]}$ and $V=Y/\sqrt{\mathbb{E}[XY]}$ so that $\mathbb{E}[UV]=1$. Consider the equivalent probability measure $d\mathbb{Q}=UV\,d\mathbb{P}$. Then
$$\begin{align}
\frac{\mathbb{E}[X^2]\mathbb{E}[Y^2]}{\mathbb{E}[XY]^2}&=\mathbb{E}[U^2]\mathbb{E}[V^2]\\
&=\mathbb{E}_\mathbb{Q}\big[U/V\big]\mathbb{E}_\mathbb{Q}\big[V/U\big]\\
&\leq\frac{\big(\frac{a}{B}+\frac{b}{A}\big)^2}{4\frac{a}{B}\frac{b}{A}}=\frac{(aA+bB)^2}{4ABab}
\end{align}$$
where the last inequality is \eqref{one} applied under $\mathbb{Q}$ to the random variable $U/V=X/Y$, which takes values in $[a/B,b/A]$.
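Both particular cases can be checked numerically in the same spirit. In the sketch below, the distributions are arbitrary choices, and independence of $X$ and $Y$ is only a sampling convenience; the bound in \eqref{two} does not require it. The quantity bounded is $\mathbb{E}[X^2]\mathbb{E}[Y^2]/\mathbb{E}[XY]^2=\mathbb{E}[U^2]\mathbb{E}[V^2]$:

```python
import random

random.seed(1)
n = 100_000

# Case (1): 1 <= E[Z] * E[1/Z] <= (a+b)^2 / (4ab), for Z uniform on [a, b]
a, b = 0.5, 3.0
z = [random.uniform(a, b) for _ in range(n)]
Ez = sum(z) / n
Einv = sum(1.0 / t for t in z) / n
assert 1.0 <= Ez * Einv <= (a + b) ** 2 / (4 * a * b)

# Case (2): X in [a, b], Y in [A, B]
A, B = 0.5, 4.0
x = [random.uniform(a, b) for _ in range(n)]
y = [random.uniform(A, B) for _ in range(n)]
Ex2 = sum(t * t for t in x) / n
Ey2 = sum(t * t for t in y) / n
Exy = sum(s * t for s, t in zip(x, y)) / n
assert Ex2 * Ey2 / Exy ** 2 <= (a * A + b * B) ** 2 / (4 * a * b * A * B)
```

Both inequalities hold exactly for the empirical distribution of the samples, so the assertions are not sensitive to sampling error.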
Proof: The left-hand side of \eqref{kan} follows from Jensen's inequality, $f(\mathbb{E}[Z])\leq\mathbb{E}[f(Z)]$, multiplied by $\mathbb{E}[Z]\geq a>0$.
For the right-hand side inequality, consider the 2d random vector $W:=(Z,f(Z))$. Since $W$ takes values in the graph of $f$, the vector $\mathbb{E}[W]=(\mathbb{E}[Z],\mathbb{E}[f(Z)])$ lies in the closed convex hull $H$ of the graph of $f(x)$ for $a\leq x\leq b$. Since $f$ is convex, $H$ is the region between the graph of $f$ and the chord $g(x)=\frac{f(b)-f(a)}{b-a}(x-a)+f(a)$ for $a\leq x\leq b$, that is, $$H=\left\{(x,y):a\leq x\leq b,\ f(x)\leq y\leq \frac{f(b)-f(a)}{b-a}(x-a)+f(a)=g(x)\right\}$$
To find an upper bound for $\mathbb{E}[Z]\mathbb{E}[f(Z)]$, we find the maximum of $\phi(x,y)=xy$ for $(x,y)\in H$. Notice that for any $(x,y)\in H$, $xy\leq xg(x)$ since $x\geq a>0$. Thus, it suffices to find a maximum for $\phi$ along the line $(x,g(x))$, $a\leq x\leq b$. A simple calculation shows that
$$0=\frac{d}{dx}(xg(x))=g(x)+xg'(x)=\frac{\big(f(b)-f(a)\big)(2x-a)+f(a)(b-a)}{b-a}$$
This gives the critical point
$$\begin{align}
x^*&=\frac12a+\frac{f(a)(b-a)}{2(f(a)-f(b))}=a+\frac{-a(f(a)-f(b)) +f(a)(b-a)}{2(b-a)(f(a)-f(b))}(b-a)\\
&=a+\lambda (b-a)=(1-\lambda)a+\lambda b
\end{align}$$
where $\lambda=\frac{-a(f(a)-f(b)) +f(a)(b-a)}{2(b-a)(f(a)-f(b))}$.
Since $\frac{d^2}{dx^2}(xg(x))=2\frac{f(b)-f(a)}{b-a}<0$ and there is no other critical point, $x^*$ is the unique global maximum point of $\phi(x,g(x))$ along the whole line $y=g(x)$, with value
$$\phi(x^*,g(x^*))=\frac{(f(a)b-f(b)a)^2}{4(f(a)-f(b))(b-a)}$$
If $x^*\notin[a,b]$, the maximum of $xg(x)$ over $[a,b]$ is attained at an endpoint instead, with value $ag(a)=af(a)$ or $bg(b)=bf(b)$; hence the maximum of the three terms in \eqref{kan}.
For equality in the right-hand side of \eqref{kan}, notice that $0\leq\lambda\leq1$ iff $a\leq x^*\leq b$. In that case, if $Z$ takes the value $a$ with probability $1-\lambda$ and the value $b$ with probability $\lambda$, the upper bound $\phi(x^*,g(x^*))$ is realized. If $\lambda<0$, then with $Z\equiv a$ the bound $af(a)$ is realized, while if $\lambda>1$, with $Z\equiv b$ the bound $bf(b)$ is realized.
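As a small check of the critical-point computation, a grid search over the chord reproduces the closed form for $\phi(x^*,g(x^*))$; the choice $f(x)=1/x$ below is just for concreteness (for this $f$ the closed form equals $(a+b)^2/(4ab)$, consistent with \eqref{one}):

```python
a, b = 0.5, 3.0
f = lambda x: 1.0 / x     # any convex, continuous f with f(a) > f(b) would do
fa, fb = f(a), f(b)
g = lambda x: (fb - fa) / (b - a) * (x - a) + fa  # chord through (a,f(a)),(b,f(b))

# Grid search for the maximum of x * g(x) on [a, b]
xs = [a + (b - a) * k / 10**5 for k in range(10**5 + 1)]
grid_max = max(x * g(x) for x in xs)

closed_form = (fa * b - fb * a) ** 2 / (4 * (fa - fb) * (b - a))
assert abs(grid_max - closed_form) < 1e-6
```

Here $x^*=1.75$ lies inside $[a,b]$, so the vertex value, rather than an endpoint value, is the maximum.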
Applications:
To obtain Kantorovich's inequality for positive definite matrices, assume $A$ is an $n\times n$ positive definite matrix with eigenvalues $0<a=\lambda_1 \leq\ldots\leq \lambda_n=b$. Without loss of generality, we may assume that $A$ is a diagonal matrix $\operatorname{diag}(\lambda_1,\ldots,\lambda_n)$. Suppose $x\in\mathbb{C}^n$ is a unit vector, $\|x\|^2_2=\sum^n_{j=1}|x_j|^2=1$. Consider the probability space $\Omega=\{1,\ldots,n\}$, $\mathscr{F}$ the collection of all subsets of $\Omega$, $\mathbb{P}[\{j\}]=|x_j|^2$, and $Z(j)=\lambda_j$ for $1\leq j\leq n$. Kantorovich's inequality
$$ (x^*Ax)(x^*A^{-1}x)\leq \frac{(\lambda_1+\lambda_n)^2}{4\lambda_1\lambda_n}$$
follows immediately from \eqref{one}. Equality holds when $\lambda_1<\lambda_n$ and $Z$ takes the values $a=\lambda_1$ and $b=\lambda_n$ each with probability $\lambda=\frac12$; when $\lambda_1=\lambda_n$, both sides are trivially equal.
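The reduction to the diagonal case makes this easy to check numerically; the eigenvalues and the random unit vector below are arbitrary choices:

```python
import random

random.seed(2)
# Diagonal case (WLOG, as in the reduction above): A = diag(eig)
eig = [0.5, 1.2, 2.0, 3.0]
l1, ln = min(eig), max(eig)

v = [random.gauss(0, 1) for _ in eig]
norm = sum(t * t for t in v) ** 0.5
x = [t / norm for t in v]                          # unit vector

xAx = sum(l * t * t for l, t in zip(eig, x))       # E[Z]   with P({j}) = |x_j|^2
xAinvx = sum(t * t / l for l, t in zip(eig, x))    # E[1/Z]
assert xAx * xAinvx <= (l1 + ln) ** 2 / (4 * l1 * ln)
```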
If $A$ and $B$ are $n\times n$ positive definite matrices that commute, then they are simultaneously diagonalizable, i.e. there is a unitary matrix $P$ such that $PAP^*=\operatorname{diag}(\lambda_1,\ldots,\lambda_n)$ and $PBP^*=\operatorname{diag}(\mu_1,\ldots,\mu_n)$, where $0<\lambda_1\leq\ldots\leq\lambda_n$ and $0<\mu_1\leq\ldots\leq\mu_n$. In a similar manner as above, for all nonzero $x\in\mathbb{R}^n$, the Greub-Rheinboldt inequality
$$
\frac{(x^\intercal A^2x)(x^\intercal B^2x)}{(x^\intercal AB x)^2} \leq \frac{(\lambda_1\mu_1+\lambda_n\mu_n)^2}{4\lambda_1\lambda_n\mu_1\mu_n}
$$
follows from \eqref{two} by taking $\mathbb{P}[\{j\}]=x_j^2/\|x\|_2^2$, $X(j)=\lambda_j$, and $Y(j)=\mu_j$ for $1\leq j\leq n$.
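A similar check for the Greub-Rheinboldt inequality, with simultaneously diagonal (hence commuting) positive definite $A$ and $B$; the quantity checked is $(x^\intercal A^2x)(x^\intercal B^2x)/(x^\intercal ABx)^2$, which matches $\mathbb{E}[X^2]\mathbb{E}[Y^2]/\mathbb{E}[XY]^2$ in \eqref{two} with $X(j)=\lambda_j$ and $Y(j)=\mu_j$:

```python
import random

random.seed(3)
lam = [1.0, 1.5, 2.5]   # eigenvalues of A
mu = [0.5, 2.0, 3.0]    # eigenvalues of B (same eigenbasis, so A and B commute)

x = [random.gauss(0, 1) for _ in lam]
xA2x = sum(l * l * t * t for l, t in zip(lam, x))
xB2x = sum(m * m * t * t for m, t in zip(mu, x))
xABx = sum(l * m * t * t for l, m, t in zip(lam, mu, x))

lhs = xA2x * xB2x / xABx ** 2
rhs = (min(lam) * min(mu) + max(lam) * max(mu)) ** 2 / (
    4 * min(lam) * max(lam) * min(mu) * max(mu))
assert lhs <= rhs
```

Since both sides are homogeneous of degree zero in $x$, no normalization of the vector is needed.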
Final Comments:
The proof presented here is a generalization of another probabilistic proof of Kantorovich's inequality presented here.
The probabilistic result extends Kantorovich's inequality to random variables $Z$ of any type (discrete, continuous, or a mixture of both) that are bounded and bounded away from zero, i.e. $0<a\leq Z\leq b$.