Here is a statement of the famous Kantorovich inequality.
Theorem (Kantorovich). Let $A$ be an $n\times n$ symmetric positive definite matrix with eigenvalues $0 < \lambda_1 \leq \dots \leq \lambda_n$. Then the following inequality holds for all nonzero $\mathbf{x}\in\mathbb{R}^n$: \begin{equation} \frac{(\mathbf{x}^{\top}A\mathbf{x})(\mathbf{x}^{\top}A^{-1}\mathbf{x})}{(\mathbf{x}^{\top}\mathbf{x})^2} \leq \frac{1}{4}\frac{(\lambda_1+\lambda_n)^2}{\lambda_1\lambda_n} = \frac{1}{4}\Bigg(\sqrt{\frac{\lambda_1}{\lambda_n}}+\sqrt{\frac{\lambda_n}{\lambda_1}}\Bigg)^2. \end{equation}
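As a quick numerical sanity check of the statement, here is a minimal NumPy sketch (the random SPD test matrix, sample count, and tolerance are all arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary symmetric positive definite test matrix.
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)

lam = np.linalg.eigvalsh(A)                     # eigenvalues, sorted ascending
bound = 0.25 * (lam[0] + lam[-1]) ** 2 / (lam[0] * lam[-1])

A_inv = np.linalg.inv(A)
for _ in range(10_000):
    x = rng.standard_normal(n)
    lhs = (x @ A @ x) * (x @ A_inv @ x) / (x @ x) ** 2
    assert lhs <= bound + 1e-9                  # no violations observed
print("Kantorovich bound:", bound)
```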
There are a variety of proofs of this inequality. My aim in asking this question is threefold: first, to gather a list of nice proofs of this inequality; second, to see whether a proof via constrained optimization techniques is possible; third, to learn how Kantorovich himself thought about the problem. Here are the main questions.
Questions
- What are different approaches (excluding those mentioned below) for proving the Kantorovich inequality?
- Can it be proved via constrained optimization techniques, continuing the attempt I describe below?
- How did Kantorovich prove it himself?
Different Approaches
- This is an elegant proof based on probabilistic techniques.
- This is another proof using simple but clever algebra.
A Constrained Optimization Way
I am wondering, however, whether the inequality can be proved via the most naive idea that comes to mind: simply maximizing its left-hand side! For this purpose, we can rewrite the left-hand side by introducing $\mathbf{y} = \frac{\mathbf{x}}{\lVert\mathbf{x}\rVert}$ as
\begin{equation} f(\mathbf{y}) = (\mathbf{y}^{\top}A\mathbf{y})(\mathbf{y}^{\top}A^{-1}\mathbf{y}). \end{equation}
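This normalization loses nothing, since the quotient depends on $\mathbf{x}$ only through $\mathbf{x}/\lVert\mathbf{x}\rVert$; a short NumPy check (with an arbitrary SPD test matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)        # arbitrary SPD test matrix
A_inv = np.linalg.inv(A)

x = rng.standard_normal(n)
y = x / np.linalg.norm(x)

lhs = (x @ A @ x) * (x @ A_inv @ x) / (x @ x) ** 2
f_y = (y @ A @ y) * (y @ A_inv @ y)
print(np.isclose(lhs, f_y))        # True: the quotient depends only on x/||x||
```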
Now, it seems natural to maximize $f(\mathbf{y})$ subject to the constraint $\mathbf{y}^{\top}\mathbf{y} = 1$. To simplify the problem further, one can use the spectral decomposition $A = Q^{\top}\Lambda Q$ to write $f(\mathbf{y})$ as
\begin{equation} g(\mathbf{z}) = \big(\sum_{i=1}^{n} \lambda_i z_i^2\big) \big(\sum_{i=1}^{n} \frac{1}{\lambda_i} z_i^2\big), \end{equation}
where $\mathbf{z} = Q \mathbf{y}$; note that $\lVert\mathbf{z}\rVert = \lVert\mathbf{y}\rVert = 1$ since $Q$ is orthogonal. Finally, let $\xi_i = z_i^2$ to arrive at
\begin{equation} \phi(\boldsymbol{\xi}) = \big(\sum_{i=1}^{n} \lambda_i \xi_i\big) \big(\sum_{i=1}^{n} \frac{1}{\lambda_i} \xi_i\big) = \sum_{i=1}^{n}\sum_{j=1}^{n} \frac{\lambda_i}{\lambda_j}\xi_i\xi_j = \boldsymbol{\xi}^{\top}B\boldsymbol{\xi}, \qquad B_{ij} = \frac{\lambda_i}{\lambda_j}, \end{equation}
with the constraints
\begin{equation} \sum_{i=1}^{n}\xi_i = 1, \qquad \xi_i \ge 0. \end{equation}
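Here is a small NumPy sketch verifying that the chain $f(\mathbf{y}) = g(\mathbf{z}) = \phi(\boldsymbol{\xi})$ and the constraint $\sum_i \xi_i = 1$ come out as claimed (the test matrix is arbitrary; note that NumPy's `eigh` returns $A = V\Lambda V^{\top}$, so $Q = V^{\top}$):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # arbitrary SPD test matrix

lam, V = np.linalg.eigh(A)           # A = V diag(lam) V^T, so Q = V^T
y = rng.standard_normal(n)
y /= np.linalg.norm(y)

z = V.T @ y                          # z = Q y
xi = z ** 2                          # xi_i = z_i^2

f_y = (y @ A @ y) * (y @ np.linalg.inv(A) @ y)
g_z = (lam * z**2).sum() * (z**2 / lam).sum()
phi = (lam * xi).sum() * (xi / lam).sum()

print(np.allclose([g_z, phi], f_y))  # True: all three expressions agree
print(np.isclose(xi.sum(), 1.0))     # True: xi lies on the simplex
```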
As we are usually fond of symmetric matrices, we can replace $B$ by $\frac{1}{2}(B + B^{\top})$: indeed, $B = \frac{1}{2} (B + B^{\top}) + \frac{1}{2}(B - B^{\top})$ and $\frac{1}{2}\boldsymbol{\xi}^{\top}(B - B^{\top})\boldsymbol{\xi} = 0$. Consequently, $\phi$ can be rewritten as
\begin{equation} \phi(\boldsymbol{\xi}) = \frac{1}{2} \sum_{i=1}^{n}\sum_{j=1}^{n} \Bigg(\frac{\lambda_i}{\lambda_j} + \frac{\lambda_j}{\lambda_i}\Bigg)\xi_i\xi_j = \frac{1}{2}\boldsymbol{\xi}^{\top}H\boldsymbol{\xi}, \qquad H = B + B^{\top}. \end{equation}
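As a quick check of this symmetrization step (with arbitrary positive eigenvalues and an arbitrary simplex point):

```python
import numpy as np

rng = np.random.default_rng(3)
lam = np.sort(rng.uniform(0.5, 5.0, size=4))   # arbitrary positive eigenvalues

B = np.outer(lam, 1.0 / lam)                   # B_ij = lam_i / lam_j
H = B + B.T                                    # H_ij = lam_i/lam_j + lam_j/lam_i

xi = rng.dirichlet(np.ones(4))                 # a point on the simplex
print(np.isclose(xi @ B @ xi, 0.5 * xi @ H @ xi))  # True: skew part drops out
```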
Can we find the maximizer of $\phi(\boldsymbol{\xi})$ subject to the aforementioned constraints via constrained optimization techniques?
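For what it is worth, here is a hedged numerical sketch of this maximization using SciPy (the eigenvalues, the number of restarts, and the choice of the SLSQP solver are all arbitrary; SLSQP is a local method and $\phi$ is an indefinite quadratic, hence the random restarts):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
lam = np.sort(rng.uniform(0.5, 5.0, size=5))    # arbitrary positive eigenvalues
n = lam.size

def neg_phi(xi):
    # Minimizing -phi is equivalent to maximizing phi.
    return -(lam @ xi) * ((1.0 / lam) @ xi)

constraints = [{"type": "eq", "fun": lambda xi: xi.sum() - 1.0}]
bounds = [(0.0, None)] * n

# SLSQP is a local method and phi is indefinite, so restart from
# several random simplex points and keep the best result.
best = min(
    (minimize(neg_phi, rng.dirichlet(np.ones(n)), method="SLSQP",
              bounds=bounds, constraints=constraints) for _ in range(20)),
    key=lambda res: res.fun,
)

kantorovich = 0.25 * (lam[0] + lam[-1]) ** 2 / (lam[0] * lam[-1])
print("numerical max :", -best.fun)
print("Kantorovich   :", kantorovich)
print("maximizer     :", best.x.round(4))   # expect weight 1/2 on xi_1 and xi_n
```

If the solver behaves, the best restart should land on $\xi_1 = \xi_n = \tfrac{1}{2}$ (with all other $\xi_i = 0$), which indeed attains the right-hand side of the inequality; the open part of my question is turning this observation into a rigorous argument, e.g., by analyzing the KKT points of this problem.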