This is an old posting but I think the following simple (probabilistic) proof presents a more general inequality from which the one in the OP follows.
Theorem: Let $(\Omega,\mathscr{F},\mathbb{P})$ be a probability space and $Z$ be a random variable taking values in an interval $[a,b]$ with $0<a<b<\infty$. If $f:[a,b]\rightarrow\mathbb{R}$ is a convex continuous function with $f(a)>f(b)$, then
$$\begin{align}
\mathbb{E}[Z]f\big(\mathbb{E}[Z]\big)&\leq \mathbb{E}[Z]\mathbb{E}[f(Z)] \\
&\leq \max\Big(\frac{(f(a)b-f(b)a)^2}{4(f(a)-f(b))(b-a)},af(a),bf(b)\Big)\tag{0}\label{kan}
\end{align}$$
Equality on the right-hand side is attained by a random variable $Z$ supported on $\{a,b\}$ (possibly degenerate).
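Before turning to particular cases, here is a quick Monte Carlo sanity check of \eqref{kan} (the choices $f(x)=e^{-x}$, the interval $[a,b]$, and the uniform distribution below are arbitrary, subject only to the hypotheses of the theorem):

```python
import math
import random

random.seed(0)
a, b = 0.5, 3.0
f = lambda x: math.exp(-x)   # convex, continuous on [a, b], with f(a) > f(b)
fa, fb = f(a), f(b)

# Monte Carlo estimates of E[Z] and E[f(Z)] for Z uniform on [a, b];
# the theorem applies exactly to the empirical distribution of the sample.
z = [random.uniform(a, b) for _ in range(100_000)]
Ez = sum(z) / len(z)
Efz = sum(f(t) for t in z) / len(z)

bound = max((fa * b - fb * a) ** 2 / (4 * (fa - fb) * (b - a)), a * fa, b * fb)
assert Ez * f(Ez) <= Ez * Efz <= bound
```

The first inequality is Jensen's inequality applied to the empirical distribution, so both assertions hold exactly, not just up to sampling error.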
- A particular case of this inequality is when $f(x)=\frac1x$, which gives
$$\begin{align}
1\leq \mathbb{E}[Z]\mathbb{E}[Z^{-1}]\leq\frac{(a+b)^2}{4ab}
\tag{1}\label{one}\end{align}$$
- Another particular case of interest is for random variables $X,Y$ such that $0< a\leq X\leq b$ and $0< A\leq Y\leq B$. Then
$$\begin{align}
\frac{\mathbb{E}[X^2]\mathbb{E}[Y^2]}{\mathbb{E}[XY]^2}\leq\frac{(aA+bB)^2}{4ABab}\tag{2}\label{two}
\end{align}$$
To see this, set $U=X/\sqrt{\mathbb{E}[XY]}$ and $V=Y/\sqrt{\mathbb{E}[XY]}$ so that $\mathbb{E}[UV]=1$. Consider the equivalent probability measure $d\mathbb{Q}=UV\,d\mathbb{P}$. Then
$$\begin{align}
\frac{\mathbb{E}[X^2]\mathbb{E}[Y^2]}{\mathbb{E}[XY]^2}&=\mathbb{E}[U^2]\mathbb{E}[V^2]\\
&=\mathbb{E}_\mathbb{Q}\big[U/V\big]\mathbb{E}_\mathbb{Q}\big[V/U\big]\\
&\leq\frac{\big(\frac{a}{B}+\frac{b}{A}\big)^2}{4\frac{a}{B}\frac{b}{A}}=\frac{(aA+bB)^2}{4ABab}
\end{align}$$
where the last inequality is \eqref{one} applied under $\mathbb{Q}$ to the random variable $U/V=X/Y$, which takes values in $[a/B,b/A]$.
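Both particular cases can be checked numerically in the same spirit. In the sketch below, the distributions are arbitrary choices, and independence of $X$ and $Y$ is only a sampling convenience; the bound in \eqref{two} does not require it. The quantity bounded is $\mathbb{E}[X^2]\mathbb{E}[Y^2]/\mathbb{E}[XY]^2=\mathbb{E}[U^2]\mathbb{E}[V^2]$:

```python
import random

random.seed(1)
n = 100_000

# Case (1): 1 <= E[Z] * E[1/Z] <= (a+b)^2 / (4ab), for Z uniform on [a, b]
a, b = 0.5, 3.0
z = [random.uniform(a, b) for _ in range(n)]
Ez = sum(z) / n
Einv = sum(1.0 / t for t in z) / n
assert 1.0 <= Ez * Einv <= (a + b) ** 2 / (4 * a * b)

# Case (2): X in [a, b], Y in [A, B]
A, B = 0.5, 4.0
x = [random.uniform(a, b) for _ in range(n)]
y = [random.uniform(A, B) for _ in range(n)]
Ex2 = sum(t * t for t in x) / n
Ey2 = sum(t * t for t in y) / n
Exy = sum(s * t for s, t in zip(x, y)) / n
assert Ex2 * Ey2 / Exy ** 2 <= (a * A + b * B) ** 2 / (4 * a * b * A * B)
```

Both inequalities hold exactly for the empirical distribution of the samples, so the assertions are not sensitive to sampling error.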
Proof: The left-hand side of \eqref{kan} follows from Jensen's inequality, $f(\mathbb{E}[Z])\leq\mathbb{E}[f(Z)]$, multiplied by $\mathbb{E}[Z]\geq a>0$.
For the right-hand side inequality, consider the 2d random vector $W:=(Z,f(Z))$. Since $W$ takes values in the graph of $f$, the vector $\mathbb{E}[W]=(\mathbb{E}[Z],\mathbb{E}[f(Z)])$ lies in the closed convex hull $H$ of the graph of $f(x)$ for $a\leq x\leq b$. Since $f$ is convex, $H$ is the region between the graph of $f$ and the chord $g(x)=\frac{f(b)-f(a)}{b-a}(x-a)+f(a)$ for $a\leq x\leq b$, that is, $$H=\left\{(x,y):a\leq x\leq b,\ f(x)\leq y\leq \frac{f(b)-f(a)}{b-a}(x-a)+f(a)=g(x)\right\}$$
To find an upper bound for $\mathbb{E}[Z]\mathbb{E}[f(Z)]$, we find the maximum of $\phi(x,y)=xy$ for $(x,y)\in H$. Notice that for any $(x,y)\in H$, $xy\leq xg(x)$ since $x\geq a>0$. Thus, it suffices to find a maximum for $\phi$ along the line $(x,g(x))$, $a\leq x\leq b$. A simple calculation shows that
$$0=\frac{d}{dx}(xg(x))=g(x)+xg'(x)=\frac{\big(f(b)-f(a)\big)(2x-a)+f(a)(b-a)}{b-a}$$
This gives the critical point
$$\begin{align}
x^*&=\frac12a+\frac{f(a)(b-a)}{2(f(a)-f(b))}=a+\frac{-a(f(a)-f(b)) +f(a)(b-a)}{2(b-a)(f(a)-f(b))}(b-a)\\
&=a+\lambda (b-a)=(1-\lambda)a+\lambda b
\end{align}$$
where $\lambda=\frac{-a(f(a)-f(b)) +f(a)(b-a)}{2(b-a)(f(a)-f(b))}$.
Since $\frac{d^2}{dx^2}(xg(x))=2\frac{f(b)-f(a)}{b-a}<0$ and there is no other critical point, $x^*$ is the unique global maximum point of $\phi(x,g(x))$ along the whole line $y=g(x)$, with value
$$\phi(x^*,g(x^*))=\frac{(f(a)b-f(b)a)^2}{4(f(a)-f(b))(b-a)}$$
If $x^*\notin[a,b]$, the maximum of $xg(x)$ over $[a,b]$ is attained at an endpoint instead, with value $ag(a)=af(a)$ or $bg(b)=bf(b)$; hence the maximum of the three terms in \eqref{kan}.
For equality in the right-hand side of \eqref{kan}, notice that $0\leq\lambda\leq1$ iff $a\leq x^*\leq b$. In that case, if $Z$ takes the value $a$ with probability $1-\lambda$ and the value $b$ with probability $\lambda$, the upper bound $\phi(x^*,g(x^*))$ is realized. If $\lambda<0$, then with $Z\equiv a$ the bound $af(a)$ is realized, while if $\lambda>1$, with $Z\equiv b$ the bound $bf(b)$ is realized.
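As a small check of the critical-point computation, a grid search over the chord reproduces the closed form for $\phi(x^*,g(x^*))$; the choice $f(x)=1/x$ below is just for concreteness (for this $f$ the closed form equals $(a+b)^2/(4ab)$, consistent with \eqref{one}):

```python
a, b = 0.5, 3.0
f = lambda x: 1.0 / x     # any convex, continuous f with f(a) > f(b) would do
fa, fb = f(a), f(b)
g = lambda x: (fb - fa) / (b - a) * (x - a) + fa  # chord through (a,f(a)),(b,f(b))

# Grid search for the maximum of x * g(x) on [a, b]
xs = [a + (b - a) * k / 10**5 for k in range(10**5 + 1)]
grid_max = max(x * g(x) for x in xs)

closed_form = (fa * b - fb * a) ** 2 / (4 * (fa - fb) * (b - a))
assert abs(grid_max - closed_form) < 1e-6
```

Here $x^*=1.75$ lies inside $[a,b]$, so the vertex value, rather than an endpoint value, is the maximum.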
Applications:
To obtain Kantorovich's inequality for positive definite matrices, assume $A$ is an $n\times n$ positive definite matrix with eigenvalues $0<a=\lambda_1 \leq\ldots\leq \lambda_n=b$. Without loss of generality, we may assume that $A$ is a diagonal matrix $\operatorname{diag}(\lambda_1,\ldots,\lambda_n)$. Suppose $x\in\mathbb{C}^n$ is a unit vector, $\|x\|^2_2=\sum^n_{j=1}|x_j|^2=1$. Consider the probability space $\Omega=\{1,\ldots,n\}$, $\mathscr{F}$ the collection of all subsets of $\Omega$, $\mathbb{P}[\{j\}]=|x_j|^2$, and $Z(j)=\lambda_j$ for $1\leq j\leq n$. Kantorovich's inequality
$$ (x^*Ax)(x^*A^{-1}x)\leq \frac{(\lambda_1+\lambda_n)^2}{4\lambda_1\lambda_n}$$
follows immediately from \eqref{one}. Equality holds when $\lambda_1<\lambda_n$ and $Z$ takes the values $a=\lambda_1$ and $b=\lambda_n$ each with probability $\lambda=\frac12$; when $\lambda_1=\lambda_n$, both sides are trivially equal.
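The reduction to the diagonal case makes this easy to check numerically; the eigenvalues and the random unit vector below are arbitrary choices:

```python
import random

random.seed(2)
# Diagonal case (WLOG, as in the reduction above): A = diag(eig)
eig = [0.5, 1.2, 2.0, 3.0]
l1, ln = min(eig), max(eig)

v = [random.gauss(0, 1) for _ in eig]
norm = sum(t * t for t in v) ** 0.5
x = [t / norm for t in v]                          # unit vector

xAx = sum(l * t * t for l, t in zip(eig, x))       # E[Z]   with P({j}) = |x_j|^2
xAinvx = sum(t * t / l for l, t in zip(eig, x))    # E[1/Z]
assert xAx * xAinvx <= (l1 + ln) ** 2 / (4 * l1 * ln)
```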
If $A$ and $B$ are $n\times n$ positive definite matrices that commute, then they are simultaneously diagonalizable, i.e. there is a unitary matrix $P$ such that $PAP^*=\operatorname{diag}(\lambda_1,\ldots,\lambda_n)$ and $PBP^*=\operatorname{diag}(\mu_1,\ldots,\mu_n)$, where $0<\lambda_1\leq\ldots\leq\lambda_n$ and $0<\mu_1\leq\ldots\leq\mu_n$. In a similar manner as above, for all nonzero $x\in\mathbb{R}^n$, the Greub-Rheinboldt inequality
$$
\frac{(x^\intercal A^2x)(x^\intercal B^2x)}{(x^\intercal AB x)^2} \leq \frac{(\lambda_1\mu_1+\lambda_n\mu_n)^2}{4\lambda_1\lambda_n\mu_1\mu_n}
$$
follows from \eqref{two} by taking $\mathbb{P}[\{j\}]=x_j^2/\|x\|_2^2$, $X(j)=\lambda_j$, and $Y(j)=\mu_j$ for $1\leq j\leq n$.
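A similar check for the Greub-Rheinboldt inequality, with simultaneously diagonal (hence commuting) positive definite $A$ and $B$; the quantity checked is $(x^\intercal A^2x)(x^\intercal B^2x)/(x^\intercal ABx)^2$, which matches $\mathbb{E}[X^2]\mathbb{E}[Y^2]/\mathbb{E}[XY]^2$ in \eqref{two} with $X(j)=\lambda_j$ and $Y(j)=\mu_j$:

```python
import random

random.seed(3)
lam = [1.0, 1.5, 2.5]   # eigenvalues of A
mu = [0.5, 2.0, 3.0]    # eigenvalues of B (same eigenbasis, so A and B commute)

x = [random.gauss(0, 1) for _ in lam]
xA2x = sum(l * l * t * t for l, t in zip(lam, x))
xB2x = sum(m * m * t * t for m, t in zip(mu, x))
xABx = sum(l * m * t * t for l, m, t in zip(lam, mu, x))

lhs = xA2x * xB2x / xABx ** 2
rhs = (min(lam) * min(mu) + max(lam) * max(mu)) ** 2 / (
    4 * min(lam) * max(lam) * min(mu) * max(mu))
assert lhs <= rhs
```

Since both sides are homogeneous of degree zero in $x$, no normalization of the vector is needed.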
Final Comments:
The proof presented here is a generalization of another probabilistic proof of Kantorovich's inequality presented here.
The probabilistic result extends Kantorovich's inequality to random variables $Z$ of any type (discrete, continuous, or a mixture of both) that are bounded and bounded away from zero, i.e. $0<a\leq Z\leq b$.