
$\DeclareMathOperator{\tr}{tr}$ Let $H$ be a self-adjoint matrix and define the free energy as $$ F(\Gamma)=\tr(H\Gamma+\Gamma \log \Gamma +(1-\Gamma) \log(1-\Gamma)) $$ where $\Gamma$ satisfies $0\le \Gamma \le 1$. I have seen physics papers suggest that $F(\Gamma)$ is minimized if we take $$ \Gamma =\frac{1}{e^H +1} $$ but I can't think of a rigorous proof.
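As a numerical sanity check (not a proof), the claim is easy to probe: compare $F$ at the claimed minimizer against random admissible $\Gamma$. This is a sketch using SciPy; all helper names are my own.

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(0)
n = 4
I = np.eye(n)

def free_energy(H, G):
    # F(Γ) = tr(HΓ + Γ log Γ + (1-Γ) log(1-Γ))
    return np.trace(H @ G + G @ logm(G) + (I - G) @ logm(I - G)).real

def random_state():
    # Any matrix of the form 1/(e^S + 1) with S self-adjoint has spectrum in (0,1)
    S = rng.standard_normal((n, n))
    S = (S + S.T) / 2
    return np.linalg.inv(expm(S) + I)

H = rng.standard_normal((n, n))
H = (H + H.T) / 2                          # random self-adjoint H
G_min = np.linalg.inv(expm(H) + I)         # claimed minimizer 1/(e^H + 1)
F_min = free_energy(H, G_min)

# the smallest observed gap F(Γ) - F(Γ_min) over random states
worst_gap = min(free_energy(H, random_state()) - F_min for _ in range(50))
```

In every trial the gap stays non-negative, consistent with the claim.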

EDIT. The "usual" non-rigorous method of proof would be to use the variational method, i.e., \begin{align} \delta F &=\tr\left(H\delta\Gamma+\log\left(\frac{\Gamma}{1-\Gamma}\right)\delta\Gamma \right)\\ 0&=H+\log\left(\frac{\Gamma}{1-\Gamma}\right)\\ \Gamma &= \frac{1}{e^H+1} \end{align} How would one make this argument rigorous?

EDIT 2. I just realized that $F(\Gamma)$ is convex, so I think that the above calculation is almost rigorous, except that $\delta\Gamma$ may not commute with $\Gamma$, as pointed out by @Sangchul Lee.

EDIT 3. Thank you @Sangchul Lee for the proof. I think I also found another proof, that may be a little more straightforward.

Let $\Gamma_0=1/(e^H+1)$ and let $0\le \Gamma_1 \le 1$. Let $\Gamma(t)=(1-t)\Gamma_0 +t\Gamma_1 =\Gamma_0 +t\Delta$ where $\Delta = \Gamma_1-\Gamma_0$, and let $f(x)=x \log x +(1-x) \log (1-x)$. Then $$ F(\Gamma(t))-F(\Gamma_0) = \tr(tH\Delta)+\tr (f(\Gamma(t))-f(\Gamma_0)). $$ Since $f$ is convex, Klein's inequality gives $$ F(\Gamma(t))-F(\Gamma_0) \ge \tr\big(t\Delta (H+\log\Gamma_0 -\log (1-\Gamma_0))\big)=0, $$ where the last equality holds because $\log\Gamma_0 - \log(1-\Gamma_0) = -H$. Since $f$ is in fact strictly convex, $\Gamma_0$ is the unique global minimizer.
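The key step, Klein's inequality for this $f$, can itself be checked numerically (again a sketch with self-chosen helper names):

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(1)
n = 5
I = np.eye(n)

def tr_f(G):
    # tr f(Γ) for f(x) = x log x + (1-x) log(1-x), computed via eigenvalues
    w = np.linalg.eigvalsh(G)
    return float(np.sum(w * np.log(w) + (1 - w) * np.log1p(-w)))

def random_state():
    S = rng.standard_normal((n, n))
    S = (S + S.T) / 2
    return np.linalg.inv(expm(S) + I)   # spectrum in (0,1)

# Klein's inequality: tr f(Γ1) - tr f(Γ0) ≥ tr((Γ1 - Γ0) f'(Γ0))
gaps = []
for _ in range(30):
    G0, G1 = random_state(), random_state()
    fprime = logm(G0) - logm(I - G0)     # f'(Γ0) = log(Γ0/(1-Γ0))
    gaps.append(tr_f(G1) - tr_f(G0) - np.trace((G1 - G0) @ fprime).real)
```

Every gap comes out non-negative, as the inequality predicts.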

By the way, not sure why this question was voted to close. Please vote to reopen if you think otherwise.

EDIT 4. After looking further into the proof of Klein's inequality, there is a subtle issue in taking the derivative of a trace function (which is not fully explained in the Wikipedia article). @Sangchul Lee deals with this explicitly for this particular case, but I believe it can be done more generally. Hence, I posted another question here.

Andrew Yuan
  • By $0 \le \Gamma \le 1$, do you mean entrywise or that all the eigenvalues are between $0$ and $1$? – JimmyK4542 Nov 20 '20 at 09:16
  • @JimmyK4542 I mean eigenvalues – Andrew Yuan Nov 20 '20 at 19:37
  • So you calculated the gradient of $F$ and then solved for the value of $\Gamma$ which makes the gradient equal to zero. Why is that not rigorous enough? – greg Nov 20 '20 at 22:13
  • This result for $\delta F$ seems correct, but how we derive it is rather non-trivial if we allow $\delta\Gamma$ not to commute with $\Gamma$. – Sangchul Lee Nov 20 '20 at 22:21
  • @SangchulLee The trace operation fixes the commutativity issue, so the derivation is as trivial as it appears. – greg Nov 21 '20 at 00:51
  • @greg, Noncommutativity affects the computation in a more subtle way, but I agree that the trace partially takes care of this issue. – Sangchul Lee Nov 21 '20 at 01:23
  • @SangchulLee I don't understand, since ${\rm Tr}(\Gamma,d\Gamma) = {\rm Tr}(d\Gamma,\Gamma)$, via the cyclic property of the trace, where exactly is the issue? – greg Nov 21 '20 at 19:11
  • @greg, As you may see from my answer, the real issue stems from the differentiation part. – Sangchul Lee Nov 21 '20 at 22:34
  • @AndrewYuan, Now I see that my solution is essentially reproducing Klein's inequality. Glad to know that a whole area of trace inequalities exists. – Sangchul Lee Nov 21 '20 at 23:13

1 Answer


In this answer, we will write

$$ D^+_{B}F(A) := \lim_{\epsilon \to 0^+} \frac{F(A+\epsilon B) - F(A)}{\epsilon} $$

whenever the limit exists. Then the following lemma will be useful:

Lemma. We have $$ D^+_{B}\exp(A) = \int_{0}^{1} e^{sA}Be^{(1-s)A} \, \mathrm{d}s. $$

Proof. We have

$$ D^+_{B}\exp(A) = \sum_{n=1}^{\infty} \frac{1}{n!} D^+_{B}(A^n) = \sum_{n=1}^{\infty} \frac{1}{n!} \sum_{k=0}^{n-1} A^k B A^{n-1-k} = \sum_{k,l\geq 0} \frac{A^k B A^l}{(k+l+1)!}. $$

Then the claim follows from the beta integral $\int_{0}^{1} u^k(1-u)^l \, \mathrm{d}u = \frac{k!\,l!}{(k+l+1)!}$, applied term by term to the integral on the right-hand side. $\square$
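The lemma itself can be verified numerically, comparing a central finite difference with a midpoint-rule quadrature of the integral (a sketch, not part of the proof):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

# left side: directional derivative of exp at A in direction B (central difference)
eps = 1e-6
lhs = (expm(A + eps * B) - expm(A - eps * B)) / (2 * eps)

# right side: ∫_0^1 e^{sA} B e^{(1-s)A} ds by the midpoint rule
m = 2000
rhs = sum(expm(((k + 0.5) / m) * A) @ B @ expm((1 - (k + 0.5) / m) * A)
          for k in range(m)) / m

err = np.max(np.abs(lhs - rhs))
```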

Now let $\Gamma_0$ and $\Gamma_1$ be self-adjoint matrices such that $0 \leq \Gamma_k \leq 1$ for $k = 0, 1$. Interpolate $\Gamma_0$ and $\Gamma_1$ by letting

$$ \Gamma_t = (1-t)\Gamma_0 + t\Gamma_1, \quad 0 \leq t \leq 1. $$

We also write $\Delta = \frac{\mathrm{d}}{\mathrm{d}t} \Gamma_t = \Gamma_1 - \Gamma_0$ since this quantity will appear frequently. Then by Frullani's integral and the functional calculus,

\begin{align*} \frac{\mathrm{d}}{\mathrm{d}t} \log \Gamma_{t} &= \frac{\mathrm{d}}{\mathrm{d}t} \int_{0}^{\infty} \frac{e^{-x} - e^{-x\Gamma_t}}{x} \, \mathrm{d}x \\ &= \int_{0}^{\infty} (D^+_{\Delta}\exp)(-x\Gamma_t) \, \mathrm{d}x \\ &= \int_{0}^{\infty} \int_{0}^{1} e^{-sx\Gamma_t}\Delta e^{-(1-s)x\Gamma_t} \, \mathrm{d}s \mathrm{d}x. \end{align*}
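The Frullani representation $\log a = \int_0^\infty \frac{e^{-x}-e^{-xa}}{x}\,\mathrm{d}x$ used in the first line is easy to verify numerically for scalars (the functional calculus then extends it to positive matrices):

```python
import numpy as np
from scipy.integrate import quad

# Frullani: log a = ∫_0^∞ (e^{-x} - e^{-x a}) / x dx for a > 0
# (the integrand tends to a - 1 as x → 0+, so there is no singularity)
errs = []
for a in (0.3, 1.0, 1.7, 4.0):
    val, _ = quad(lambda x, a=a: (np.exp(-x) - np.exp(-x * a)) / x, 0, np.inf)
    errs.append(abs(val - np.log(a)))
```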

From this, we obtain

\begin{align*} \operatorname{Tr}\left( \Gamma_t \frac{\mathrm{d}}{\mathrm{d}t} \log\Gamma_t \right) &= \operatorname{Tr}\left( \int_{0}^{\infty} \int_{0}^{1} \Gamma_t e^{-sx\Gamma_t}\Delta e^{-(1-s)x\Gamma_t} \, \mathrm{d}s \mathrm{d}x \right) \\ &= \int_{0}^{\infty} \int_{0}^{1} \operatorname{Tr}\left( \Gamma_t e^{-sx\Gamma_t}\Delta e^{-(1-s)x\Gamma_t} \right) \, \mathrm{d}s \mathrm{d}x \\ &= \int_{0}^{\infty} \int_{0}^{1} \operatorname{Tr}\left( \Gamma_t e^{-x\Gamma_t}\Delta \right) \, \mathrm{d}s \mathrm{d}x \\ &= \operatorname{Tr}(\Delta), \end{align*}

and similarly

$$ \operatorname{Tr}\left( (1-\Gamma_t) \frac{\mathrm{d}}{\mathrm{d}t} \log(1-\Gamma_t) \right) = -\operatorname{Tr}(\Delta). $$
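Both trace identities can be checked numerically by differencing $\log\Gamma_t$ and $\log(1-\Gamma_t)$ directly (a sketch with self-chosen names):

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(3)
n = 4
I = np.eye(n)

def random_state():
    S = rng.standard_normal((n, n))
    S = (S + S.T) / 4   # scaled down so the spectrum stays away from 0 and 1
    return np.linalg.inv(expm(S) + I)

G0, G1 = random_state(), random_state()
D = G1 - G0
t, h = 0.4, 1e-5
Gt = (1 - t) * G0 + t * G1

# d/dt log Γ_t and d/dt log(1-Γ_t) by central differences
dlog  = (logm(Gt + h * D) - logm(Gt - h * D)) / (2 * h)
dlog1 = (logm(I - Gt - h * D) - logm(I - Gt + h * D)) / (2 * h)

err1 = abs(np.trace(Gt @ dlog).real - np.trace(D).real)          # should equal Tr(Δ)
err2 = abs(np.trace((I - Gt) @ dlog1).real + np.trace(D).real)   # should equal -Tr(Δ)
```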

So we obtain

$$ \frac{\mathrm{d}}{\mathrm{d}t} F(\Gamma_t) = \operatorname{Tr}\left( H\Delta + \Delta\log(\Gamma_t) - \Delta\log(1-\Gamma_t) \right) \tag{1} $$
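Formula $\text{(1)}$ agrees with a direct finite difference of $t \mapsto F(\Gamma_t)$ (numerical sketch; helper names are my own):

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(4)
n = 4
I = np.eye(n)

def sym(M):
    return (M + M.T) / 2

def random_state():
    return np.linalg.inv(expm(sym(rng.standard_normal((n, n)))) + I)

H = sym(rng.standard_normal((n, n)))

def F(G):
    return np.trace(H @ G + G @ logm(G) + (I - G) @ logm(I - G)).real

G0, G1 = random_state(), random_state()
D = G1 - G0
t, h = 0.3, 1e-5
Gt = (1 - t) * G0 + t * G1

fd = (F(Gt + h * D) - F(Gt - h * D)) / (2 * h)   # numerical d/dt F(Γ_t)
formula = np.trace(H @ D + D @ logm(Gt) - D @ logm(I - Gt)).real  # right side of (1)
err = abs(fd - formula)
```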

Differentiating both sides with respect to $t$ again,

\begin{align*} \frac{\mathrm{d}^2}{\mathrm{d}t^2} F(\Gamma_t) &= \operatorname{Tr}\left(\Delta \frac{\mathrm{d}}{\mathrm{d}t}\log(\Gamma_t) - \Delta \frac{\mathrm{d}}{\mathrm{d}t}\log(1-\Gamma_t) \right) \\ &= \operatorname{Tr}\left( \int_{0}^{\infty} \int_{0}^{1} \Delta e^{-sx\Gamma_t}\Delta e^{-(1-s)x\Gamma_t} \, \mathrm{d}s \mathrm{d}x \right) \\ &\qquad + \operatorname{Tr}\left( \int_{0}^{\infty} \int_{0}^{1} \Delta e^{-sx(1-\Gamma_t)}\Delta e^{-(1-s)x(1-\Gamma_t)} \, \mathrm{d}s \mathrm{d}x \right) \\ &= \int_{0}^{\infty} \int_{0}^{1} \left\| e^{-\frac{1}{2}sx\Gamma_t}\Delta e^{-\frac{1}{2}(1-s)x\Gamma_t} \right\|^2 \, \mathrm{d}s \mathrm{d}x \\ &\qquad + \int_{0}^{\infty} \int_{0}^{1} \left\| e^{-\frac{1}{2}sx(1-\Gamma_t)}\Delta e^{-\frac{1}{2}(1-s)x(1-\Gamma_t)} \right\|^2 \, \mathrm{d}s \mathrm{d}x \tag{2} \end{align*}

where $\| X \|^2 = \operatorname{Tr}(X^* X)$ is always a non-negative real number.
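The non-negativity in $\text{(2)}$ likewise matches a second-difference check of $t\mapsto F(\Gamma_t)$:

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(5)
n = 4
I = np.eye(n)

def sym(M):
    return (M + M.T) / 2

def random_state():
    return np.linalg.inv(expm(sym(rng.standard_normal((n, n)))) + I)

H = sym(rng.standard_normal((n, n)))
G0, G1 = random_state(), random_state()

def F_line(t):
    G = (1 - t) * G0 + t * G1
    return np.trace(H @ G + G @ logm(G) + (I - G) @ logm(I - G)).real

# second central differences of t ↦ F(Γ_t): all should be non-negative (convexity)
h = 1e-3
second_diffs = [(F_line(t + h) - 2 * F_line(t) + F_line(t - h)) / h**2
                for t in (0.1, 0.3, 0.5, 0.7, 0.9)]
```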

Now we are ready to prove the claim.

  • From $\text{(2)}$, we know that $\frac{\mathrm{d}^2}{\mathrm{d}t^2} F(\Gamma_t) \geq 0$, which in turn implies that $F$ is convex.

  • $\text{(1)}$ shows that $\Gamma_{\text{m}} = \frac{1}{e^H + 1}$ is a critical point of $F$: taking $\Gamma_0 = \Gamma_{\text{m}}$, we have $\log\Gamma_{\text{m}} - \log(1-\Gamma_{\text{m}}) = -H$, so $\frac{\mathrm{d}}{\mathrm{d}t} F(\Gamma_t)\big|_{t=0} = 0$ for every $\Gamma_1$. Combined with the convexity, this makes it a global minimum of $F$.

Therefore $\Gamma_{\text{m}}$ minimizes $F$.

Sangchul Lee