
The conditional min-entropy, discussed e.g. in these notes by Watrous, as well as in this other post, can be defined as $$\mathsf{H}_{\rm min }(\mathsf{X} \mid \mathsf{Y})_{\rho}\equiv -\inf _{\sigma \in \mathsf{D}(\mathcal Y)} \mathsf{D}_{\rm max }\left(\rho \| \mathbb{1}_{\cal X} \otimes \sigma\right), \\ \mathsf D_{\max }(\rho \| Q)\equiv \inf \left\{\lambda \in \mathbb{R}: \rho \leq 2^{\lambda} Q\right\}. $$ Among other things, it can be given a rather direct operational interpretation, at least for classical-quantum states $\rho=\sum_a p_a |a\rangle\!\langle a|\otimes\xi_a$, as $-\log p_{\rm opt}$, with $p_{\rm opt}$ the optimal guessing probability of discriminating between the elements of the ensemble $a\mapsto (p_a,\xi_a)$.

What do these quantities look like for diagonal matrices? For the max-relative entropy I would get $$\mathsf D_{\rm max}(P\|Q)=\max_i \log\frac{p_i}{q_i},$$ with $p_i\equiv P_{ii}$ and $q_i\equiv Q_{ii}$. However, I'm less sure about how to compute $\mathsf H_{\rm min}(\mathsf X|\mathsf Y)_\rho$. The problem is that the minimisation is defined over all possible states, not just diagonal ones. To get a quantity which can be seamlessly applied to classical distributions as well, I would guess that the $\inf$ should be saturated by diagonal states $\sigma$. Even assuming this is the case (which would need to be shown anyway), I'd get $$\mathsf H_{\rm min}(\mathsf X|\mathsf Y)_P = -\inf_{\vec q}\log \max_{a,b} \frac{p_{a,b}}{q_b},$$ where $P$ is some bipartite probability distribution, and the $\inf$ is taken over all probability distributions $\vec q$ on the second system.

Assuming these expressions are correct in the first place, is there a simpler approach leading to nicer expressions, or at least to expressions that look more natural in a purely classical context?

glS

2 Answers


Long story short: taking $\sigma_B = \rho_B$ is equivalent to taking the worst case min-entropy $$ \hat{H}_{\min}(A|B) = - \log \max_{a,b} P(A=a|B=b)\,, $$ and optimizing over $\sigma_B$ is equivalent to taking the averaged min-entropy (standard) $$ H_{\min}(A|B) = - \log \sum_b P(B=b) \max_a P(A=a|B=b)\,. $$
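As a quick numerical illustration of the difference between the two (a minimal numpy sketch; the joint distribution `P` is just a hypothetical example, not from the question):

```python
import numpy as np

# Hypothetical joint distribution P(a, b); rows index a, columns index b.
P = np.array([[0.4, 0.1],
              [0.2, 0.3]])

P_B = P.sum(axis=0)                        # marginal P(b)

# Worst-case version (sigma_B = rho_B): -log2 max_{a,b} P(a|b).
H_min_worst = -np.log2((P / P_B).max())

# Averaged (standard) version: -log2 sum_b max_a P(a, b).
H_min_avg = -np.log2(P.max(axis=0).sum())

print(H_min_worst, H_min_avg)   # ≈ 0.415 and ≈ 0.515
```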

Sufficient to optimize over classical $\sigma_B$

Firstly, let's think about the optimization over $\sigma$ when $\rho_{AB}$ is a diagonal state. It turns out that it is indeed sufficient to take $\sigma_B$ to be diagonal as well. To see this, note that we can write \begin{equation} \begin{aligned} 2^{- H_{\min}(A|B)} = \min& \quad \mathrm{Tr}[\sigma_B] \\ \mathrm{s.t.}& \quad I_A \otimes\sigma_B \geq \rho_{AB} \\ & \quad \sigma_B \geq 0 \end{aligned} \end{equation} which is an SDP. Now suppose $\rho_{AB}$ is diagonal in the computational basis $\{|i\rangle \otimes | j\rangle\}$ and consider the pinching channel on the $B$ system, $\mathcal{P}(X) = \sum_j |j\rangle\langle j | X |j \rangle \langle j |$, which keeps only the diagonal part of a matrix (in the computational basis for $B$). Let $\sigma_0$ be any feasible point of the above SDP and define $\sigma_1 = \mathcal{P}(\sigma_0)$. Applying $\mathrm{id}_A \otimes \mathcal{P}$ to both sides of $I_A \otimes \sigma_0 \geq \rho_{AB}$ preserves the inequality, because $\mathcal{P}$ is a channel and therefore preserves positive semidefiniteness, and it leaves $\rho_{AB}$ unchanged, because $\rho_{AB}$ is diagonal. Hence $\sigma_1$ is again feasible, and since $\mathcal{P}$ is trace-preserving it gives the same objective value. This new feasible point is a diagonal operator, so it suffices to optimize only over diagonal (classical) $\sigma_B$.
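The primal SDP above is easy to check numerically. Here is a minimal sketch, assuming cvxpy with an SDP-capable solver (e.g. SCS) is available; the example distribution is mine, and the value is compared with the classical closed form $\sum_b \max_a P(a,b)$ derived below:

```python
import numpy as np
import cvxpy as cp

# Hypothetical classical (diagonal) state: joint distribution P(a, b) on two bits.
P = np.array([[0.4, 0.1],
              [0.2, 0.3]])
dA, dB = P.shape
rho_AB = np.diag(P.flatten())              # diagonal rho_AB in the basis |a>|b>

# Primal SDP: minimise Tr[sigma_B] subject to I_A ⊗ sigma_B >= rho_AB, sigma_B >= 0.
sigma = cp.Variable((dB, dB), symmetric=True)
constraints = [cp.kron(np.eye(dA), sigma) >> rho_AB, sigma >> 0]
prob = cp.Problem(cp.Minimize(cp.trace(sigma)), constraints)
prob.solve()

print(prob.value, P.max(axis=0).sum())     # both ≈ 0.7, i.e. 2^{-H_min(A|B)}
```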

Case 1: $\sigma_B = \rho_B$

If we forego the optimization over $\sigma_B$ and set it to $\rho_B$, we see from your calculations that $$ \begin{aligned} \hat{H}_{\min}(A|B) &= - \log \max_{a,b} \frac{p(a,b)}{p(b)} \\ &= - \log \max_{a,b} P(A=a|B=b) \end{aligned} $$

Case 2: Optimizing over $\sigma_B$

If we take the dual of the above SDP (which is actually a linear program now that everything is diagonal) we get $$ \begin{aligned} 2^{-H_{\min}(A|B)} = \max& \quad \mathrm{Tr}[Y_{AB} \rho_{AB}] \\ \mathrm{s.t.}& \quad 0 \leq Y_{B} \leq I_{B} \\ & \quad Y_{AB} \geq 0 \end{aligned} $$ where $Y_B \equiv \mathrm{Tr}_A[Y_{AB}]$. Note that I've written it in SDP form to reflect how we usually see it with quantum systems, but here it is actually an LP and $Y_{AB}$ is a diagonal matrix (or just a vector). Considering this, we can rewrite the above optimization as $$ \begin{aligned} 2^{-H_{\min}(A|B)} = \max& \quad \sum_{a,b} Y(a,b) P(a,b) \\ \mathrm{s.t.}& \quad 0 \leq \sum_a Y(a,b) \leq 1 \qquad \text{for all } b \\ & \quad Y(a,b) \geq 0 \qquad \text{for all }a,b \end{aligned} $$ Now take the following feasible point $$ Y(a,b) = \begin{cases} 1 \qquad \text{if }a = \mathrm{argmax}_{a'} P(A=a',B=b) \\ 0 \qquad \text{otherwise} \end{cases} $$ In other words, set $Y(a,b) = 1$ if $a$ is the output for which $P(a,b)$ is maximal, and otherwise set it to 0 (if multiple outputs are maximal, just pick one of them). You can check that this choice is a valid feasible point of the maximization, and it gives the objective value $$ \begin{aligned} \sum_{a,b} Y(a,b) P(a,b) &= \sum_b \max_a P(A=a,B=b) \\ &= \sum_b P(B=b) \max_a \frac{P(A=a,B=b)}{P(B=b)} \\ &= \sum_b P(B=b) \max_a P(A=a|B=b) \end{aligned} $$
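Since everything is diagonal, this dual is a small LP and can be sanity-checked numerically. A sketch using scipy's `linprog` (again with a hypothetical example distribution of my choosing):

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical joint distribution P(a, b); rows index a, columns index b.
P = np.array([[0.4, 0.1],
              [0.2, 0.3]])
dA, dB = P.shape

# Dual LP: maximise sum_{a,b} Y(a,b) P(a,b)  s.t.  sum_a Y(a,b) <= 1 for each b,  Y >= 0.
# linprog minimises, so we negate the objective. Variables are Y flattened as Y[a, b].
c = -P.flatten()
A_ub = np.zeros((dB, dA * dB))
for b in range(dB):
    A_ub[b, b::dB] = 1.0          # picks out Y(a, b) for all a, at fixed b
b_ub = np.ones(dB)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
print(-res.fun, P.max(axis=0).sum())   # both ≈ 0.7 = sum_b max_a P(a, b)
```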

To see that this is actually the optimal feasible point, consider again the primal problem \begin{equation} \begin{aligned} 2^{- H_{\min}(A|B)} = \min& \quad \mathrm{Tr}[\sigma_B] \\ \mathrm{s.t.}& \quad I_A \otimes\sigma_B \geq \rho_{AB} \\ & \quad \sigma_B \geq 0 \end{aligned} \end{equation} and take $\sigma_B = \sum_b (\max_a P(a,b)) |b \rangle \langle b|$. This is a feasible point and yields the same objective value. Hence, by strong duality, the true optimum must be $$ \sum_b P(B=b) \max_a P(A=a|B=b) $$ which is exactly the quantity inside the logarithm of the averaged (standard) $H_{\min}(A|B)$.

Rammus

Classical definition of $\mathsf D_{\rm max}(P\|Q)$

$\newcommand{\H}{\mathsf{H}}\newcommand{\Hmin}{\H_{\rm min}}\newcommand{\D}{\mathsf{D}}\newcommand{\Dmax}{\D_{\rm max}}$Consider the max-relative entropy of two probability distributions $P,Q$ as defined by $$\Dmax(P\|Q) = \max_a \log\left(\frac{P_a}{Q_a}\right).$$ I start with this definition because it's the most direct analog to the standard relative entropy, which reads $\D(P\|Q)=\sum_a P_a \log(P_a/Q_a)$. This is equivalent to the other definition in terms of a linear program: $$\Dmax(P\|Q) = \min\{\eta\in\mathbb{R}: \,\, \log(P_a/Q_a)\le \eta\,\, \forall a\} \\ = \min\{\log(\eta): \,\, P\le \eta\, Q\} = \min\{\lambda\in\mathbb{R}: \,\, P\le 2^{\lambda}\, Q\}.$$
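As a minimal sketch, this classical definition is essentially a one-liner (base-2 logs, to match the $2^\lambda$ convention above):

```python
import numpy as np

# Classical max-relative entropy D_max(P || Q) = max_a log2(P_a / Q_a).
def d_max(P, Q):
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    ratios = P[P > 0] / Q[P > 0]   # entries with P_a = 0 never attain the max
    return np.log2(ratios.max())

print(d_max([0.5, 0.5], [0.75, 0.25]))   # 1.0 = log2(2), as in the example further down
```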

Standard relation between $\D(P\|Q)$ and $\H(X|Y)_P$

Consider now the conditional entropy. The standard one, $\H(X|Y)_P=\H(XY)_P-\H(Y)_{P_Y}$, can be written in terms of the relative entropy as $$\H(X|Y)_P = - \D(P \| I\otimes P_Y) = \H(XY)_P + \sum_{xy} P_{XY}(x,y)\log (P_Y(y)).$$ It is not hard to observe that, in general, $\sum_a p_a \log q_a\le \sum_a p_a \log p_a$ for any probability distribution $q$ (that is, $q_a\ge0$ and $\sum_a q_a=1$), which means that $Q=P_Y$ maximises $-\D(P\|I\otimes Q)$, and thus we can also write the conditional entropy as $$\H(X|Y)_P=\max_Q [-\D(P \| I\otimes Q)] = - \min_Q \D(P \| I\otimes Q),\tag 4$$ with the maximum over probability distributions $Q$ on the second space (that is, over all vectors with $Q_a\ge0$ and $\sum_a Q_a=1$).
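As a quick numerical check that the minimiser is indeed the marginal, here is a small grid-search sketch (the distribution is the one used in the last example further down):

```python
import numpy as np

# Check that min_Q D(P || I ⊗ Q) is attained at Q = P_Y, for a binary Y system.
P = np.array([[0.25, 0.25],
              [0.00, 0.50]])          # joint P(x, y); rows index x, columns index y
P_Y = P.sum(axis=0)

def rel_entropy(P, Q):                # D(P || I ⊗ Q) = sum_{xy} P(x,y) log2(P(x,y) / Q(y))
    mask = P > 0
    return (P[mask] * np.log2(P[mask] / np.broadcast_to(Q, P.shape)[mask])).sum()

q0_grid = np.linspace(0.01, 0.99, 99)
values = [rel_entropy(P, np.array([q0, 1 - q0])) for q0 in q0_grid]
print(q0_grid[np.argmin(values)], P_Y[0])   # both ≈ 0.25: the minimiser is the marginal
```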

Classical definition of $\Hmin$ from $\Dmax$

Now, one can define the conditional min-entropy using (4), replacing $\D$ with $\D_{\rm max}$. However, when doing so, the fact that the maximum over $Q$ is achieved by $Q=P_Y$ stops being true, so the optimisation can no longer be removed by simply plugging in the marginal. We thus define $$\Hmin(X|Y)_P = -\min_Q \D_{\rm max}(P\|I\otimes Q).$$

Solve the minimisation in the definition of $\Hmin$

Using the characterisation given above for $\Dmax$ in terms of a min, we get $$\Hmin(X|Y)_P = -\min\big\{\log\eta: \,\, \eta\ge0, \,\,P \le \eta(I\otimes Q), \,\, Q\ge0, \,\, \sum_y Q_y=1\big\}.$$ This form is advantageous because the two variables, $\eta\ge0$ and the probability distribution $Q$, can be merged into a single one: writing $\tilde Q\equiv \eta Q$, we only need to minimise over arbitrary non-negative vectors $\tilde Q\ge0$ such that $P\le I\otimes \tilde Q$, and we recover the corresponding $\eta$ as $\eta=\sum_y \tilde Q_y$. In other words, we can write the conditional min-entropy as $$\Hmin(X|Y)_P = - \min\left\{ \log\left(\sum_y \tilde Q_y\right): \,\, P_{xy} \le \tilde Q_y \,\,\forall x,y \right\}.$$ In this form the minimisation is easy to solve: the optimal choice is $\tilde Q_y=\max_x P_{xy}$, and thus we conclude $$\Hmin(X|Y)_P = - \log\left(\sum_y \max_x P_{xy}\right).$$ Note that $\sum_y \max_x P_{xy} = \sum_y P_Y(y)\max_x P(x|y)$, with $P(x|y)\equiv P_{xy}/\sum_x P_{xy}$, is precisely the optimal probability of correctly guessing $x$ given $y$, so this recovers the operational interpretation of $\Hmin$ as $-\log p_{\rm opt}$ mentioned in the question.
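The resulting closed form is straightforward to compute; a minimal sketch, reproducing the two examples given below:

```python
import numpy as np

# Closed form H_min(X|Y)_P = -log2 sum_y max_x P(x, y).
def h_min(P):
    P = np.asarray(P, float)           # rows index x, columns index y
    return -np.log2(P.max(axis=0).sum())

print(h_min([[0.5, 0.0],
             [0.0, 0.5]]))             # 0: maximally correlated bits (example below)
print(h_min([[0.25, 0.25],
             [0.00, 0.50]]))           # log2(4/3) ≈ 0.415 (last example below)
```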


Examples

Here are some examples illustrating the above calculations in practice.

Relative entropy vs max relative entropy

Let $P=(1/2,1/2)$ and $Q=(3/4,1/4)$. The regular relative entropy is then $$\D(P\|Q)= \frac12 \log(2/3) + \frac12\log2 = \log2 - \frac12\log3.$$ On the other hand, $$\Dmax(P\|Q) \equiv \max_a \log(P_a/Q_a) = \log2.$$ Equivalently, we can get $\Dmax$ by observing that $P_a/Q_a\in\{2/3, 2\}$, and thus $\log(P_a/Q_a)\le \log 2$, or that $P\le 2Q$.
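A quick numerical check of these two values (base-2 logs):

```python
import numpy as np

# P = (1/2, 1/2), Q = (3/4, 1/4).
P, Q = np.array([0.5, 0.5]), np.array([0.75, 0.25])
D    = (P * np.log2(P / Q)).sum()      # standard relative entropy
Dmax = np.log2((P / Q).max())          # max-relative entropy
print(D, Dmax)   # ≈ 0.2075 (= 1 - log2(3)/2) and 1.0 (= log2 2)
```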

Min conditional entropies for maximally correlated bits

Let's now consider a "bipartite" two-bit distribution, $P=(1/2,0,0,1/2)$, that is, a fully correlated distribution: we observe either $00$ or $11$ with equal probability. Let also $Q=(Q_0,1-Q_0)$, $Q_0\in[0,1]$, be an arbitrary binary distribution. Then $$\D(P\|I\otimes Q) = -\frac12 \log(4Q_0 (1-Q_0)),$$ where I used $I\otimes Q\equiv (Q_0, Q_1, Q_0, Q_1)$. The maximum of $-\D(P\|I\otimes Q)$ is achieved at $Q=(1/2,1/2)$, and thus $\H(X|Y)_P = 0$, which is of course what we should expect from a maximally correlated distribution.

On the other hand, the max-relative entropy reads $$\Dmax(P\|I\otimes Q) = \max_a \log \frac{1}{2Q_a} = \log \frac{1}{2 Q_{\rm min}},$$ where $Q_{\rm min}\equiv \min(Q_0, 1-Q_0)$. To get $\Hmin$ we now want to maximise $-\Dmax$, that is, compute $\max_Q \log(2Q_{\rm min})$. But clearly for any $Q$ we have $Q_{\rm min}\le 1/2$, and thus $$\Hmin(X|Y)_P = \max_Q[-\Dmax(P\|I\otimes Q)] = \log\left(2\cdot\tfrac12\right)=0.$$ We can easily check that this is also what we get from the other explicit formula for $\Hmin$: $$\Hmin(X|Y)_P = -\log\left(\sum_y \max_x P_{xy}\right) = -\log \left(\sum_y 1/2\right) = -\log1=0.$$
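A quick numerical check of both routes to $\Hmin(X|Y)_P=0$:

```python
import numpy as np

# Route 1: maximise -D_max(P || I ⊗ Q) = log2(2 * min(Q0, 1 - Q0)) over Q0.
q0_grid = np.linspace(0.01, 0.99, 99)
values = np.log2(2 * np.minimum(q0_grid, 1 - q0_grid))
print(values.max())                                   # ≈ 0, attained at Q0 = 1/2

# Route 2: closed form -log2 sum_y max_x P(x, y), for P = (1/2, 0, 0, 1/2).
P = np.array([[0.5, 0.0], [0.0, 0.5]])
print(-np.log2(P.max(axis=0).sum()))                  # 0
```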

Min conditional entropies for not maximally correlated bits

In the above example we didn't get much difference between $\H$ and $\Hmin$, partly because of the highly symmetric choice of $P$. Let's then consider a more interesting example, with $$P=(1/4,1/4,0,1/2).$$ Remember this means that $P_{00}=P_{01}=1/4$ and $P_{11}=1/2$. Intuitively this represents a situation where observing $Y=0$ means $X=0$ for sure, while for $Y=1$ you can have either $X=0$ or $X=1$ (though $X=1$ is more likely).

The standard relative and conditional entropies equal $$\D(P\|I\otimes Q) = \frac14\log\left(\frac{1}{16 Q_0 Q_1}\right) + \frac12 \log\left(\frac{1}{2Q_1}\right), \\ \H(X|Y)_P = -\D(P\|I\otimes P_Y) = -\frac12\log2 + \frac34 \log3.$$ On the other hand, the max-relative entropy reads $$\Dmax(P\|I\otimes Q) = \log \max \left(\frac{1}{4Q_0},\frac{1}{4Q_1},\frac{1}{2Q_1}\right) = \log \max \left(\frac{1}{4Q_0},\frac{1}{2Q_1}\right), \\ \text{and thus}\quad -\Dmax(P\|I\otimes Q) = \log \min (4Q_0, 2(1-Q_0)).$$ A quick plot shows that the function $Q_0\mapsto \min (4Q_0, 2(1-Q_0))$ is "triangle-like", with a maximum at $Q_0=1/3$, and thus $$\Hmin(X|Y)_P = \max_Q[-\Dmax(P\|I\otimes Q)] \\ = -\Dmax(P\| I\otimes (1/3,2/3) ) = \log \frac43.$$ Note how in this case the maximum in the definition of $\Hmin$ is obtained with a $Q=(1/3,2/3)$ that is not the marginal of $P$.
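A quick numerical check of these values, replacing the plot by a grid search over $Q_0$ (base-2 logs):

```python
import numpy as np

# For P = (1/4, 1/4, 0, 1/2): -D_max(P || I ⊗ Q) = log2(min(4*Q0, 2*(1 - Q0))).
q0_grid = np.linspace(0.001, 0.999, 999)
values = np.log2(np.minimum(4 * q0_grid, 2 * (1 - q0_grid)))
print(q0_grid[np.argmax(values)], values.max())   # ≈ 1/3 and ≈ log2(4/3) ≈ 0.415

# Compare with the closed form and with the standard conditional entropy.
P = np.array([[0.25, 0.25], [0.00, 0.50]])        # rows index x, columns index y
print(-np.log2(P.max(axis=0).sum()))              # log2(4/3) ≈ 0.415
P_Y = P.sum(axis=0)
H_XY = -(P[P > 0] * np.log2(P[P > 0])).sum()
print(H_XY + (P_Y * np.log2(P_Y)).sum())          # H(X|Y) = (3/4) log2(3) - 1/2 ≈ 0.689
```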

glS