
Mutual information satisfies the chain rule: $$I(X,Y;Z) = I(X;Z) + I(Y;Z|X).$$ The chain rule is useful, and the proof is simply linearity of expectation.
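
Spelled out: pointwise, $$\log\frac{P_{XYZ}(x,y,z)}{P_{XY}(x,y)\,P_Z(z)} = \log\frac{P_{XZ}(x,z)}{P_X(x)\,P_Z(z)} + \log\frac{P_{YZ|X}(y,z|x)}{P_{Y|X}(y|x)\,P_{Z|X}(z|x)},$$ and taking the expectation of both sides over $P_{XYZ}$ gives $I(X,Y;Z) = I(X;Z) + I(Y;Z|X)$.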

Sometimes we want something stronger than mutual information. Mutual information is the expectation of the pointwise mutual information, i.e., $I(X;Y) = \mathbb{E}\left[\log\left(\frac{P_{XY}(X,Y)}{P_X(X)P_Y(Y)}\right)\right]$. We might want high-probability bounds of the form $\mathbb{P}\left[\log\left(\frac{P_{XY}(X,Y)}{P_X(X)P_Y(Y)}\right)\le\varepsilon\right]\ge1-\delta$. This leads us to notions of mutual information based on Rényi divergences. In particular, we consider Sibson's definition of mutual information: $$I_\alpha(X;Y) = \min_{Q_Y} D_\alpha\left(P_{XY}\middle\| P_X \times Q_Y\right),$$ where $D_\alpha(P\|Q) = \frac{1}{\alpha-1} \log \sum_x P(x)^\alpha Q(x)^{1-\alpha}$ is the Rényi divergence of order $\alpha$. This definition coincides with the usual definition of mutual information as $\alpha\to1$.
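
For concreteness, here is a minimal sketch of this divergence for discrete distributions in Python with NumPy (the helper name `renyi_divergence` is just for illustration):

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """Rényi divergence D_alpha(p || q) of order alpha > 0, alpha != 1,
    for discrete distributions given as 1-D probability vectors.
    Assumes q > 0 entrywise (otherwise the divergence can be infinite for alpha > 1)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # D_alpha(P || Q) = 1/(alpha - 1) * log sum_x P(x)^alpha Q(x)^(1 - alpha)
    return np.log(np.sum(p ** alpha * q ** (1.0 - alpha))) / (alpha - 1.0)
```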

Sibson's mutual information $I_\alpha(X;Y)$ is the Rényi divergence between the joint distribution of $(X,Y)$ and a product distribution that matches the marginal distribution of $X$. The marginal distribution of $Y$, however, is not necessarily matched, so Sibson's mutual information is not symmetric. Rather than matching the marginal of $Y$, we choose this factor to minimize the divergence.

We can analytically minimize the Rényi divergence to obtain a closed-form expression for Sibson's mutual information: $$ \begin{align*} I_\alpha(X;Y) &= \frac{\alpha}{\alpha-1}\log\left( \sum_y \left( \sum_x P_X(x) \cdot P_{Y|X}(y|x)^\alpha \right)^{1/\alpha} \right)\\ &= \frac{\alpha}{\alpha-1} \log \mathbb{E}_Y \left[\exp\left(\frac{\alpha-1}{\alpha}D_\alpha(P_{X|Y}\|P_X)\right)\right]. \end{align*} $$
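
For concreteness, a minimal NumPy sketch of this closed form, taking $P_X$ as a probability vector and $P_{Y|X}$ as a row-stochastic matrix (the function name `sibson_mutual_information` is just for illustration):

```python
import numpy as np

def sibson_mutual_information(p_x, p_y_given_x, alpha):
    """Sibson's mutual information I_alpha(X;Y) for alpha > 1 via the closed form
    alpha/(alpha-1) * log sum_y ( sum_x P_X(x) * P_{Y|X}(y|x)^alpha )^(1/alpha).

    p_x:          shape (nx,), the marginal P_X.
    p_y_given_x:  shape (nx, ny); row x is the conditional distribution P_{Y|X=x}.
    """
    p_x = np.asarray(p_x, dtype=float)
    w = np.asarray(p_y_given_x, dtype=float)
    # One entry per y: sum_x P_X(x) * P_{Y|X}(y|x)^alpha
    inner = np.sum(p_x[:, None] * w ** alpha, axis=0)
    return alpha / (alpha - 1.0) * np.log(np.sum(inner ** (1.0 / alpha)))
```

As a sanity check, for $\alpha$ close to $1$ the output should be close to the ordinary mutual information $I(X;Y)$.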

My question: Does Sibson's mutual information satisfy some (weak) form of the chain rule? Specifically, I have the following conjecture.

Conjecture: Let $X$ and $Y$ be independent and let $Z$ depend on $X$ and $Y$ arbitrarily. Let $\alpha>1$. Then $$I_\alpha(X,Y;Z) \le \max_y I_\alpha(X;Z|Y=y) + \max_x I_\alpha(Y;Z|X=x).$$

This conjecture is weaker than the chain rule for the usual mutual information in three ways: First, we take the max over $x$ in the second term, rather than the expectation. Second, we take the max over $y$ in the first term, rather than marginalizing it. Third, we assume $X$ and $Y$ are independent. I suspect that we can actually prove something stronger than this conjecture (i.e., without all three weakenings).

If we expand out the definition of Sibson's mutual information and exponentiate (the map $t\mapsto\frac{\alpha}{\alpha-1}\log t$ is increasing for $\alpha>1$), then the conjecture above is equivalent to $$\sum_z \left( \sum_x \sum_y P_X(x) \cdot P_Y(y) \cdot P_{Z|XY}(z|x,y)^\alpha \right)^{1/\alpha}$$ $$\le \left( \max_{y_*} \sum_z \left( \sum_x P_X(x) \cdot P_{Z|XY}(z|x,y_*)^\alpha \right)^{1/\alpha} \right) \cdot \left( \max_{x_*} \sum_z \left( \sum_y P_Y(y) \cdot P_{Z|XY}(z|x_*,y)^\alpha \right)^{1/\alpha}\right).$$
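
To experiment with this inequality numerically, here is a minimal NumPy sketch that evaluates both sides from $P_X$, $P_Y$, and the kernel $P_{Z|XY}$ (the helper names `conjecture_lhs` and `conjecture_rhs` are just for illustration); the conjecture asserts that `conjecture_lhs(...) <= conjecture_rhs(...)`:

```python
import numpy as np

def conjecture_lhs(p_x, p_y, p_z_given_xy, alpha):
    """sum_z ( sum_{x,y} P_X(x) P_Y(y) P_{Z|XY}(z|x,y)^alpha )^(1/alpha).
    p_z_given_xy has shape (nx, ny, nz) and each slice [x, y, :] sums to 1."""
    w = np.asarray(p_z_given_xy, dtype=float) ** alpha
    inner = np.einsum('x,y,xyz->z', p_x, p_y, w)
    return np.sum(inner ** (1.0 / alpha))

def conjecture_rhs(p_x, p_y, p_z_given_xy, alpha):
    """Product of the two max terms on the right-hand side."""
    w = np.asarray(p_z_given_xy, dtype=float) ** alpha
    # For each fixed y_*: sum_z ( sum_x P_X(x) P_{Z|XY}(z|x,y_*)^alpha )^(1/alpha)
    term_y = np.sum(np.einsum('x,xyz->yz', p_x, w) ** (1.0 / alpha), axis=1)
    # For each fixed x_*: sum_z ( sum_y P_Y(y) P_{Z|XY}(z|x_*,y)^alpha )^(1/alpha)
    term_x = np.sum(np.einsum('y,xyz->xz', p_y, w) ** (1.0 / alpha), axis=1)
    return np.max(term_y) * np.max(term_x)
```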

Thomas Steinke
  • Let $X$ and $Y$ be independent fair coin flips, and let $Z$ be a random variable that takes the value 1 if $X$ and $Y$ are both heads, and 0 otherwise. Then it is easy to see that $I_\alpha(X,Y;Z) = 0$ for any $\alpha > 1$, since $X$ and $Y$ are independent of $Z$. However, both $\max_y I_\alpha(X;Z|Y=y)$ and $\max_x I_\alpha(Y;Z|X=x)$ are strictly positive. Therefore... – titusfx Feb 20 '23 at 19:10
  • @titusfx $I_{\alpha}(X, Y; Z) = 0$ if and only if $P_{Z|X, Y} \equiv P_{Z}$ (i.e., $Z$ is independent of $(X, Y)$), which does not hold in your construction. – Xiangxiang Xu Feb 20 '23 at 20:03

1 Answer


The conjecture is not true in general. A counterexample can be constructed as follows.

Suppose $X, Y, Z$ take values from $\{1, 2, 3, 4\}$, with $$ \begin{align*} P_{Z|X, Y}(z|x, y) = \begin{cases} 1& \text{if $x = y = z$},\\ 0& \text{if $x = y$, $z \neq x$},\\ 1/4& \text{if $x \neq y$}. \end{cases} \end{align*} $$ Suppose $P_X$ and $P_Y$ are arbitrary distributions with positive probability masses. Let $\alpha = \infty$, then

\begin{align*}\text{LHS} &= \sum_z \left( \sum_x \sum_y P_X(x) \cdot P_Y(y) \cdot P_{Z|XY}(z|x,y)^\alpha \right)^{1/\alpha}\\ &= \sum_z \left(\max_{x, y}P_{Z|XY}(z|x,y)\right)\\ &= 4. \end{align*}

For all $y_* = 1, 2, 3, 4$, we have $$ \sum_z \left( \sum_x P_X(x) \cdot P_{Z|XY}(z|x,y_*)^\alpha \right)^{1/\alpha} = \sum_z \left(\max_x P_{Z|XY}(z|x,y_*)\right) = 1 + 3 \cdot \frac14 < 2. $$ By symmetry, for all $x_* = 1, 2, 3, 4$, we have $$\sum_z \left( \sum_y P_Y(y) \cdot P_{Z|XY}(z|x_*,y)^\alpha \right)^{1/\alpha} < 2. $$ Therefore, $\text{LHS} = 4 > \text{RHS}$.


Remarks:

  1. This construction can also be used as a counterexample for general $X, Y$ dependence, i.e., when we replace $P_{X}P_{Y}$ on the LHS with $P_{X, Y}$, and replace $P_X$ and $P_Y$ on the RHS with $P_{X|Y=y_*}$ and $P_{Y|X=x_*}$, respectively.
  2. Neither the zero-probability mass in $P_{Z|X, Y}$ nor the assumption $\alpha = \infty$ is necessary for constructing a counterexample. In general, one can consider $X, Y, Z$ taking values from $\{1, 2, \cdots, K\}$ with \begin{align*} P_{Z|X, Y}(z|x, y) = \begin{cases} p& \text{if $x = y = z$},\\ (1-p)/(K-1)& \text{if $x = y$, $z \neq x$},\\ 1/K& \text{if $x \neq y$}. \end{cases} \end{align*}

Then, we can again obtain a counterexample for $pK \geq 4$ with $\alpha$ sufficiently large; a quick numerical check along these lines is sketched below.
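
For instance, here is a self-contained NumPy sketch of this check with uniform $P_X$ and $P_Y$ (the parameter values are just for illustration); with $K = 4$, $p = 1$, and $\alpha = 4$ the left-hand side already exceeds the right-hand side, matching the verification reported in the comment below:

```python
import numpy as np

K, p, alpha = 4, 1.0, 4.0   # p = 1 recovers the original 0/1 construction

# Conditional kernel P_{Z|XY}(z|x,y), stored with shape (K, K, K) indexed as [x, y, z].
P = np.full((K, K, K), 1.0 / K)
for x in range(K):
    P[x, x, :] = (1.0 - p) / (K - 1)   # x = y, z != x
    P[x, x, x] = p                     # x = y = z

p_x = p_y = np.full(K, 1.0 / K)        # uniform marginals

w = P ** alpha
lhs = np.sum(np.einsum('x,y,xyz->z', p_x, p_y, w) ** (1.0 / alpha))
term_y = np.sum(np.einsum('x,xyz->yz', p_x, w) ** (1.0 / alpha), axis=1)
term_x = np.sum(np.einsum('y,xyz->xz', p_y, w) ** (1.0 / alpha), axis=1)
rhs = np.max(term_y) * np.max(term_x)

print(lhs, rhs, lhs <= rhs)            # lhs > rhs here, so the conjectured inequality fails
```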

  • Awesome, thanks for resolving this question! I checked and this construction (with X=Y=uniform) also violates the conjecture for $\alpha=4$, but not for $\alpha=2,3$. Alas, I'm disappointed that this conjecture doesn't hold, as it would make my life easier... – Thomas Steinke Feb 25 '23 at 22:50