It's not true that if $Y$ is a deterministic function of $X$, then $h(Y|X) = 0.$ In fact, conditioned on $X$, $Y$ is a constant, and the differential entropy of a constant is either $-\infty$ or undefined, depending on your taste. Plugging this into the formulas simply suggests that $I(X;Y) = \infty$. This should make sense operationally: if you can transmit a real number $X$ with perfect fidelity up to scaling, then you can communicate an infinite number of bits with one channel use (e.g., transmit $x = 0.0b_1b_2\cdots$, then $2x = 0.b_1b_2\cdots$), and so you can communicate at any rate you like.
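To see the digit-shifting picture concretely, here is a tiny sketch (mine, purely illustrative): doubling a number in $[0, 1/2)$ just shifts its binary expansion one place, so observing $2X$ exactly loses none of the bits of $X$.

```python
from fractions import Fraction

def bits(x, n):
    """First n binary digits of x in [0, 1)."""
    out = []
    for _ in range(n):
        x *= 2                     # shift the binary point one place right
        out.append("1" if x >= 1 else "0")
        if x >= 1:
            x -= 1
    return "".join(out)

x = Fraction(1, 3)                 # a "message" x = 0.010101..._2 in [0, 1/2)
print(bits(x, 12))                 # 010101010101
print(bits(2 * x, 12))             # 101010101010 -- same bits, shifted by one
```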
This is now consistent with the formulae, as follows: you have $h(Y) = h(2X) = h(X) + \log 2$, so $I(X;Y) = h(Y) - h(Y|X) = h(X) + \log 2 - h(Y|X) = \infty$, which is fine since $\infty + c = \infty$ for any real $c$. Equally, $I(X;Y) = h(X) - h(X|Y) = h(X) - (-\infty) = \infty$. Of course, in general, all of this is gobbledygook, and things like mutual information have more subtle definitions for continuous random variables. Cover and Thomas do spend time discussing this. As a rule of thumb, you can just assume that if all of $h(X)$, $h(X|Y)$, and $I(X;Y)$ are finite, then $I(X;Y) = h(X) - h(X|Y)$. If not, the formula might still make some sense sometimes, but not always (e.g., what if $h(X) = -\infty$ too?).
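Spelling out the bookkeeping as a worked chain (my notation, under the convention that the differential entropy of a point mass is $-\infty$, and assuming $h(X)$ is finite):

$$
\begin{aligned}
h(Y) &= h(2X) = h(X) + \log 2, \qquad h(Y \mid X) = h(X \mid Y) = -\infty,\\
I(X;Y) &= h(Y) - h(Y \mid X) = h(X) + \log 2 - (-\infty) = \infty,\\
I(X;Y) &= h(X) - h(X \mid Y) = h(X) - (-\infty) = \infty.
\end{aligned}
$$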
The lesson here is that differential entropy is not like discrete entropy, and it is not meaningful to interpret or manipulate it as if it were (e.g., for discrete entropy, $H(2X) = H(X)$; a quick numerical check of this contrast is sketched at the end). $h(X)$ can be negative, and any point masses pretty much break the notion entirely (because the density becomes undefined), although some formulae may continue to work in some cases. Cover and Thomas also deal with this fairly explicitly, as far as I recall. In any case, if you're working with continuous random variables, you will soon make friends with the concept of distortion, after which things will keep making sense :)
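For what it's worth, here is a rough numerical sketch of that contrast (my own toy example, using SciPy's `entropy` and `differential_entropy` estimators, so it needs a reasonably recent SciPy):

```python
import numpy as np
from scipy.stats import entropy, differential_entropy

# Discrete: relabeling the support (X -> 2X) leaves the pmf, hence H, unchanged.
p = [0.5, 0.25, 0.125, 0.125]      # pmf of X on {0,1,2,3}; same pmf for 2X on {0,2,4,6}
print(entropy(p, base=2))          # 1.75 bits = H(X) = H(2X)

# Continuous: scaling by 2 shifts differential entropy by log 2.
rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)   # h(X) = 0.5 * ln(2*pi*e) ≈ 1.419 nats
h_x = differential_entropy(x)
h_2x = differential_entropy(2 * x)
print(h_2x - h_x, np.log(2))       # both ≈ 0.693
```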