
I see the following simplification used frequently in the literature, but I have not been able to verify it.

Let $X$ and $Y$ be absolutely continuous (i.e. they have pdfs) $\mathbb{R}^d$-valued random variables. Assume the joint variable $(X,X+Y)$ is absolutely continuous on $\mathbb{R}^{2d}$. Then $$h(X,X+Y)=h(X,Y).$$

Here $h$ signifies differential entropy, defined by $$h(W)=-\int_{\mathbb{R}^{d_W}}f_W(w)\log(f_W(w))\ dw$$ whenever $W$ is an $\mathbb{R}^{d_W}$-valued random variable with pdf $f_W$.
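For concreteness, here is the definition evaluated numerically for a one-dimensional Gaussian, where the closed form $h=\tfrac{1}{2}\log(2\pi e\sigma^2)$ is standard (a rough sketch, assuming Python with NumPy; the grid and the value of $\sigma$ are arbitrary choices):

```python
import numpy as np

# Differential entropy of a 1-D Gaussian N(0, sigma^2) in nats (natural log):
# a Riemann sum of -f(w) log f(w) over a fine grid vs. the closed form.
sigma = 1.7
w = np.linspace(-12 * sigma, 12 * sigma, 200_001)
dw = w[1] - w[0]
f = np.exp(-w**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

h_numeric = -np.sum(f * np.log(f)) * dw
h_closed = 0.5 * np.log(2 * np.pi * np.e * sigma**2)

print(h_numeric, h_closed)  # agree to several decimal places
```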

Note 1: $X$ and $Y$ are not assumed to be independent.

Note 2: An example where the left-hand side is finite but the right-hand side is not defined would be accepted as a counterexample.

I am also wondering: if the statement can be proved, is it then more generally true that $$h(X,g(X,Y))=h(X,Y)$$ where $g$ is a deterministic function of its arguments?
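For what it's worth, a quick numerical sanity check in a jointly Gaussian example is consistent with the first identity. This is only a rough sketch, assuming Python with NumPy; it uses the standard Gaussian closed form $h=\tfrac12\log\!\big((2\pi e)^k\det\Sigma\big)$ with $\Sigma$ estimated from samples, and an arbitrary dependence $Y=BX+\text{noise}$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 200_000

# Dependent X and Y: Y = B X + noise, so (X, Y) is jointly Gaussian with a pdf.
X = rng.standard_normal((n, d))
B = rng.standard_normal((d, d))
Y = X @ B.T + 0.5 * rng.standard_normal((n, d))

def gaussian_entropy(samples):
    # 0.5 * log((2*pi*e)^k * det Sigma), with Sigma the sample covariance.
    Sigma = np.cov(samples, rowvar=False)
    k = Sigma.shape[0]
    return 0.5 * (k * np.log(2 * np.pi * np.e) + np.linalg.slogdet(Sigma)[1])

print(gaussian_entropy(np.hstack([X, X + Y])))  # h(X, X+Y)
print(gaussian_entropy(np.hstack([X, Y])))      # h(X, Y) -- agree up to float error
```

Of course this only probes the Gaussian case, not the general statement.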

This question is similar, but seems to concern Shannon entropy (i.e., discrete variables). Shannon entropy and differential entropy have different sets of properties, as discussed in these links: answer1, answer2, question1, and question2.

cantorhead

2 Answers


An alternative proof to that of @HaarD is the following.

Using the chain rule,

$$ \begin{align} h(X, X+Y) &= h(X) + h(X+Y\mid X)\\ &=h(X)+h(Y\mid X)\\ &=h(X,Y), \end{align} $$ where the first equality is an application of the chain rule, the second holds because adding a constant does not change differential entropy (conditioned on $X=x$, the variable $X+Y$ is just $Y$ shifted by the deterministic constant $x$), and the third equality is again an application of the chain rule.
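For what it's worth, each step can be checked numerically in a jointly Gaussian example, using the chain rule in the form $h(U\mid V)=h(U,V)-h(V)$ together with the Gaussian closed form $h=\tfrac12\log\!\big((2\pi e)^n\det\Sigma\big)$. A minimal sketch, assuming Python with NumPy; the covariance of $(X,Y)$ is an arbitrary random choice:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 2

# Arbitrary nondegenerate covariance for the joint Gaussian vector (X, Y).
M = rng.standard_normal((2 * d, 2 * d))
S_XY = M @ M.T + 0.1 * np.eye(2 * d)

def h(Sigma):
    # Gaussian differential entropy: 0.5 * log((2*pi*e)^n * det Sigma).
    n = Sigma.shape[0]
    return 0.5 * (n * np.log(2 * np.pi * np.e) + np.linalg.slogdet(Sigma)[1])

# (X, X+Y) = A (X, Y) with A = [[I, 0], [I, I]], so its covariance is A S A^T.
A = np.block([[np.eye(d), np.zeros((d, d))], [np.eye(d), np.eye(d)]])
S_XXpY = A @ S_XY @ A.T

h_X = h(S_XY[:d, :d])
print(h(S_XXpY) - h_X)  # h(X+Y | X) = h(X, X+Y) - h(X)
print(h(S_XY) - h_X)    # h(Y | X)   = h(X, Y)   - h(X)  -- equal, as claimed
```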

This result does not generalize to arbitrary $g(X,Y)$: for example, with $g(X,Y)=X$ the pair $(X,g(X,Y))$ has no joint density on $\mathbb{R}^{2d}$, so $h(X,g(X,Y))$ is not even defined.

Stelios
  • I'm a little unsure about the reasoning behind the second equality (adding a constant to a random variable does not change its entropy). Would $h(X^2+Y|X)=h(Y|X)$ for the same reason? – cantorhead Feb 05 '18 at 22:33
  • 1
    @cantorhead yes – Stelios Feb 05 '18 at 22:38
  • Then by doing the chain rule in the reverse order we get $$h(X,X^2+Y)=h(X,Y).$$ If this is true then it seems more general than the other answer based on linear change of variables. – cantorhead Feb 05 '18 at 22:42
  • @cantorhead indeed – Stelios Feb 05 '18 at 22:52
  • Can you give more detail in your answer about why the second equality holds? My original motivation for asking the question was actually to understand why $h(X+Y|X)=h(Y|X)$. You've said that it's related to translation invariance. Can you show step by step how translation invariance implies $h(X+Y|X)=h(Y|X)$? – cantorhead Feb 06 '18 at 02:31
  • @cantorhead Define a random variable as $Z\triangleq c+X$, where $c$ is a deterministic constant. Now, by elementary probability theory, the pdf of $Z$ equals $p_Z(z)=p_X(z-c)$. Substituting this in the entropy formula of $h(Z)$, it is easy to see that $h(Z)=h(X)$. Also note that the random variable $Q\triangleq X+Y$ conditioned on $X=x$ has pdf $p_{Q|X}(q|x)=p_{Y|X}(q-x|x)$ (with $x$ treated as a constant, since we are conditioning on $X$). – Stelios Feb 06 '18 at 07:17
  • Please write this out in more detail as your answer. (You do not need to prove translation invariance since this is a fundamental property.) Please make it clear where the assumptions in the problem statement are used. Thank you. And I hope you understand why I wouldn't want to accept an answer that is spread out across the comments to the answer. – cantorhead Feb 06 '18 at 15:50
  • @cantorhead I am afraid this is as clear/detailed as I can be, hopefully, you got the idea. No worries about the answer accept, I actually think the other answer deserves it more. – Stelios Feb 06 '18 at 23:14
  • Okay, it has been an interesting conversation, though. I've been avoiding conditional differential entropy by transforming to the joint variable form, but here you've shown me something that can be done easily with conditionals, whereas I don't see how the linear transformation answer could be extended so easily. It's also interesting that translation invariance ends up doing the same thing as, and more than, the linear transformation invariance (up to scale). I'll definitely keep thinking about this. Thank you for your help. – cantorhead Feb 07 '18 at 00:20

Use the fact that if $W$ has a pdf and $A$ is an invertible linear transformation, then \begin{align*} h(AW)&=h(W)+\log|\det A|. \end{align*}
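(This fact is easy to sanity-check numerically in the Gaussian case, where $h(W)=\tfrac12\log\!\big((2\pi e)^n\det\Sigma_W\big)$ and $\Sigma_{AW}=A\Sigma_W A^T$; a small sketch, assuming Python with NumPy and a randomly drawn, hence almost surely invertible, $A$:)

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

M = rng.standard_normal((n, n))
Sigma_W = M @ M.T + 0.1 * np.eye(n)      # covariance of a Gaussian W with a pdf
A = rng.standard_normal((n, n))          # random square A, invertible with probability 1

def h_gauss(Sigma):
    # Gaussian differential entropy: 0.5 * log((2*pi*e)^n * det Sigma).
    return 0.5 * (Sigma.shape[0] * np.log(2 * np.pi * np.e) + np.linalg.slogdet(Sigma)[1])

lhs = h_gauss(A @ Sigma_W @ A.T)                    # h(AW)
rhs = h_gauss(Sigma_W) + np.linalg.slogdet(A)[1]    # h(W) + log|det A|
print(lhs, rhs)                                     # equal up to floating-point error
```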

In this case, let $W=[X \ \ Y]^T$, a vector-valued variable in $\mathbb{R}^{2d}$. Let \begin{align} A=\left[\begin{array}{c c} I_d & 0 \\ I_d & I_d \end{array}\right], \end{align} where $I_d$ is the $d\times d$ identity block. Then $AW=[X \ \ X+Y]^T$ and $\det A=1$. Therefore, \begin{align} h(X,X+Y)&=h(AW) \\ &= h(W)+\log|\det A| \\ &= h(X,Y) + \log(1) \\ &= h(X,Y). \end{align}

HaarD