The data processing inequality states that if you have a Markov chain of random variables $X \rightarrow Y \rightarrow Z$, then $I(X;Y) \geq I(X;Z)$.
This all makes sense in the discrete case, but in the continuous case, which seems to be where it is actually applied (e.g. to neural networks, https://arxiv.org/abs/1703.00810), there appears to be a counterexample:
Suppose I pick $X \sim \mathrm{unif}(0,0.5)$, $Y=X$, and $Z=c$ for some constant $c$.
Then $I(X;Y)=I(X;X)=H(X)=-\log(2)$, and $I(X;Z)=0$ since $X$ and $Z$ are clearly independent.
But $-\log(2) \ngeq 0$. So is the data processing inequality wrong?
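For concreteness, the value $-\log(2)$ above comes from reading $H(X)$ as the differential entropy of $X$, computed from the uniform density $f(x)=2$ on $(0,0.5)$:

$$H(X) = -\int_0^{0.5} f(x)\log f(x)\,dx = -\int_0^{0.5} 2\log(2)\,dx = -\log(2).$$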
Is there any way to resolve this issue?