
Here (p. 15) the author defines conditional divergence as

$$D(P_{Y\mid X}\mid\mid Q_{Y\mid X}\mid P_X):=\mathbb{E}_{x\sim P_X}\left[D(P_{Y\mid X=x}\mid\mid Q_{Y\mid X=x})\right]$$

for two conditional distributions $P_{Y\mid X}$ and $Q_{Y\mid X}$ and with $D(P\mid\mid Q)$ being the usual Kullback-Leibler divergence of $P$ and $Q$.

The "chain rule"

$$D(P_{XY}\mid\mid Q_{XY})=D(P_{Y\mid X}\mid\mid Q_{Y\mid X}\mid P_X)+D(P_X\mid\mid Q_X)$$

with $P_{XY}$ and $Q_{XY}$ being the joint probability distributions is afterwards proved using only the following line:

$$\text{Disintegration: }\mathbb{E}_{(X,Y)}\left[\log\frac{P_{XY}}{Q_{XY}}\right]=\mathbb{E}_{(X,Y)}\left[\log\frac{P_{Y\mid X}}{Q_{Y\mid X}}+\log\frac{P_{X}}{Q_{X}}\right].$$

Question 1: What exactly is this Radon-Nikodym derivative of conditional distributions $\frac{P_{Y\mid X}}{Q_{Y\mid X}}$? The usual definition I know only uses probability distributions. Could someone give (or point me to) a rigorous definition?

Question 2: How is the "disintegration" property $\frac{P_{XY}}{Q_{XY}}=\frac{P_{Y\mid X}}{Q_{Y\mid X}}\cdot\frac{P_{X}}{Q_{X}}$ justified? Could someone give an explanation of this, please?

I do not assume discrete spaces or the existence of any densities, but am especially interested in the case of distributions on general measurable spaces.

User1865345
Stefan

1 Answer

  1. Answer to question 1: In general, a Radon-Nikodym derivative is always taken for a pair of measures. So the question becomes: what are $P_{Y|X}$ and $Q_{Y|X}$? If they are measures in some sense, the Radon-Nikodym derivative can be defined for them. It turns out that $P_{Y|X}$ and $Q_{Y|X}$ are regular conditional probabilities (https://en.wikipedia.org/wiki/Regular_conditional_probability). This means that for each fixed $x$, $P_{Y|X=x}$ is a probability measure, and that for each measurable set $A$, $P_{Y|X}(A)$ is a version of $\mathbb{E}[1_{\{ Y \in A \}} | X]$. For each fixed $x$ for which $P_{Y|X=x}$ is absolutely continuous with respect to $Q_{Y|X=x}$, you therefore obtain the ordinary Radon-Nikodym derivative of two probability measures, $$\frac{dP_{Y|X=x}}{dQ_{Y|X=x}}.$$

  2. Answer to question 2: It follows from the so-called "Stochastic Fubini Theorem" that $$P_{(Y,X)} = P_{Y|X} \otimes P_{X},$$ where the right-hand side is the semidirect product of the kernel $P_{Y|X}$ and the measure $P_X$, defined on measurable rectangles by $$(P_{Y|X} \otimes P_{X}) (A \times B) = \int_B P_{Y|X = x}(A) \, P_X(dx).$$

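Does this help you find the result you need? Note that you still have to deal with some technical points. For example, writing $P \ll Q$ for "$P$ is absolutely continuous with respect to $Q$" (which is what $\frac{dP}{dQ}$ requires): what tells you that $$P_{(Y,X)} \ll Q_{(Y,X)} \Rightarrow P_{Y|X=x} \ll Q_{Y|X=x}?$$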
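
One way to fill in the remaining steps is sketched below. It is only a sketch, under assumptions that are not stated in the question: the spaces are standard Borel (so that regular conditional probabilities exist and are unique up to null sets), and $P_{(Y,X)}$ is absolutely continuous with respect to $Q_{(Y,X)}$, written $P_{(Y,X)} \ll Q_{(Y,X)}$. The names $f$, $g$ and $K_x$ are introduced here for illustration only, and $\mathcal{Y}$ denotes the space in which $Y$ takes values.

Write $f$ for the joint density and $g$ for its $Q$-conditional average. Using $Q_{(Y,X)} = Q_{Y|X} \otimes Q_X$:

$$\begin{aligned}
f(y,x) &:= \frac{dP_{(Y,X)}}{dQ_{(Y,X)}}(y,x), \qquad g(x) := \int_{\mathcal{Y}} f(y,x)\, Q_{Y|X=x}(dy),\\
P_X(B) &= P_{(Y,X)}(\mathcal{Y} \times B) = \int_B \int_{\mathcal{Y}} f(y,x)\, Q_{Y|X=x}(dy)\, Q_X(dx) = \int_B g(x)\, Q_X(dx).
\end{aligned}$$

So $P_X \ll Q_X$ with $\frac{dP_X}{dQ_X} = g$, and in particular $g > 0$ holds $P_X$-almost surely. For $x$ with $g(x) > 0$ define a candidate kernel and check that it satisfies the defining property of $P_{Y|X}$:

$$\begin{aligned}
K_x(A) &:= \frac{1}{g(x)} \int_A f(y,x)\, Q_{Y|X=x}(dy),\\
\int_B K_x(A)\, P_X(dx) &= \int_B K_x(A)\, g(x)\, Q_X(dx) = \int_B \int_A f(y,x)\, Q_{Y|X=x}(dy)\, Q_X(dx) = P_{(Y,X)}(A \times B).
\end{aligned}$$

By almost sure uniqueness of regular conditional probabilities, $P_{Y|X=x} = K_x$ for $P_X$-almost every $x$. Hence $P_{Y|X=x} \ll Q_{Y|X=x}$ with density $f(\cdot,x)/g(x)$ for $P_X$-almost every $x$, which is the implication asked about above, and

$$\frac{dP_{(Y,X)}}{dQ_{(Y,X)}}(y,x) = \frac{dP_{Y|X=x}}{dQ_{Y|X=x}}(y) \cdot \frac{dP_X}{dQ_X}(x) \qquad P_{(Y,X)}\text{-almost surely}.$$

Taking logarithms and the expectation under $P_{(Y,X)}$ gives the chain rule; the expectation of the sum can be split because the negative parts of both logarithms are integrable. The case where $P_{(Y,X)}$ is not absolutely continuous with respect to $Q_{(Y,X)}$ is not covered by this sketch and needs a separate argument.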

Kore-N
  • Could you elaborate on your answer to 2.? I do not see how the formula in the question can be deduced from it. I also don't understand why absolute continuity of the joint distributions carries over to the conditionals. – guest1 Nov 28 '23 at 10:17