In a paper on transport inequalities by Nathael Gozlan, the following assertion is made:
Let the relative entropy with respect to $\mu \in P(\mathcal X)$ be defined by $$ H(\nu \mid \mu) = \left\{ \begin{array}{@{}ll@{}} \int_\mathcal X \log(\frac{d\nu}{d\mu})d\nu, & \text{if}\ \nu \ll\mu \\ +\infty, & \text{otherwise} \end{array}\right. , \ \nu \in P(\mathcal X) $$
Now, let $\mu_1$ and $\mu_2$ be probability measures on $\mathcal X_1$ and $\mathcal X_2$, respectively. For a probability measure $\nu$ on $\mathcal X_1 \times \mathcal X_2$, write the disintegration of $\nu$ (its conditional distribution) with respect to the first coordinate as: $$ d\nu(x_1,x_2) = d\nu_1(x_1) d\nu_2^{x_1}(x_2) $$ Here $\nu_1$ is the first marginal of $\nu$. Note that the disintegration is essentially a formal way of writing the conditional probability formula $P(X=x_1,Y=x_2) = P(Y=x_2 \mid X =x_1)P(X=x_1)$.
Finally, the author asserts that for the product measure $\mu_1 \otimes\mu_2$ (also written $\mu_1 \times \mu_2$; the two notations mean the same thing), one can prove the following equality: $$ H(\nu \mid \mu_1 \otimes \mu_2) = H(\nu_1 \mid \mu_1) + \int_{\mathcal X_1} H(\nu_2^{x_1}\mid \mu_2)d\nu_1(x_1) $$
My question is how to prove the equality above.
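As a sanity check (not a proof), the identity can be verified numerically when both spaces are finite, where the disintegration is just the table of conditional probabilities. The sketch below assumes NumPy, randomly generated strictly positive distributions, and a helper `relative_entropy` that is my own name for the finite version of $H(\nu \mid \mu)$:

```python
import numpy as np

def relative_entropy(nu, mu):
    """H(nu | mu) for finite distributions given as arrays of the same shape."""
    nu = np.asarray(nu, dtype=float).ravel()
    mu = np.asarray(mu, dtype=float).ravel()
    # If nu charges a point that mu does not, nu is not absolutely continuous.
    if np.any((mu == 0) & (nu > 0)):
        return np.inf
    support = nu > 0  # convention: 0 * log 0 = 0
    return float(np.sum(nu[support] * np.log(nu[support] / mu[support])))

rng = np.random.default_rng(0)
n1, n2 = 3, 4  # sizes of the finite spaces X_1 and X_2 (arbitrary choice)

# Reference measures mu_1, mu_2 and an arbitrary joint law nu.
mu1 = rng.random(n1)
mu1 /= mu1.sum()
mu2 = rng.random(n2)
mu2 /= mu2.sum()
nu = rng.random((n1, n2))
nu /= nu.sum()

nu1 = nu.sum(axis=1)        # first marginal nu_1
kernel = nu / nu1[:, None]  # row i is the conditional law nu_2^{x_1 = i}

# Left-hand side: entropy of nu relative to the product measure.
lhs = relative_entropy(nu, np.outer(mu1, mu2))

# Right-hand side: marginal entropy plus averaged conditional entropies.
rhs = relative_entropy(nu1, mu1) + sum(
    nu1[i] * relative_entropy(kernel[i], mu2) for i in range(n1)
)

print(lhs, rhs)
```

The two printed values should agree up to floating-point error. In this finite setting the agreement comes from splitting $\log\frac{\nu(x_1,x_2)}{\mu_1(x_1)\mu_2(x_2)}$ into $\log\frac{\nu_1(x_1)}{\mu_1(x_1)} + \log\frac{\nu_2^{x_1}(x_2)}{\mu_2(x_2)}$ and summing against $\nu$; what I am after is the rigorous version of this step in the general setting.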
Since the definition of a disintegration is not very common, I will give it here to save people the trouble of hunting it down:
Let $(\Omega, \mathcal F)$ and $(E, \mathcal A)$ be two Polish (complete and separable metric) spaces equipped with their Borel $\sigma$-algebras. If $P$ is a probability measure on $(\Omega \times E, \mathcal F \otimes \mathcal A)$ and $P_1$ is the marginal distribution of the first coordinate, then there exists a probability kernel $K: \Omega \times \mathcal A \rightarrow [0,1]$, unique up to $P_1$-null sets, satisfying:
$$ P(A\times B) = \int_A K(\omega, B) P_1(d\omega), \ \forall A \in \mathcal F, \ B \in \mathcal A $$
In this case, we can define $$ P[X_2 \in B \mid X_1 = \omega] := K(\omega,B) $$
where $X_1$ and $X_2$ denote the first and second coordinates, respectively.
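To make the definition concrete, here is a minimal sketch of the kernel on a finite product space, where $K$ is obtained by dividing each row of the joint table by the corresponding marginal. The joint law `P` and the helper names `K_of` and `P_rect` are made up for illustration, and the construction assumes every point has positive marginal mass:

```python
import numpy as np

# Hypothetical joint law P on {0, 1, 2} x {0, 1}; rows index the first coordinate.
P = np.array([[0.10, 0.20],
              [0.05, 0.25],
              [0.30, 0.10]])

P1 = P.sum(axis=1)   # marginal distribution of the first coordinate
K = P / P1[:, None]  # K[w, b] = P[X_2 = b | X_1 = w]

def K_of(w, B):
    """Kernel K(w, B): conditional probability that X_2 lies in B given X_1 = w."""
    return K[w, list(B)].sum()

def P_rect(A, B):
    """Probability of the rectangle A x B under the joint law."""
    return P[np.ix_(list(A), list(B))].sum()

A, B = [0, 2], [1]
lhs = P_rect(A, B)                        # P(A x B)
rhs = sum(K_of(w, B) * P1[w] for w in A)  # integral of K(w, B) over A against P_1

print(lhs, rhs)
```

Each row of `K` sums to one, so `K[w, :]` is a probability measure for every `w`, and the two printed numbers coincide, which is the defining relation $P(A\times B) = \int_A K(\omega, B) P_1(d\omega)$ in the discrete case.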