6

I would like to ask about the measurability of the conditional expectation of $Z$ given $X=x$ calculated on the conditional probability measure $\mathbb{P}_{X,Z|Y=y}$.

Assumptions

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and let $(\mathcal{X}, \mathcal{A})$ and $(\mathcal{Y}, \mathcal{B})$ be measurable spaces such that the existence of a regular conditional distribution is guaranteed (e.g., Euclidean spaces). Let $X: \Omega \to \mathcal{X}$, $Y: \Omega \to \mathcal{Y}$ and $Z: \Omega \to \mathbb{R}$ be random quantities (i.e., measurable mappings) such that $\mathbb{E}[\lvert Z \rvert] < \infty$.

For each $y \in \mathcal{Y}$, let $h(x,y) = \mathbb{E}_{(X,Z) \sim \mathbb{P}_{(X,Z)|Y=y}}[Z|X=x]$, which denotes the conditional expectation of $Z$ given $X = x$ calculated in the probability space $(\mathcal{X} \times \mathbb{R}, \mathcal{A} \otimes \mathcal{B}(\mathbb{R}), \mathbb{P}_{X,Z|Y=y})$, where $\mathbb{P}_{X,Z|Y=y}$ is a (regular) conditional distribution of $(X,Z)$ given $Y=y$.

My question

Is the following proposition true? And if it is true, could you tell me how to show it?

$(x,y) \mapsto h(x,y)$ is an $(\mathcal{A} \otimes \mathcal{B})$-measurable function.

In Double conditional probability this proposition is assumed without proof, but according to the proof of Theorem B.75 (on page 633) of Theory of Statistics, by Mark J. Schervish, it is true.

Sugiyama
  • 151
  • I think you have asked this question already with another account.. – Snoop Dec 09 '22 at 08:49
  • @Snoop Sorry, it's the same account, but I didn't understand the system well and re-submitted a modified version. – Sugiyama Dec 09 '22 at 08:57
  • There is a slight misunderstanding here. It makes no sense to talk about the conditional expectation of $Z$ given $X = x$ calculated in the probability space $(\mathcal{X} \times \mathbb{R}, \mathcal{A} \otimes \mathcal{B}(\mathbb{R}), \mathbb{P}_{X,Z|Y=y})$ since neither $X$ nor $Z$ are defined in that space. – Speltzu Mar 05 '25 at 09:08
  • @Speltzu yes, technically $X$ and $Z$ are simply the identity of the first and second element in this space. But using different notation would also be quite confusing – Felix Benning Mar 06 '25 at 07:26
  • Wouldn't it be more logical to consider the space $(\Omega, \mathcal{F}, \mathbb{P}_{\Omega/Y})$ instead? – Speltzu Mar 06 '25 at 09:09

3 Answers

2

It seems that you need to look at how you construct your conditional probabilities.

The answer is yes, provided you use a construction of the conditional probability with "nets", as defined in Feller, An introduction to probability theory and its applications, Vol. 2 (1971), Section V.10. I will require that both $\mathcal X$ and $\mathcal Y$ have a net (which is true if $\mathcal X$ and $\mathcal Y$ are Polish spaces).

Preliminary: "nets", Radon–Nikodym derivatives, and conditional probability kernels

For a space $\mathcal U$, a "net" (in the sense of footnote 20) is a collection $(U_{i_1, \dots, i_n})_{(i_j)\in \{0,1\}^n, n\geq 0}$ such that

  • $U_\emptyset = \mathcal U$,
  • $(U_{i_1, \dots, i_{n-1}, 0}, U_{i_1, \dots, i_{n-1}, 1})$ is a partition of $U_{i_1, \dots, i_{n-1}}$,
  • $\cap_{n\geq 1} U_{i_1, \dots, i_n}$ is a singleton for every sequence $(i_j)_{j\geq 1}$.

When $\mathcal U$ is equipped with a sigma-algebra $\mathfrak U$, I assume in addition that $\mathfrak U$ is generated by the net.
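As a toy illustration (my own choice, not from Feller), the dyadic intervals of $[0,1)$ form such a net; the assertions below check the partition and shrinking properties from the definition:

```python
# Hedged sketch: a concrete "net" in Feller's sense for U = [0,1), using
# dyadic intervals.  U_{i_1,...,i_n} is the half-open interval whose left
# endpoint has binary expansion 0.i_1 i_2 ... i_n.  (Illustrative names.)

def net_cell(bits):
    """Return the dyadic interval [a, a + 2^-n) indexed by a bit tuple of length n."""
    a = sum(b * 2.0 ** -(j + 1) for j, b in enumerate(bits))
    return (a, a + 2.0 ** -len(bits))

# Property 1: the empty index gives all of U.
assert net_cell(()) == (0.0, 1.0)

# Property 2: the two children partition the parent cell.
a, b = net_cell((1, 0))
a0, b0 = net_cell((1, 0, 0))
a1, b1 = net_cell((1, 0, 1))
assert (a0, b1) == (a, b) and b0 == a1

# Property 3: along a fixed bit sequence the cells shrink to a point.
widths = [2.0 ** -(3 * n) for n in range(1, 5)]
assert all(w2 < w1 for w1, w2 in zip(widths, widths[1:]))
```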

Given a set $\mathcal U$ with a net $(U_{i_1, \dots, i_n})$ generating the sigma-algebra $\mathfrak U$ and a measurable space $(\mathcal V, \mathfrak V)$ with a distinguished (arbitrary) element $v$, for every measurable set $V\in\mathfrak V$ and every probability measure $\mu$ on $(\mathcal U \times \mathcal V, \mathfrak U \otimes \mathfrak V)$, the Radon-Nikodym density of $\mu(\cdot \times V)$ with respect to $\mu(\cdot \times \mathcal V)$ is the $\mu(\cdot\times\mathcal V)$-a.e. limit (with values in $[0,1]$) of the functions $$ f^V_n : u \in U_{i_1, \dots, i_n} \mapsto \begin{cases} \frac{\mu(U_{i_1, \dots, i_n} \times V)}{\mu(U_{i_1, \dots, i_n} \times \mathcal V)} &\text{ if } \mu(U_{i_1, \dots, i_n} \times \mathcal V)>0 , \\ [v\in V] &\text{ otherwise,} \end{cases} $$ where $[v\in V]=1$ if $v\in V$ and $0$ otherwise. (We can see this by an elegant martingale argument.) Being simple functions, the $f^V_n$ are measurable, hence their a.s. limit $g^V$ is also measurable.
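A small numeric sketch of this limit, for an illustrative measure whose density is known in closed form (the measure and all names below are my own choices, not from Feller): take $\mathcal U = [0,1)$ with the dyadic net, $\mathcal V = \{0,1\}$, and $\mu$ the law of $(U, V)$ with $U$ uniform and $\mathbb P(V=1 \mid U=u) = u$, so the density $g^{\{1\}}$ is $u \mapsto u$.

```python
# Hedged sketch: the Radon-Nikodym density of mu(. x V) w.r.t. mu(. x cal V)
# as the limit of the net ratios f_n^V.  Here the ratios can be computed
# exactly: over a dyadic cell [a,b), mu(cell x {1}) = (b^2 - a^2)/2 and
# mu(cell x {0,1}) = b - a, so f_n equals the cell midpoint.

def f_n(u, n):
    """Ratio mu(cell x {1}) / mu(cell x {0,1}) over the depth-n dyadic cell containing u."""
    k = int(u * 2 ** n)
    a, b = k / 2 ** n, (k + 1) / 2 ** n
    mass_V = (b ** 2 - a ** 2) / 2   # integral of P(V=1|U=u) = u over the cell
    mass = b - a                     # U-marginal mass of the cell
    return mass_V / mass

u = 0.3
approx = [f_n(u, n) for n in (2, 5, 10, 20)]
assert abs(approx[-1] - u) < 1e-5    # f_n(u) -> g(u) = u as the net refines
```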

If $\mathcal V$ has a net $(V_{i_1, \dots, i_n})$ generating $\mathfrak V$, then (following Footnote 20) we can construct a Markov kernel $$\kappa : \mathcal U \times \mathfrak V \to [0,1] , (u,V) \mapsto \kappa(u, V) $$ such that $\kappa(u, \cdot)$ is a probability measure on $(\mathcal V, \mathfrak V)$ for every $u\in \mathcal U$ and $\kappa(\cdot, V)$ is measurable for every $V\in \mathfrak V$: to do so, first fix $\kappa(u, V_{i_1, \dots, i_n, 0}) = g^{V_{i_1, \dots, i_n, 0}}(u)$ and $\kappa(u, V_{i_1, \dots, i_n, 1}) = g^{V_{i_1, \dots, i_n}}(u) - g^{V_{i_1, \dots, i_n, 0}}(u)$, then define for every $V\in\mathfrak V$ $$ \kappa(u,V) = \inf_{n} \sum_{(i_1, \dots, i_n)\in \{0,1\}^n} \kappa(u, V_{i_1, \dots, i_n}) [V_{i_1, \dots, i_n} \cap V \neq \emptyset] $$ and note how $\mu(\cdot\times\mathcal V)$-a.e. we have $\kappa(u,V) = g^V(u)$ (the expression with the infimum allows us to have the measurability of $\kappa(\cdot, V)$ jointly for every measurable $V$). This can be rewritten $$ \kappa(u,V) = \inf_n \lim_k \sum_{(i_1, \dots, i_n)\in \{0,1\}^n} \tilde f_k^{V_{i_1, \dots, i_n}}(u) [V_{i_1, \dots, i_n} \cap V \neq \emptyset] $$ where $\tilde f_k^{V_{i_1, \dots, i_{n-1}, 0}} = f_k^{V_{i_1, \dots, i_{n-1}, 0}}$ and $\tilde f_k^{V_{i_1, \dots, i_{n-1}, 1}} = f_k^{V_{i_1, \dots, i_{n-1}}} - f_k^{V_{i_1, \dots, i_{n-1}, 0}}$.

Answer.

Let $(\mathcal X, \mathfrak X)$ and $(\mathcal Y, \mathfrak Y)$ be measurable spaces, and let $\mathfrak R$ be the Borel sigma-algebra of $\mathbb R$. We assume that $\mathcal X$ and $\mathcal Y$ come with a distinguished point $x$ and $y$ respectively. I assume that $\mathcal X$ has a net $(A_{i_1, \dots, i_n})$ and $\mathcal Y$ has a net $(B_{i_1, \dots, i_n})$, which generate $\mathfrak X$ and $\mathfrak Y$ respectively. We clearly have such a net for $\mathbb R$, see footnote 20, and I will denote the elements of this net by $C_{i_1, \dots, i_n}$. Clearly, the sets $D_{i_1, \dots, i_{2n}} = A_{i_1, i_3, \dots, i_{2n-1}} \times C_{i_2, i_4, \dots, i_{2n}}$ and $D_{i_1, \dots, i_{2n+1}} = A_{i_1, i_3, \dots, i_{2n+1}} \times C_{i_2, i_4, \dots, i_{2n}}$ (odd-indexed bits refine the first factor, even-indexed bits the second) form a net of $\mathcal X \times \mathbb R$; a net of $\mathcal X \times \mathcal Y$ can be constructed similarly.
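The interleaving can be sketched concretely; below both factor nets are dyadic intervals of $[0,1)$ (an illustrative choice, since any two nets work the same way): odd-indexed bits refine the first factor and even-indexed bits the second.

```python
# Hedged sketch of the product net: route the interleaved index bits to the
# two factor nets.  Cells of each factor are dyadic intervals of [0,1).

def split_bits(bits):
    """Route interleaved net indices: odd positions to factor A, even to factor C."""
    return bits[0::2], bits[1::2]

def interval(bits):
    """Dyadic interval of [0,1) indexed by a bit tuple (empty tuple -> [0,1))."""
    a = sum(b * 2.0 ** -(j + 1) for j, b in enumerate(bits))
    return (a, a + 2.0 ** -len(bits))

def product_cell(bits):
    ab, cb = split_bits(bits)
    return interval(ab), interval(cb)

# At even depth 2n the cell is a square; the next bit halves the A-side only.
sq_a, sq_c = product_cell((1, 0))          # depth 2: 1/2 x 1/2 cell
rect_a, rect_c = product_cell((1, 0, 1))   # depth 3: A-side refined again
assert sq_c == rect_c                      # second factor unchanged
assert rect_a[1] - rect_a[0] == (sq_a[1] - sq_a[0]) / 2
```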

From this, we can construct the following Markov kernels:

  1. $\kappa(y, A) = \mathbb P((X,Z)\in A \ | \ Y=y)$ with $y\in \mathcal Y$ and $A \in \mathfrak X \otimes \mathfrak R$,

  2. for every $y\in \mathcal Y$, starting from the probability measure $\kappa(y, \cdot)$, the kernel $\kappa_y(x,A) = \kappa_y(Z \in A \ | \ X=x)$ for every $x\in \mathcal X$ and $A\in\mathfrak R$.

Your question is: Is $(x,y) \mapsto \kappa_y(x, C)$ measurable for every measurable subset $C$ of $\mathbb R$? (This is equivalent to your formulation with $Z\in L^1$ by classical arguments of approximation of $L^1$ random variables by simple random variables.)
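The "classical arguments of approximation" can be sketched as follows (a standard reduction, with the dyadic simple functions as an illustrative choice): for $Z\in L^1$ set $Z_n = \sum_{j=-n2^n}^{n2^n-1} \frac{j}{2^n} \mathbf 1\{Z \in [j/2^n, (j+1)/2^n)\}$, so that
$$ h(x,y) = \int_{\mathbb R} z \,\kappa_y(x,\mathrm dz) = \lim_{n\to\infty} \sum_{j=-n2^n}^{n2^n-1} \frac{j}{2^n}\, \kappa_y\big(x, [j/2^n, (j+1)/2^n)\big) . $$
Each summand is measurable in $(x,y)$ as soon as every $(x,y)\mapsto\kappa_y(x,C)$ is, a pointwise limit of measurable functions is measurable, and dominated convergence (using $Z\in L^1$) justifies exchanging limit and integral.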

By construction, $$ \kappa_y(x,C) = \inf_n \lim_k \sum_{(i_1, \dots, i_n)\in \{0,1\}^n} \tilde f_k^{C_{i_1, \dots, i_n}}(x,y) [C_{i_1, \dots, i_n} \cap C \neq \emptyset] , \qquad (1) $$ where for every $C \in \mathfrak R$ $$ f^C_k(\cdot, y) : x \in A_{j_1, \dots, j_k} \mapsto \begin{cases} \frac{\kappa(y, A_{j_1, \dots, j_k} \times C)}{\kappa(y, A_{j_1, \dots, j_k} \times \mathbb R)} &\text{ if } \kappa(y, A_{j_1, \dots, j_k} \times \mathbb R)>0 , \\ [0\in C] &\text{ otherwise} \end{cases} $$ and the $\tilde f_k^{C_{i_1, \dots, i_n}}(\cdot, y)$ are defined from the $f_k^{C_{i_1, \dots, i_n}}(\cdot, y)$ as before, so that they are $\mathfrak X \otimes \mathfrak Y$-measurable by the measurability of $\kappa(\cdot, D)$ for every $D\in\mathfrak X \otimes \mathfrak R$. From $(1)$ we conclude that $(x,y) \mapsto \kappa_y(x, C)$ is $\mathfrak X \otimes \mathfrak Y$-measurable for every $C\in\mathfrak R$, finishing the proof.

  • How strong is the assumption that these nets exist? – Felix Benning Mar 10 '25 at 18:41
  • @FelixBenning This assumption holds for every $\mathbb R^d$. More generally it holds for second-countable locally compact Hausdorff spaces, and more generally (see https://math.stackexchange.com/questions/168446/are-polish-space-and-lccb-space-related) for Polish spaces which are the bread and butter of probability theory. So I believe it is a pretty weak assumption. – Thomas Lehéricy Mar 10 '25 at 18:51
  • @FelixBenning concerning your bounty: this method with nets allows you to define $\mathbb E[ \ \cdot \ | \ X=x, Y=y]$ directly, instead of going through a sequential conditioning. I guess you want to use OP's question in the proof that we can ensure $\mathbb P( C \ | \ X=x, Y=y) = \kappa_y(x,C)$, i.e. that the two approaches match? – Thomas Lehéricy Mar 10 '25 at 18:58
  • To check if I understood this correctly: In essence it is possible to define conditional distributions for all $x,y$ this way instead of only for almost surely all. This way we can ensure that for every fixed $y$ the marginal is indeed a conditional expectation of the regular kernel. – Felix Benning Mar 11 '25 at 16:02
  • @FelixBenning Correct. The fact that the Radon–Nikodym derivative is defined only a.e. is not an issue either, since if we define $f$ with a $\limsup$, then our precaution of using $\tilde f$ instead of $f$ ensures that our $\kappa$ has the right properties always, and not just "a.e.". – Thomas Lehéricy Mar 11 '25 at 16:42
  • So I am trying to follow up on this proof and I am unsure what this "elegant martingale argument" is – Felix Benning Mar 18 '25 at 10:01
  • @FelixBenning The idea is to construct a sequence $I_1, I_2, \dots$ such that $\mathbb P(I_{n+1}=i_{n+1} \ | \ I_1=i_1, \dots, I_n=i_n) = \frac{\mu(U_{i_1, \dots, i_{n+1}} \times \mathcal V)}{\mu(U_{i_1, \dots, i_n} \times \mathcal V)}$. Then $\left(\mu(U_{I_1, \dots, I_n} \times V) / \mu(U_{I_1, \dots, I_n} \times \mathcal V)\right)_{n\geq 0}$ is a martingale with values in $[0,1]$, hence it converges almost surely and in $L^p$ for every $p\in [1,\infty)$. The limit is measurable with respect to $X = \cap_n U_{I_1, \dots, I_n}$ (which is a singleton), hence is a function of that: the density. – Thomas Lehéricy Mar 19 '25 at 11:18
0

Here's my best attempt at setting things up rigorously (to clear up a potential confusion in the comments). I don't claim this is even a good perspective, but at the very least, it seems rigorous.

Following the definition/notation of regular conditional probability here, we have

  • $(X,Z): (\Omega, \mathscr F) \to (\mathcal X \times \mathbb R, \mathscr G:= \mathscr X \otimes \mathscr B(\mathbb R))$

  • Let $\mathscr D:= \sigma(Y) \subseteq \mathscr F$

  • Then for all fixed $G_0\in \mathscr G:= \mathscr X \otimes \mathscr B(\mathbb R)$, the function $\omega \mapsto \text{Pr}_{(X,Z)}(G_0 \mid \mathscr D)(\omega)$ is a version of the conditional probability $[\omega \mapsto \text{Pr}((X,Z)\in G_0 \mid \mathscr D)(\omega)] : \Omega \mapsto [0,1]$, which is in particular $\mathscr D=\sigma(Y)$-measurable, meaning it's a measurable function of $Y$, i.e. there is some measurable function $\rho_{G_0}: (\mathcal Y, \mathscr Y) \to ([0,1], \mathscr B([0,1]))$ s.t.

    $$[\omega \mapsto \rho_{G_0}(Y(\omega))]=[\omega \mapsto \text{Pr}((X,Z)\in G_0 \mid \mathscr D)(\omega)] : \Omega \to [0,1].$$

    And for every $\omega_0 \in \Omega \leadsto y_0 := Y(\omega_0)\in \mathcal Y$, the set map $[G \mapsto \rho(G,y_0):= \rho_G(y_0)]: \mathscr G \to [0,1]$ is a probability measure on the measurable space $(\mathcal X\times \mathbb R, \mathscr G)$.

So now we have $[(G,y) \mapsto \rho(G,y)]: (\mathscr G \times \mathcal Y)\to [0,1]$ (I guess for $y \notin \text{im}(Y:\Omega \to \mathcal Y) \subseteq \mathcal Y$, we just set $\rho(G,y)\equiv 0$?).

Now we fix $y_0\in \mathcal Y$. Our GOAL, "the conditional expectation of $Z$ given $X=x$, where $(X,Z)$ is now a random element of $\mathcal X \times \mathbb R$ drawn according to the probability measure $G \mapsto \rho(G,y_0)$ on $\mathscr G$", is mathematically expressed as follows: following say this MSE question, we can cook up a new random variable $W^{y_0}=(W_1^{y_0},W_2^{y_0})$ on a probability space $(\Omega^{y_0}, \mathscr F^{y_0}, \text{Pr}^{y_0})$, taking values in $\mathcal X \times \mathbb R$ and distributed according to the probability measure $G \mapsto \rho(G,y_0)$.

Then, our GOAL becomes studying the conditional expectation $\mathbb E(W_2^{y_0} \mid W_1^{y_0})$ (on the probability space $(\Omega^{y_0}, \mathscr F^{y_0}, \text{Pr}^{y_0})$), which is $\sigma(W_1^{y_0})$-measurable and can therefore be expressed as a measurable function of $W_1^{y_0}$, say $\varrho^{y_0}(W_1^{y_0})$. Then $\mathbb E(W_2^{y_0} \mid W_1^{y_0}=x) = \varrho^{y_0}(x)$, a map from $\mathcal X$ to $\mathbb R$.

Your question is then whether $[(x,y) \mapsto h(x,y)]: (\mathcal X \times \mathcal Y, \mathscr X \otimes \mathscr Y) \to (\mathbb R, \mathscr B(\mathbb R))$ is measurable, for $h(x,y):= \varrho^y(x)$.

I think this is a correct formulation of the problem? It does seem quite difficult to prove though... let me post this for now and continue to think about it.

D.R.
  • 10,556
  • We can consider in $\Omega$, apart from $\mathbb{P}$, other probabilities such as $\mathbb{P}_y(A)=\mathbb{P}(A|Y=y)$. What the O.P. essentially asks is whether the conditional expectation $\mathbb{E}_y[Z\mid X=x]$ in that space is measurable. – Speltzu Mar 08 '25 at 08:17
  • Yes, this is the correct formulation of the problem – Felix Benning Mar 08 '25 at 13:13
-3

In general for regular conditional probabilities we have:\begin{gather*} P_{X\circ Y\circ Z}(A\times B\times C)=\int_B dP_Y.\int_{A\times C}dP_{(X\circ Z)/Y}=\int_B dP_Y.\int_{A}dP_{X/Y}.\int_C dP_{Z/X/Y} \end{gather*} And also: \begin{gather*} P_{X\circ Y\circ Z}(A\times B\times C)=\int_{A\times B} dP_{X\circ Y}.\int_C dP_{Z/X\circ Y}=\int_B dP_Y.\int_{A}dP_{X/Y}.\int_C dP_{Z/X\circ Y} \end{gather*} So $P_{Z/X/Y}=P_{Z/X\circ Y}$ (for almost all $(x,y)$). And therefore $E_{\Omega/Y}[Z\mid X]=E_\Omega[Z\mid X,Y]$.
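The closing identity $E_{\Omega/Y}[Z\mid X]=E_\Omega[Z\mid X,Y]$ can be checked numerically on a finite example (the joint law below is an arbitrary illustration, not from the answer): conditioning on $Y=y$ first and then on $X=x$ gives the same conditional expectation of $Z$ as conditioning on $(X,Y)=(x,y)$ jointly.

```python
# Hedged numeric sketch on a finite probability space: E_y[Z | X=x]
# (computed under P(. | Y=y)) agrees with E[Z | X=x, Y=y].
import itertools
import random

random.seed(0)
states = list(itertools.product([0, 1], [0, 1], [1, 2, 3]))   # (x, y, z)
w = [random.random() for _ in states]
total = sum(w)
p = {s: wi / total for s, wi in zip(states, w)}               # joint P(X=x, Y=y, Z=z)

def cond_exp_given_xy(x, y):
    """E[Z | X=x, Y=y] under the joint law p."""
    mass = sum(p[(x, y, z)] for z in [1, 2, 3])
    return sum(z * p[(x, y, z)] for z in [1, 2, 3]) / mass

def cond_exp_under_Py(x, y):
    """E_y[Z | X=x]: restrict to {Y=y}, renormalize, then condition on X=x."""
    py = sum(p[s] for s in states if s[1] == y)
    q = {(s[0], s[2]): p[s] / py for s in states if s[1] == y}  # law of (X,Z) given Y=y
    mass = sum(q[(x, z)] for z in [1, 2, 3])
    return sum(z * q[(x, z)] for z in [1, 2, 3]) / mass

for x, y in itertools.product([0, 1], [0, 1]):
    assert abs(cond_exp_given_xy(x, y) - cond_exp_under_Py(x, y)) < 1e-12
```

Of course this only illustrates the identity pointwise on a discrete space; the answer's claim is an a.e. statement for general regular conditional probabilities.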

Speltzu
  • 801