
Yesterday I found the following very nice post, which presents the conditional expectation in a way I find intuitive:

Conditional expectation with respect to a $\sigma$-algebra.

I wonder whether the formula $E(X\mid \mathcal{F}_n)(\omega)=\frac 1 {P(E_i)} \int_{E_i}X \, dP$ for $\omega \in E_i$ can be regarded as a Radon-Nikodym derivative. I can't formally connect the dots with, for example, the Wikipedia discussion,

https://en.wikipedia.org/wiki/Conditional_expectation.

I am missing the part where the measure gets "weighted", i.e. something analogous to $\frac 1 {P(E_i)}$, in the Wikipedia article.

Update

It just hit me that if one divides the defining relation by the measure of the set, then one has

$\frac{1}{P(E_{i})} \int_{E_i}X \, dP=\frac{1}{P(E_{i})} \int_{E_i}E(X\mid \mathcal{F}_n) \, dP$ for all $E_{i}\in \mathcal{F}_{n}$. This looks like we have a function which agrees with the averages of $X$ a.e. on every set of the algebra $\mathcal{F}_{n}$. This is not quite what @martini writes, but maybe it is a reasonable way to look at it as well? It also seems to fit the Wikipedia discussion better, though it doesn't sit right on some other occasions.

So the question remains,

How do I think about this in the right way? If the second way is wrong, then how is the first consistent with the Wikipedia article?

My comment on Sangchul's answer will also help in understanding my troubles!

2 Answers


Your intuition and formula make sense when each $E_i$ is an elementary event in ${\cal F}$ which cannot be decomposed into two disjoint events, both of positive probability. If it can be decomposed, then there is a mismatch: in general $E(X|{\cal F})(\omega)$ is not constant over $\omega\in E_i$, while your formula is.

Suppose that $\Omega$ may be partitioned into a countable family of disjoint measurable events $E_i$, $i\geq 1$. It suffices to keep only the events with strictly positive probability, as they will carry the total probability. The $\sigma$-algebra ${\cal F}$ generated by this partition simply consists of all unions of elements in this family. A measurable function w.r.t. ${\cal F}$ is precisely a linear combination of $\chi_{E_i}$, the characteristic functions of our disjoint family of events. We may thus write: $$ E(X|{\cal F}) (\omega) = \sum_j c_j \chi_{E_j}(\omega)$$ The constants may be computed from the fact that $\int_{E_i} E(X|{\cal F}) dP = c_i P(E_i) = \int_{E_i} X\; dP$. We get: $$ E(X|{\cal F}) (\omega) = \sum_j\chi_{E_j}(\omega) \frac{1}{P(E_j)} \int_{E_j} X\; dP $$ corresponding to the formula you mentioned. By writing down the defining equation you see that this indeed is the Radon-Nikodym derivative of $\nu(E)=\int_E X \; dP$, $E\in {\cal F}$, with respect to $P_{|{\cal F}}$.
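The partition formula can be checked numerically. Here is a small sketch on a six-point sample space (the probabilities, the values of $X$, and the partition are all made up for illustration): it builds $E(X|{\cal F})$ block by block via the averaged formula and then verifies the defining relation on every event of ${\cal F}$, i.e. every union of partition blocks.

```python
from itertools import combinations

p = {0: 0.1, 1: 0.2, 2: 0.15, 3: 0.25, 4: 0.2, 5: 0.1}   # probability mass (illustrative)
X = {0: 3.0, 1: -1.0, 2: 4.0, 3: 0.5, 4: 2.0, 5: 7.0}    # random variable (illustrative)
partition = [{0, 1}, {2, 3, 4}, {5}]                      # generates F

def integral(g, event):
    """Compute the integral of g over the event w.r.t. P on a discrete space."""
    return sum(g[w] * p[w] for w in event)

# E(X|F)(w) = (1/P(E_i)) * integral of X over E_i, for the block E_i containing w
cond_exp = {}
for block in partition:
    avg = integral(X, block) / sum(p[w] for w in block)
    for w in block:
        cond_exp[w] = avg

# Defining relation: the integrals of E(X|F) and X agree on every union of blocks.
for r in range(1, len(partition) + 1):
    for blocks in combinations(partition, r):
        E = set().union(*blocks)
        assert abs(integral(cond_exp, E) - integral(X, E)) < 1e-12
```

Note that `cond_exp` is constant on each block, which is exactly the ${\cal F}$-measurability requirement for this $\sigma$-algebra.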

Conditional expectation, however, becomes less intuitive when ${\cal F}$ is no longer generated by a countable partition, although sometimes you may find a tweak to get around it. Example: let $P$ be a probability measure on ${\Bbb R}$ having density $f\in L^1({\Bbb R})$ w.r.t. Lebesgue measure, $dP(x) = f(x)\, dx$.

We will consider a sub-$\sigma$-algebra generated by symmetric subsets of the Borel $\sigma$-algebra. Thus $A\in {\cal F}$ iff $x\in A \Leftrightarrow -x\in A$.

A measurable function w.r.t. ${\cal F}$ is now any function which is symmetric, i.e. $Y(x)=Y(-x)$ for all $x$. This time an elementary event consists of a symmetric couple $\{x,-x\}$, which has zero probability, and you cannot throw all of these away when calculating the conditional expectation. So going back to the definition, given $X\in L^1(dP)$ you need to find a symmetric integrable function $Y$ so that for any measurable $I\subset (0,+\infty)$ you have: $$ \int_{I\cup (-I)} Y\; dP = \int_{I\cup (-I)} X \; dP $$ Using that $Y$ is symmetric and a change of variables this becomes: $$ \int_I Y(x) (f(x)+f(-x)) \; dx = \int_I (X(x) f(x) + X(-x) f(-x)) \; dx $$ On the set $\Lambda = \{ x\in {\Bbb R} : f(x)+f(-x)>0 \}$, which has full probability, we may then solve this by defining: $$ Y(x) = \frac{X(x) f(x) + X(-x) f(-x) }{f(x) + f(-x) }, \; x\in \Lambda. $$ On the complement $Y$ is not defined, but the complement has zero probability. $Y$ is then symmetric and has the same expectation as $X$ on symmetric events. Again $Y(x)$ is the Radon-Nikodym derivative of $\nu(E) = \int_E X \; dP$ w.r.t. $P(E)$ with $E\in {\cal F}$.
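As a sanity check, this formula for $Y$ can be verified numerically. The sketch below assumes a concrete (asymmetric) density, the pdf of $N(1,1)$, and takes $X(x)=x$; both choices are just for illustration. It confirms that $Y$ is symmetric and that $\int_{I\cup(-I)} Y\, dP = \int_{I\cup(-I)} X\, dP$ on a sample symmetric event.

```python
import math

def f(x):
    # Assumed density for illustration: the N(1, 1) pdf, which is not symmetric
    return math.exp(-(x - 1.0) ** 2 / 2.0) / math.sqrt(2.0 * math.pi)

def X(x):
    return x

def Y(x):
    # Candidate E(X|F)(x): the f-weighted average of X over the pair {x, -x}
    return (X(x) * f(x) + X(-x) * f(-x)) / (f(x) + f(-x))

def integrate(g, a, b, n=10_000):
    # Midpoint rule for the integral of g dP = g(x) f(x) dx over [a, b]
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) * f(a + (k + 0.5) * h) for k in range(n)) * h

# Defining relation on the symmetric event I ∪ (-I) with I = (0.5, 2.0):
lhs = integrate(Y, 0.5, 2.0) + integrate(Y, -2.0, -0.5)
rhs = integrate(X, 0.5, 2.0) + integrate(X, -2.0, -0.5)
assert abs(lhs - rhs) < 1e-6      # integrals of Y and X agree on symmetric events
assert abs(Y(1.3) - Y(-1.3)) < 1e-12  # Y is symmetric, hence F-measurable
```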

Our luck here is that there is a simple symmetry, i.e. $x\mapsto -x$, describing the events in ${\cal F}$ and that the probability measure transforms nicely under this symmetry. In more general situations you may not be able to describe $E(X|{\cal F})$ explicitly in terms of values of $X$ and you are stuck with just the defining properties for conditional expectation [which, on the other hand, may suffice for whatever computation you need to carry out].

H. H. Rugh
  • Very nice answer. I did, however, award the bounty to the other answer by accident. I reported it to the admins to see if they can do something about it. I am very sorry. –  Sep 13 '17 at 12:47
  • Don't worry. Just glad if the answer may help you. – H. H. Rugh Sep 13 '17 at 13:24

Assume that $X \geq 0$. Then $\nu(E) = \int_{E} X \, d\mathbb{P}$ defines a measure on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Of course, its Radon-Nikodym derivative is simply $d\nu/d\mathbb{P} = X$.

Now we restrict $\nu$ to the space $(\Omega, \mathcal{G}, \mathbb{P}|_{\mathcal{G}})$, where $\mathcal{G} \subset \mathcal{F}$ is a $\sigma$-subalgebra. Then the Radon-Nikodym derivative is

$$ \frac{d\nu|_{\mathcal{G}}}{d\mathbb{P}|_{\mathcal{G}}} = \mathbb{E}[X \mid \mathcal{G}]. $$

This is just a rephrasing of the definition of the conditional expectation $\mathbb{E}[X\mid \mathcal{G}]$, which is defined as the $\mathcal{G}$-measurable random variable $\tilde{X}$ satisfying

$$ \forall E \in \mathcal{G}: \quad \int_{E} X \,d \mathbb{P} = \int_{E} \tilde{X} \,d \mathbb{P}. $$
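The restriction argument can be made concrete on a toy discrete space (all numbers below are assumed for illustration): restricting $\nu$ to a partition-generated $\mathcal{G}$, the Radon-Nikodym derivative $d\nu|_{\mathcal{G}}/d\mathbb{P}|_{\mathcal{G}}$ must be constant on the atoms of $\mathcal{G}$, with value $\nu(A)/\mathbb{P}(A)$ on each atom $A$, and it then satisfies the defining relation above.

```python
p = {0: 0.25, 1: 0.25, 2: 0.3, 3: 0.2}     # probability mass on Omega (illustrative)
X = {0: 1.0, 1: 5.0, 2: 2.0, 3: -2.0}      # random variable (illustrative)
atoms = [{0, 1}, {2, 3}]                   # G = sigma({0,1}, {2,3})

def nu(E):
    # nu(E) = integral of X over E w.r.t. P
    return sum(X[w] * p[w] for w in E)

# d(nu|_G)/d(P|_G) is G-measurable, i.e. constant on each atom A,
# and its value there is forced to be nu(A)/P(A):
deriv = {w: nu(A) / sum(p[v] for v in A) for A in atoms for w in A}

# Verify the defining relation on every event of G (unions of atoms):
events = [set(), {0, 1}, {2, 3}, {0, 1, 2, 3}]
for E in events:
    assert abs(sum(deriv[w] * p[w] for w in E) - nu(E)) < 1e-12
```

Note how the weighting $1/\mathbb{P}(A)$ the question asks about appears automatically: it is exactly what makes the derivative integrate back to $\nu(A)$ against $\mathbb{P}|_{\mathcal{G}}$.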

Sangchul Lee
  • Right, what I am looking for is a clear motivation for why $\frac 1 {P(E_i)} \int_{E_i}X \, dP=\frac{d\nu}{d\mathbb{P}}$. Just restricting ourselves to $\mathcal{G}$ merely gives us a measure on a smaller algebra, but don't we also want to assign "larger" probabilities to the events as well? I.e., something compensating for the conditioning, such as our $\frac{1}{P(E_{i})}$. –  Sep 06 '17 at 04:33
  • @user21312, Conditional expectation can be understood as the 'best guess' for $X$ when only partial information (such as $\mathcal{G}$) is given. The result is that $\mathbb{E}[X\mid\mathcal{G}]$ appears 'averaged out' over the part of the information that is not available to you. The extreme case is the usual expectation $\mathbb{E}[X] = \mathbb{E}[X \mid \{\varnothing,\Omega\}]$, where no information is available. In your case $\mathcal{G}=\sigma(E_i)$ (where $\{E_i\}$ is a partition of $\Omega$), so $X$ is averaged out over each piece of partial information $E_i$. – Sangchul Lee Sep 07 '17 at 12:17
  • Right, analogous to the "elementary conditional". –  Sep 07 '17 at 12:45