Before anything else, we should not forget the point of making predictions in real life. Indeed, if no information were missing, we would have complete knowledge of an object, and hence it would not make sense to speak of a prediction. For example, if you pay a predetermined amount of rent each month, there is nothing to predict about next month's payment. One may compare this obvious fact with $E(X|\mathcal{F})=X$ when $\mathcal{F}$ is the full sigma algebra, i.e. perfect information.
However, how do we know whether it makes sense to predict a random variable $X$ in the context of mathematical probability? Well, you would have to examine what $X$ really represents; in particular, whether you can describe this rv explicitly as a function. For example, if we consider a discrete probability on the outcomes of a die throw, you could write down $X(i)=i$ and analyze how it behaves on each subset of $\{1,2,...,6\}$. The situation becomes much harder if we consider rvs given only implicitly in terms of other rvs (such as a random time). In what follows, we consider the classical example of a sum of a random number of random variables (which naturally occurs in branching processes).
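To make the contrast concrete, here is a minimal sketch (in Python, with hypothetical names) of how an explicitly given rv such as the die throw can be analyzed simply by checking where each outcome lands:

```python
from fractions import Fraction

# Explicit rv on the fair-die sample space {1,...,6}: X(i) = i.
omega = range(1, 7)
X = lambda i: i

# For any event B, P(X in B) is just the mass of the preimage {i : X(i) in B}.
def prob_X_in(B):
    return sum(Fraction(1, 6) for i in omega if X(i) in B)

print(prob_X_in({2, 4, 6}))  # 1/2: X is even
print(prob_X_in({5, 6}))     # 1/3: X is at least 5
```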
Let $Y_1,Y_2,...$ be iid rvs and let $N\in\mathbb{N}$ be a rv independent of the $Y_i$, and let's consider the random sum $X=Y_1+...+Y_N$. Okay, why is this implicit? Well, because there is no single expression mapping each outcome $w\in\Omega$ to a value of $X$; we could easily have $X(w)=Y_1(w)$ for one outcome and $X(w')=Y_1(w')+Y_2(w')+Y_3(w')$ for another, so deciding when $X\in B$ for Borel $B$ is borderline impossible (unless, of course, you check where each $w\in\Omega$ lands, but this goes against the spirit of probability itself). Again, if you could separate the outcomes of $X$ for different outcomes of $N$, our life would become easy, since we would just have to analyze the functions $G_k=Y_1+...+Y_k$ for a fixed integer $k$; however, by the above observation, this is not the case in our problem.
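As an illustration (not needed for the theory), here is a minimal simulation sketch of such a random sum; the choice of a Poisson $N$ and exponential $Y_i$ is purely hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_random_sum():
    """Draw one outcome w and return (N(w), X(w)) for X = Y_1 + ... + Y_N."""
    n = rng.poisson(3)                 # hypothetical choice: N ~ Poisson(3)
    y = rng.exponential(2.0, size=n)   # hypothetical choice: Y_i ~ Exp(mean 2)
    return n, y.sum()

# Note how the number of Y's entering X depends on the same outcome w through N(w):
for _ in range(5):
    n, x = sample_random_sum()
    print(f"N = {n}, X = {x:.3f}")
```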
Then, what can we do? We normally like to obtain quantities such as $EX$ or $Var X$, but due to the complexity of $X$, it is not immediately clear how to compute these values. As "good" architects, let's put the age-old wisdom of breaking a complex building into smaller blocks to the test:
We don't know $X$ for sure, except on the sections $\{N=n\}$ (think of these as partial knowledge/information). What if we came up with something we do know for sure, in the sense that every section of this something is determined by the partial information we already have? How can we achieve this (i)? Apart from that, we would like this something to resemble $X$, at least on the sections $\{N=n\}$ (ii). If we achieved both, we would have obtained the best guess of $X$.
Question (i) motivates the measurability of our best guess $E(X|\mathcal{F})$ with respect to the sections. That is, our partial knowledge of $X$ becomes definite knowledge of the best guess. For this purpose we take the sub-sigma algebra generated by $N$, $\mathcal{F}=\sigma(N)$. And (ii) motivates the requirement that the best guess and $X$ have to average out to the same number on the sections. As an important remark, it is somewhat of a miracle that these two properties alone guarantee the existence of a well-defined object (by the Radon–Nikodym theorem). This is why, a priori, posing questions (i) and (ii) in conjunction makes sense.
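In symbols, for a sub-sigma algebra $\mathcal{F}$ of the full sigma algebra, the two requirements read as follows (this is just the standard definition of conditional expectation):
$$\text{(i)}\quad E(X|\mathcal{F})\ \text{is}\ \mathcal{F}\text{-measurable},\qquad\qquad \text{(ii)}\quad \int_A E(X|\mathcal{F})\,dP=\int_A X\,dP\quad\text{for all }A\in\mathcal{F}.$$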
Having motivated the definition of conditional expectation, let's now see how it can be applied (elegantly!) to our problem. Assume $EN^2<\infty$, $EY_1^2<\infty$, $EY_1=\mu$, and $Var\,Y_1=\sigma^2$. With minimal effort one may prove the identity:
$$Var(X)=E(Var(X|\mathcal{F}))+Var(E(X|\mathcal{F}))$$ where $Var(X|\mathcal{F})=E(X^2|\mathcal{F})-E(X|\mathcal{F})^2$. Choosing $\mathcal{F}=\sigma(N)$ as above allows you to conclude that on each section $N_i=\{N=i\}$, $E(X|\mathcal{F})$ equals $E(Y_1+...+Y_i)=i\mu$, i.e. $E(X|\mathcal{F})=\mu N$ (check that conditions (i) and (ii) hold!), and similarly $Var(X|\mathcal{F})=\sigma^2 N$. Now, combining the blocks, it's an easy exercise to compute $EX=E(E(X|\mathcal{F}))=\mu EN$ and to verify that $Var X=\sigma^2 EN+\mu^2 Var N$, variants of Wald's identity you might have mysteriously come upon in earlier courses!
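For completeness, here is a sketch of the "combining the blocks" step under the assumptions above:
$$E(Var(X|\mathcal{F}))=E(\sigma^2 N)=\sigma^2 EN,\qquad Var(E(X|\mathcal{F}))=Var(\mu N)=\mu^2 Var N,$$
so the identity gives $Var X=\sigma^2 EN+\mu^2 Var N$. If you prefer to see it numerically, here is a minimal Monte Carlo sketch (the Poisson/exponential choices and parameter values are hypothetical, made only for the illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
lam, scale = 3.0, 2.0            # hypothetical: N ~ Poisson(lam), Y ~ Exp(mean = scale)
mu, sigma2 = scale, scale**2     # EY = 2, Var Y = 4
EN, VarN = lam, lam              # for a Poisson rv both equal lam

n_samples = 100_000
N = rng.poisson(lam, size=n_samples)
# Each X_j is a sum of N_j iid exponentials (the random sum from above).
X = np.array([rng.exponential(scale, size=n).sum() for n in N])

print("empirical Var X        :", X.var())
print("sigma^2*EN + mu^2*VarN :", sigma2 * EN + mu**2 * VarN)
```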