The good, the bad and the ugly with conditional probability/expectation

Question

I thought that I understood conditional probability and expectation until I saw this question:

The problem for conditional expectation.

Basically, it is given that: $$(X,Y)\sim f(x,y)=\begin{cases} 2xy &\text{ if $0<x<2y<2$} \\ 0 &\text{ otherwise } \end{cases}$$ And it is asked to find $E[Y|X=aY]$

Background: The good

I understand Method 2, where one would write:

$E[Y|X=aY]=E\left[Y|\frac{X}{Y}=a\right]=E[Y]=\frac{4}{5}$

$\frac{X}{Y}=a$ is dropped from expectation after proving that $Y$ and $\frac{X}{Y}$ are independent using the transformation $(X,Y)\to(X/Y,Y)$

I also (kind of) understand Method 1 in the answer, with conditioning on $Y$ (although my intuition tells me the result is correct only because $Y$ and $X/Y$ are independent):

$$E[Y|X=aY]=\int_0^1E[Y|X=aY,Y=y]f_Y(y)\,dy=\int_0^1yf_Y(y)\,dy=E[Y]=\frac{4}{5}$$

But when I try my own intuitive approaches, I'm getting stuck.

Question 1: The Bad

I interpret the conditional pdf $f_{Y|X}(y|t)=Cf_{Y,X}(y,t)$ as sectioning the joint pdf surface with the plane $x=t$ and scaling the resulting curve to a pdf.

My intuition tells me that the conditional pdf of $Y$ given $X=aY$ should similarly be found by sectioning the joint pdf with the plane $x=ay$ and scaling to a pdf.

The pdf would be $Cf(ay,y)=2Cay^2=3y^2$ for $0<y<1$, and the conditional expectation:

$E[Y|X=aY]=\int_0^13y^3\,dy=\frac{3}{4}$

What am I missing here?

Question 2: The Ugly

I'm trying now to do the same thing as in Method 1, but condition on $X$ rather than on $Y$:

$$E[Y|X=aY]=E[X/a|X=aY]=\frac{1}{a}E[X|X=aY]\\=\frac{1}{a}\int_0^aE[X|X=aY,X=x]f_X(x)\,dx=\frac{1}{a}\int_0^a xf_X(x)\,dx$$

Which is something very ugly depending on $a$, instead of $4/5$.

Again, many thanks to anybody who could point out the mistakes in my thinking.

You're conditioning on an event of probability zero, so you have to be more careful as to how you want to define it, and you can't necessarily use arbitrary rules of probability and expect them to be consistent with this situation. See https://en.wikipedia.org/wiki/Borel%E2%80%93Kolmogorov_paradox for a similar phenomenon. — Nate Eldredge, Jun 25 '21 at 18:31
@NateEldredge Could you please detail in an answer a general method of how to approach conditioning with probability zero? Also, is transformation method yielding $4/5$ the correct method/answer? Or there is more than one correct answer? — Kiomi, Jul 05 '21 at 02:02
@Kiomi: There is no general method, that's the whole point. There are specific methods that give answers useful in solving specific problems. But if you just ask "What is $E[Y \mid X=aY]$" without context, it isn't answerable. It's like asking for a general method of dividing by zero. — Nate Eldredge, Jul 08 '21 at 03:26

Lorents · Answer 1 · 2023-03-14T14:54:48.480

The Good

I agree that $Y$ and $X/Y$ are independent. However, I do not see anywhere this is proven, so we'll start from the beginning with $E[Y|X=aY]$. Rephrased as $E[Y|X/Y=a]$, we have the random variable (r.v.) $Y$ conditioned upon the value of the r.v. $X/Y$, which we can call $U$. (This seemingly innocuous step is actually where the two suggested solutions diverge, as is explained in The Bad below.) To get a nice and clean full change of variables we also introduce $V=Y$. Now, $E[Y|X/Y=a] = E[V|U=a]$.

We are given the joint probability density function (pdf) $f_{X,Y}(x,y) = 2xy$ on $0<x<2y<2$. We consider the change of variables from $(x,y)$ to $(u,v)=(x/y,y)$, so that $(x,y)=(uv,v)$. The triangular region $0<x<2y<2$ becomes the rectangular region $0<u<2,\; 0<v<1$ (since $v=y<1$). Now, naïvely exchanging $(x,y)$ for $(uv,v)$ in $f_{X,Y}(x,y)=2xy$ gives the incorrect formula $f_{U,V}(u,v)=2uv^2$. Integrating this over the support $0<u<2,\; 0<v<1$ (support means where $f_{U,V}$ is nonzero) gives $4/3$ rather than $1$, so it is not even a pdf.

Solution by change of variables

Since $f_{X,Y}(x,y)$ is a pdf, it "needs" an integral to give an actual probability. To find the right expression for the joint pdf $f_{U,V}$ we should consider the above change of variables from $(x,y)$ to $(u,v)$ in some integral, e.g.

$$\iint_A f_{X,Y}(x,y) dxdy = \iint_A 2xydxdy \,,$$

where $A$ is some region in the $xy$-plane over which $f_{X,Y}(x,y)$ is nonzero.

The proper change of variables requires multiplication by the absolute value of the Jacobian determinant, $|\partial(x,y)/\partial(u,v)|$. Since $|\partial(x,y)/\partial(u,v)| = v$ (recall $v>0$), we obtain

$$ \iint_A 2xydxdy = \iint_S 2uv^3dudv \,,$$

where $S$ is the area in the $uv$-plane corresponding to $A$.
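For readers who want the Jacobian step spelled out, the map $(u,v)\mapsto(x,y)=(uv,v)$ gives

$$\frac{\partial(x,y)}{\partial(u,v)} = \begin{pmatrix} \partial x/\partial u & \partial x/\partial v \\ \partial y/\partial u & \partial y/\partial v \end{pmatrix} = \begin{pmatrix} v & u \\ 0 & 1 \end{pmatrix}\,, \qquad \det\frac{\partial(x,y)}{\partial(u,v)} = v\,,$$

so that $2xy\,dx\,dy = 2(uv)(v)\cdot v\,du\,dv = 2uv^3\,du\,dv$.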

We now see that $$f_{U,V}(u,v) = \begin{cases} 2uv^3 &\text{ if }\: 0<u<2,\ 0<v<1 \\ 0 &\textrm{ otherwise}\,.\end{cases}$$ From this, we can easily find the marginal distributions to be

$$f_U(u) = u/2\,,\quad 0<u<2\,; \qquad f_V(v)=4v^3\,,\quad 0<v<1\,.$$

Since $f_{U,V}(u,v)$ factors into $f_U(u)f_V(v)$, the r.v.s $U$ and $V$ are independent, and thus

$$E[Y|X/Y =a] = E[V|U=a] = E[V] = \int_0^1 4v^4dv = \frac{4}{5}\,.$$
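As a sanity check on the independence claim, here is a small Monte Carlo sketch in standard-library Python. It is my own illustration, not part of the derivation. It draws $(X,Y)$ by rejection sampling, using the fact that $2xy\le 4$ on the support, and compares the average of $Y$ for small and large values of $X/Y$. The function names, the cut at $X/Y=1$ and the sample size are arbitrary choices.

```python
import random

def sample_joint(n_proposals, seed=0):
    """Rejection-sample (x, y) from f(x, y) = 2xy on 0 < x < 2y < 2."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_proposals):
        x = 2.0 * rng.random()   # proposal: uniform on (0, 2)
        y = rng.random()         # proposal: uniform on (0, 1)
        # Accept inside the triangle with probability f(x, y) / 4 (4 bounds f).
        if 0.0 < x < 2.0 * y and 4.0 * rng.random() < 2.0 * x * y:
            samples.append((x, y))
    return samples

def mean(values):
    return sum(values) / len(values)

def independence_check(n_proposals=400_000, seed=0):
    """Return (E[Y], E[Y | U < 1], E[Y | U >= 1]) estimates, with U = X/Y."""
    pts = sample_joint(n_proposals, seed)
    low = [y for x, y in pts if x / y < 1.0]
    high = [y for x, y in pts if x / y >= 1.0]
    return mean([y for _, y in pts]), mean(low), mean(high)

if __name__ == "__main__":
    overall, low, high = independence_check()
    print(f"E[Y] ~ {overall:.3f}, E[Y|U<1] ~ {low:.3f}, E[Y|U>=1] ~ {high:.3f}")
```

All three averages should come out close to $4/5$, up to Monte Carlo noise. This only probes independence coarsely; the factorisation of $f_{U,V}$ is the actual proof.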

The Bad

Intuition and thin slices

You explain the very reasonable intuition that a conditional pdf $f_{Y|X}(y|x_0)$ is obtained by sectioning the joint pdf with the plane $x=x_0$ and scaling to get a pdf. We can formalize this argument by considering a very thin slice of the joint pdf given by $x\in(x_0,x_0+h)$ and then in a limiting argument let $h\to 0$.

The normalizing constant for $f_{X,Y}$ restricted to such a thin slice is $C=\int_{x_0}^{x_0+h} f_X(x)dx$. We obtain $f_{Y|X}(y\,|\,x\in(x_0,x_0 +h))$, which is almost what we want, by integrating $C^{-1}f_{X,Y}(x,y)$ over $x\in(x_0,x_0 +h)$:

$$f_{Y|X}(y\,|\,x\in(x_0,x_0 +h)) = \frac{\int_{x_0}^{x_0+h}f_{X,Y}(x,y)dx}{\int_{x_0}^{x_0+h} f_X(x)dx}\,.$$

Now, taking the limit of both sides as $h\to 0$ gives the familiar formula $$ f_{Y|X}(y|x_0) = \frac{f_{X,Y}(x_0,y)}{f_{X}(x_0)}\,,$$ (where a common factor $h$ has canceled between the numerator and the denominator). Some continuity of the densities near $x_0$ is needed to make this limiting argument rigorous.

The paradox

However, things are subtle when we want to slice our pdf obliquely. How exactly do we arrive at the event $X=aY$, by some limit? Since $X/Y =a$ describes a line through the origin, we might argue that the reasonable slice to take out of our pdf does not have uniform thickness, but rather should be the wedge $x/y \in (a,a+h)$. (Our change of variables from $(X,Y)$ to $(X/Y,Y)=(U,V)$ has this interpretation built into it: A nice thin slice in the $uv$-plane transforms to a wedge in the $xy$-plane.)

But no, you might say, "I actually want a slice of uniform thickness in the $xy$-plane. I obtain my condition $X=aY$ by carefully observing that $aY < X < aY + h$, for a very small $h$." Well, as you have seen, this leads to a different answer: $E[Y|X=aY] = 3/4$. We can formally obtain this answer by introducing the new r.v. $W=X-aY$ and then calculating $E[Y|W=0]$.
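Spelled out, and assuming $0<a<2$ so that the line $x=ay$ lies inside the support: the map $(x,y)\mapsto(w,y)=(x-ay,\,y)$ has unit Jacobian, so

$$f_{W,Y}(w,y) = f_{X,Y}(w+ay,\,y) = 2(w+ay)\,y\,, \qquad f_{Y|W}(y\,|\,0) = \frac{2ay^2}{\int_0^1 2at^2\,dt} = 3y^2\,,\quad 0<y<1\,,$$

and therefore

$$E[Y|W=0] = \int_0^1 3y^3\,dy = \frac{3}{4}\,.$$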

This is the Borel–Kolmogorov paradox. (See especially the quote "the term 'great circle' is ambiguous until we specify what limiting operation is to produce it".) It seems to me that it's even possible to cause trouble by interpreting the perfectly normal conditioning $Y|X=x$ in some silly way. "Who said I had to make nice slices? I chose wobbly ones." The answer to this is probably to define conditioning upon $X=x$ in the reasonable way.

Thus, $E[Y|U=a]$ is actually a different problem than $E[Y|W=0]$, and one could argue that both are valid interpretations of the problem $E[Y|X=aY]$. In the first interpretation, $a$ is a value obtained from the r.v. $X/Y$, while in the second it is just some number fixing an affine relationship between $X$ and $Y$. I think that the geometry of the problem makes the latter interpretation unnatural, at least for most values of $a$. However, you could have given us the problem $E[Y|X=2Y]$. Now we are talking ambiguity!
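The two readings can also be compared numerically. The sketch below is my own illustration in standard-library Python. It samples $(X,Y)$ by inverse-CDF sampling, using $f_Y(y)=4y^3$ and $f_{X|Y}(x|y)=x/(2y^2)$ (both follow from the given joint pdf), and averages $Y$ over a thin wedge $a<X/Y<a+h$ and over a thin strip $aY<X<aY+h$. The function names and the values of $a$, $h$ and the sample size are arbitrary.

```python
import random

def sample_xy(rng):
    """Draw one (x, y) from f(x, y) = 2xy on 0 < x < 2y < 2 via inverse CDFs."""
    # 1 - random() lies in (0, 1], which keeps y strictly positive.
    y = (1.0 - rng.random()) ** 0.25            # F_Y(y) = y^4
    x = 2.0 * y * (1.0 - rng.random()) ** 0.5   # F_{X|Y}(x|y) = x^2 / (4 y^2)
    return x, y

def wedge_vs_strip(a=1.0, h=0.02, n=400_000, seed=0):
    """Average Y over the wedge a < X/Y < a+h and over the strip aY < X < aY+h."""
    rng = random.Random(seed)
    wedge, strip = [], []
    for _ in range(n):
        x, y = sample_xy(rng)
        if a < x / y < a + h:        # thin slice in U = X/Y
            wedge.append(y)
        if a * y < x < a * y + h:    # thin slice in W = X - aY
            strip.append(y)
    return sum(wedge) / len(wedge), sum(strip) / len(strip)

if __name__ == "__main__":
    wedge_mean, strip_mean = wedge_vs_strip()
    print(f"wedge: {wedge_mean:.3f}  strip: {strip_mean:.3f}")
```

For small $h$ the wedge average should settle near $4/5$ and the strip average near $3/4$, up to Monte Carlo noise, matching the two interpretations.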

Solution by thin wedges

Deciding, for the second time, that $a$ comes from the r.v. $X/Y$, we want to try out a "thin wedge and limit"-argument and see how the calculations check out. Let us consider the wedge $ay<x<(a+h)y$ of maximum thickness $h$, and restrict our pdf $f_{X,Y}$ to this wedge. Our normalizing constant becomes

$$ \int_0^1 dy \int_{ay}^{ay+hy}2xydx = \ldots = \frac{ah}{2} + \frac{h^2}{4}\,.$$ Integrating out the $x$ from $f_{X,Y}(x,y)$, as before, gives

$$\int_{ay}^{ay+hy}2xydx = 2ahy^3 + h^2y^3\,.$$

Now,

$$f_{Y|\frac{X}{Y}}(y|a)= \lim_{h\to 0}\frac{2ahy^3 + h^2y^3}{\displaystyle\frac{ah}{2} + \frac{h^2}{4}}= 4y^3\,,$$ which is independent of $a$. We obtain

$$E[Y|X/Y=a]= \int_0^1yf_{Y|\frac{X}{Y}}(y|a)dy = \int_0^1 4y^4dy = \frac{4}{5}\,;$$ hooray!

The Ugly

You are correct that a possible solution starts with

$$ E[Y|X=aY] = \frac{1}{a}E[X|X=aY]\,.$$

The right-hand side can be calculated by the change of variables to $(W,Z)=(X,X/Y)$ and the first solution method in this answer. It is very similar and just slightly more messy. The $a$'s cancel out.
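For completeness, here is a sketch of that computation, with $W=X$ and $Z=X/Y$ as in this section and assuming $0<a<2$. The inverse map is $(x,y)=(w,\,w/z)$ with $|\partial(x,y)/\partial(w,z)| = w/z^2$, and the support $0<x<2y<2$ becomes $0<w<z<2$, so

$$f_{W,Z}(w,z) = 2\,w\,\frac{w}{z}\cdot\frac{w}{z^2} = \frac{2w^3}{z^3}\,,\qquad f_Z(z)=\int_0^z \frac{2w^3}{z^3}\,dw = \frac{z}{2}\,,\qquad 0<w<z<2\,.$$

Hence $f_{W|Z}(w|a) = 4w^3/a^4$ on $0<w<a$, and

$$\frac{1}{a}E[X|X/Y=a] = \frac{1}{a}\int_0^a \frac{4w^4}{a^4}\,dw = \frac{1}{a}\cdot\frac{4a}{5} = \frac{4}{5}\,.$$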

You were also right in the beginning of your post that $E[Y|X=aY] = \int_0^1E[Y|X=aY,Y=y]f_Y(y)\,dy$ held true just because $Y$ and $X/Y$ were independent. ($E[Y|X=aY,Y=y]$ is then just a convoluted way of saying "$y$".) However, $X$ and $X/Y$ are not independent! (The support of $f_{X,X/Y}(x,x/y)=f_{W,Z}(w,z)$ is not even a rectangle, which is an absolutely necessary condition for independence.) That is why you got something ugly in the final two steps of The Ugly.