
In the lecture notes for High-Dimensional Probability by Handel, the following is stated:

Let $\mu$ and $\nu$ be probability measures, then

$$\mathcal C(\mu,\nu) = \{ \text{Law} (X,Y) : X\sim \mu, Y\sim \nu \} $$

Any $\pi \in \mathcal C (\mu,\nu)$ is called a coupling of $\mu$ and $\nu$.

The author then claims that $$E_\mu f- E_\nu f= E_\pi [f(X) - f(Y)]$$

My question is how to prove the claim above. At first it seemed easy, but I’m getting confused about how to prove it rigorously.

I suspect there is some abuse of notation, since, for example, $E_\mu f = \int_\mathbb R f(w) d\mu(w)$ and $E_\pi f(X) = \int_{\mathbb R^2} f(X((w,z)))d\pi$. But since $X:\Omega \rightarrow \mathbb R$, the expression $X((w,z))$ is ill-defined.

3 Answers


Suppose that $X$ and $Y$ take values in $\mathcal{X}$ and $\mathcal{Y}$, respectively. Since $\mu$ and $\nu$ are the marginal measures corresponding to $\pi$ (a probability distribution on $\mathcal{X}\times\mathcal{Y}$), \begin{align} \mathsf{E}_\pi f(X)&=\int_{\mathcal{X}\times\mathcal{Y}} f(x)\pi(d(x,y))=\int_\mathcal{X} f(x)\int_\mathcal{Y} \pi(d(x,y)) \\ &=\int_\mathcal{X}f(x)\mu(dx)=\mathsf{E}_\mu f(X). \end{align} Similarly, $\mathsf{E}_\pi f(Y)=\mathsf{E}_\nu f(Y)$.


Note that, by linearity (assuming $f(X)$ and $f(Y)$ are integrable), $E_{\pi} [f(X)-f(Y)] = \int \big(f(x) -f(y)\big)\,d\pi(x,y) = \int f(x)\,d\pi(x,y) -\int f(y)\,d\pi(x,y)$ and $$\int f(x)\,d\pi(x,y) = \int f\circ P(x,y)\,d\pi(x,y) = \int f(z)\, d(P_*\pi)(z) = \int f(z)\, d\mu(z) $$

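where $P$ is the mapping $(x,y)\mapsto x$ and $P_*\pi$ denotes the pushforward of $\pi$ by $P$, which equals $\mu$ because $\pi$ is a coupling. The same argument with the mapping $(x,y)\mapsto y$ gives $\int f(y)\,d\pi(x,y) = \int f(z)\,d\nu(z)$.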
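
As a sanity check on the pushforward argument, here is a minimal sketch for a finite case. Everything in it is an illustrative assumption and not taken from the lecture notes: the values `xs` and `ys`, the joint table `joint`, and the test function `f` are made up. Summing the joint table over one coordinate plays the role of the pushforward by the projection.

```python
from fractions import Fraction as F

# Hypothetical finite example: X takes values in xs, Y takes values in ys.
xs = [0, 1, 2]
ys = [0, 1]

# joint[i][j] = pi({(xs[i], ys[j])}); the entries sum to 1.
joint = [
    [F(1, 6), F(1, 6)],
    [F(1, 4), F(1, 12)],
    [F(1, 12), F(1, 4)],
]

def f(t):
    """Arbitrary test function, chosen only for illustration."""
    return t * t + 1

# Pushforward of pi by (x, y) -> x: sum out y to obtain mu.
mu = [sum(row) for row in joint]
# Pushforward of pi by (x, y) -> y: sum out x to obtain nu.
nu = [sum(joint[i][j] for i in range(len(xs))) for j in range(len(ys))]

# E_mu f - E_nu f, computed from the marginals only.
lhs = sum(f(x) * m for x, m in zip(xs, mu)) - sum(f(y) * n for y, n in zip(ys, nu))

# E_pi [f(X) - f(Y)], computed from the joint law.
rhs = sum(
    (f(xs[i]) - f(ys[j])) * joint[i][j]
    for i in range(len(xs))
    for j in range(len(ys))
)

print(lhs, rhs)
```

Exact fractions are used so that the two sides can be compared with `==` instead of a floating-point tolerance.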

Gabriel Romon

The other answers have given formal proofs, so let me add:

One intuition for a coupling is that if you only observe $X$, it looks exactly like a sample from $\mu$, and similarly for $Y$ and $\nu$. That is, if $(X,Y)$ is sampled from $\pi$, then $\pi_1((X,Y))$ (where $\pi_1$ is the projection onto the first factor) has the same distribution as a sample from $\mu$, and similarly for $Y$ and $\nu$. This is more than an intuition; it is precisely the definition.

So $\mathbb{E}_{\pi}(f(X)) = \mathbb{E}_{\mu}(f)$ precisely because the distribution of $X$ under $\pi$ is $\mu$.

There's no abuse of notation. Generally in probability theory one is flexible about the measure space behind a random variable, but in this case $X$ and $Y$ are both defined on the same measure space, with measure $\pi$, so that the distributions of these random variables are $\mu$ and $\nu$, respectively. The coupling refers to the choice of the joint distribution of $(X,Y)$.

It may help to think through some examples here. For instance, can you describe all of the possible distributions of a coupling of two independent coin flips?
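
One way to start on that exercise is the sketch below, which assumes both coins are fair (the text above does not say so). Under that assumption a coupling is fixed by the single number $a = P(X=1, Y=1) \in [0, 1/2]$. The helper names `coin_coupling`, `marginals`, `expect_diff` and `prob_differ` are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction as F

def coin_coupling(a):
    """Joint law of (X, Y) with both marginals Bernoulli(1/2); a = P(X=1, Y=1)."""
    half = F(1, 2)
    if not 0 <= a <= half:
        raise ValueError("a must lie in [0, 1/2]")
    return {(1, 1): a, (1, 0): half - a, (0, 1): half - a, (0, 0): a}

def marginals(pi):
    """Sum out one coordinate at a time to recover the laws of X and Y."""
    mu = {x: sum(p for (u, _), p in pi.items() if u == x) for x in (0, 1)}
    nu = {y: sum(p for (_, v), p in pi.items() if v == y) for y in (0, 1)}
    return mu, nu

def expect_diff(pi, f):
    """E_pi [f(X) - f(Y)] computed from the joint law."""
    return sum((f(x) - f(y)) * p for (x, y), p in pi.items())

def prob_differ(pi):
    """P(X != Y), a quantity that does depend on the coupling."""
    return sum(p for (x, y), p in pi.items() if x != y)

couplings = {
    "independent": coin_coupling(F(1, 4)),
    "identical (X = Y)": coin_coupling(F(1, 2)),
    "opposite (X = 1 - Y)": coin_coupling(F(0)),
}

for name, pi in couplings.items():
    print(name, marginals(pi), expect_diff(pi, lambda t: 3 * t + 1), prob_differ(pi))
```

Every choice of `a` gives the same marginals, so $E_\pi[f(X) - f(Y)]$ is the same for all of them (zero here, since $\mu = \nu$), while a quantity such as $P(X \neq Y) = 1 - 2a$ changes with the coupling.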

Elle Najt
  • Thanks for your answer! I think I understand the intuition, but I’m having some trouble with the formalism. For example, why can I say that $\int f(X) d\pi = \int f(x) d\pi$? Sorry if this is a silly question. The thing is, I’m the type of person that needs to see all the symbolic manipulations. – Davi Barreira Sep 17 '20 at 18:22
  • @DaviBarreira The correct statement is $\int f(X) d \pi = \int f(x) d\mu(x)$. You can say this because the distribution of $X : \mathbb{R}^2 \to \mathbb{R}$, where $\pi$ is the measure on $\mathbb{R}^2$, is $\mu$. In general, if $X$ is a random variable with distribution $\mu$, then $\mathbb{E}( f(X)) = \int f(x) d\mu(x)$. This is the law of the unconscious statistician: https://en.wikipedia.org/wiki/Law_of_the_unconscious_statistician – Elle Najt Sep 17 '20 at 18:39
  • I see. My doubt came because of the answer by Romon, which uses this equality I asked about. – Davi Barreira Sep 17 '20 at 18:42
  • @DaviBarreira Romon's answer is correct. I thought you were writing $d\pi(x)$ (which would be kind of confusing, but interpretable), but $\int f(x) d \pi(x,y)$ is fine. It may help to think about $f$ as a function of $(x,y)$, but one which is constant in the $y$ variable, so that variable is suppressed from the notation. Equivalently, you can think of $x$ in $f(x)$ as the function defined by $x(x,y) = P(x,y) = x$, which is a common abuse of notation.

    (IMO the calculus notation can be confusing. It's much cleaner to think about measures and random variables whenever possible.)

    – Elle Najt Sep 17 '20 at 18:56
  • You can think of it this way: $f$ is a function which takes in $1$ value. There are many ways we can think of it as a function on $\mathbb{R}^2$ - for any way to assign a number to each point in the plane, we can plug in that number to $f$. Writing $f(x)$ makes explicit that we are making it into a function on $\mathbb{R}^2$ by defining the function $\tilde{f}(x,y) = f \circ P(x,y) = f(x)$ ($P$ as in Romon's answer). We could also have defined $f(y)$ similarly. You probably are familiar with this: the function $x^2$ can be thought of as a function on $\mathbb{R}^2$ as well as $\mathbb{R}$. – Elle Najt Sep 17 '20 at 19:18