
I'm quite confused by the notion of random variable in the proper measure-theoretic framework. Let's first state the notation and definitions:

Let $(\Omega, \Sigma, \operatorname{P})$ be a probability space. Then, a real-valued random variable is a measurable function $X \colon \Omega \to \mathbb{R}$ and its probability distribution is the pushforward measure $\operatorname{P}_{X} := \operatorname{P} \circ X^{-1}$. If $\operatorname{P}_{X}$ is absolutely continuous with respect to the Lebesgue measure $\lambda$ we also know that there is a probability density function $f\colon \mathbb{R} \to \mathbb{R}$ such that $\operatorname{P}_{X}(B) = \int_B f \, \mathrm{d} \lambda$ for $B \in \mathcal{B}(\mathbb{R})$ (by the Radon–Nikodym theorem).
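To make the density identity concrete, here is a small Python sketch (my own illustration, assuming Python 3.8+ for `statistics.NormalDist`) checking $\operatorname{P}_{X}([a,b]) = \int_{[a,b]} f \, \mathrm{d}\lambda$ numerically for a standard normal density:

```python
from statistics import NormalDist

# Sketch (my own illustration): compare a midpoint Riemann sum of the normal
# density f over [a, b] against the CDF difference P_X([a, b]).
f = NormalDist(0, 1).pdf
a, b = -1.0, 2.0

n = 100_000
h = (b - a) / n
integral = sum(f(a + (i + 0.5) * h) for i in range(n)) * h  # ≈ ∫_[a,b] f dλ

exact = NormalDist(0, 1).cdf(b) - NormalDist(0, 1).cdf(a)
print(abs(integral - exact) < 1e-6)  # True
```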

Now let's see a simple example that is often used to illustrate the notion of random variable:

  1. Random variable that represents the sum of two dice. In this case $\Omega = \{1, 2, 3, 4, 5, 6\}^2$, $\Sigma = \mathcal{P}(\Omega)$, and $\operatorname{P}(A) = \frac{\#A}{36}$ for $A \in \Sigma$, $X \colon (\omega_1, \omega_2) \mapsto \omega_1 + \omega_2$ and e.g. $\operatorname{P}_X(\{3\}) = \operatorname{P}(\{(1, 2), (2, 1)\}) = \frac{1}{18}$.
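The pushforward measure in this dice example is simple enough to compute mechanically; the following Python sketch (the helper `pushforward` is my own name, not a library function) reproduces $\operatorname{P}_X(\{3\})=\frac{1}{18}$:

```python
from fractions import Fraction
from itertools import product

# Sample space for two dice, with uniform measure P(A) = #A / 36.
omega = list(product(range(1, 7), repeat=2))

def pushforward(X):
    """Distribution P_X({s}) = P(X^{-1}({s})) of a random variable X on omega."""
    dist = {}
    for w in omega:
        s = X(w)
        dist[s] = dist.get(s, Fraction(0)) + Fraction(1, len(omega))
    return dist

P_X = pushforward(lambda w: w[0] + w[1])
print(P_X[3])  # 1/18
```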

This is all crystal clear but the two examples below break my little mind:

  1. Normal random variable. What is $(\Omega, \Sigma, \operatorname{P})$ now? Others have given the answer that the underlying probability space is just abstract and unspecified. But why, then, is it necessary to use the notion of random variable in the first place here? Wouldn't it be easier just to say that we are working with a probability space with $\Omega = \mathbb{R}$, $\Sigma = \mathcal{B}(\mathbb{R})$, and $\operatorname{P}(A) = \int_A \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}\, \mathrm{d}\lambda(x)$ for $A \in \Sigma$?
  2. Random variable that represents the outcome of the toss of a fair coin. As explained here, the underlying probability space is again some abstract space of all conceivable futures. But why do we even need that? Why not directly use $\Omega = \{0, 1\}$, $\Sigma = \mathcal{P}(\Omega)$, and $\operatorname{P}(A) = \frac{\#A}{2}$ for $A \in \Sigma$?

If it is indeed beneficial to introduce random variables in these two cases, what are the benefits?

  • The probability space for a normal random variable does not have to be abstract and unspecified. Take $\Omega=\mathbb R$, $\Sigma$ the Borel $\sigma$-algebra, and $P$ the measure with the normal density. The reason we allow more general probability spaces is that normal r.v.s occur in more general situations. Prime example: Brownian motion. – Kurt G. Sep 17 '23 at 11:22
  • This is a good question. I might summarise it as this: what does one gain by considering random variables rather than their distributions? In many theorems we don't need the whole "let $X$ be a random variable..."; the theorem and proof are really entirely concerned with the distribution measure. I might say that having a theory that talks about random variables is useful because we generally like, and need, to work with functions - for example, knowing how to compute the distribution of $X+Y$ is a problem that needs functions to be phrased properly. – FShrike Sep 17 '23 at 11:36
  • But we also don't lose anything by talking about random variables. Given any distribution $\mu$ there is a probability space and suitable random variable with distribution $\mu$ – FShrike Sep 17 '23 at 11:38
  • You will need that when you start doing stochastic processes, for example. – user10354138 Sep 17 '23 at 12:41
    It's kind of like how we could do discrete probability with just sums, but we introduce discrete measures and do it with integrals: we can write up the formulas and computations just once, and it applies everywhere. Phrasing everything with random variables does the same thing, even though we don't use the full power of r.v.s in every situation. – JonathanZ Sep 17 '23 at 13:29
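FShrike's remark above, that any distribution $\mu$ can be realized by some random variable, admits a concrete construction: take $\Omega=(0,1)$ with Lebesgue measure and let $X$ be the quantile function of $\mu$. The following Python sketch (my own illustration, assuming Python 3.8+ for `statistics.NormalDist`) does this for a normal distribution:

```python
from statistics import NormalDist

# Sketch (my own illustration): given a distribution mu, take Omega = (0, 1)
# with Lebesgue measure and let X be the quantile function of mu; then the
# pushforward of Lebesgue measure under X is exactly mu.
X = NormalDist(5, 2).inv_cdf  # quantile function of a normal with mean 5

n = 100_000
omegas = [(i + 0.5) / n for i in range(n)]  # a fine grid of points of Omega
values = [X(w) for w in omegas]

m = sum(values) / n  # approximates E[X] = ∫_Omega X dP
print(round(m, 3))   # 5.0
```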

2 Answers


[updated with example below]

$\newcommand{\Cov}{\mathrm{Cov}}$

The real need for a notion of a random variable, as opposed to a distribution, comes because one wants to have a single mathematical object that contains all of the information necessary to make statements or formulate questions about a given random quantity.

Suppose I want to ask whether real-valued random variables $X$ and $Y$ are independent. Without random variables, I cannot answer this using just the distribution of $X$ and the distribution of $Y$. I instead need to appeal to another mathematical object, the joint distribution of $X$ and $Y$ on $\mathbb R\times \mathbb R$.

So (considering real quantities for the moment) every statement about a family of random quantities, say indexed by a set $S$, would first need to specify a joint distribution on the product space $\mathbb R^S$. Further statements involving other random quantities, say indexed by a set $T$ which might or might not intersect with $S$, would need to re-specify a new joint distribution, this time on $\mathbb R^T$, in a way compatible with the distribution already given on $\mathbb R^S$.

It becomes much simpler just to assume, once and for all, an underlying sample space, and then a random quantity has a precise formulation as a random variable, i.e., a measurable function on that sample space.

Example

Suppose we flip a fair coin twice and record the number of heads on each flip as Bernoulli variables $X_1$, $X_2$. Let $X_3=1-X_1$ be the number of tails on the first flip, and likewise $X_4=1-X_2$. I can define these all in the obvious way as random variables on the sample space of outcomes $\Omega =\{HH,HT,TH,TT\}$, with probability measure $\mu(S)=\frac{\#S}{4}$.

Treating these as random variables on a sample space, I can define independence of $X_i$ and $X_j$ in terms of independence of the events $\{\omega\mid X_i(\omega)\leq x_i\}$ and $\{\omega\mid X_j(\omega)\leq x_j\}$ for all $x_i,x_j\in \mathbb R$, and from this definition, $X_1$ and $X_2$ are independent, while $X_1$ and $X_3$ are not, for example.

However, the distributions of all four $X_i$’s are identical, so there is no way to define independence in terms of their individual distributions. We would need to separately know the joint distribution for every pair in order to answer that question. Or we would need a single joint distribution on $\mathbb R^4$ from which we could derive the pairwise distributions.

Note that the latter joint distribution on $\mathbb R^4$ would effectively function as an alternative sample space, with the projections onto each coordinate functioning as the given random variables. But it would be quite a bit more cumbersome to describe the joint distribution on four random variables, not all of which are independent. Moreover, suppose we wished to consider other random variables like $Y=\frac{X_1-X_2-X_3}{3}$? How would we easily define something like $\Cov(X_1,Y)$? Do we really want to now derive another joint distribution just for this?
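As an illustration, the independence check in the example above can be carried out mechanically on the four-point sample space; the following Python sketch (the names and the helper `independent` are mine, not standard) verifies that $X_1$ and $X_2$ are independent while $X_1$ and $X_3$ are not:

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coin flips, with measure mu(S) = #S / 4.
Omega = list(product("HT", repeat=2))
mu = lambda S: Fraction(len(S), len(Omega))

X1 = lambda w: 1 if w[0] == "H" else 0  # heads on first flip
X2 = lambda w: 1 if w[1] == "H" else 0  # heads on second flip
X3 = lambda w: 1 - X1(w)                # tails on first flip

def independent(X, Y):
    # Check mu(X <= x, Y <= y) = mu(X <= x) * mu(Y <= y); thresholds 0 and 1
    # suffice here, since both variables take values in {0, 1}.
    for x in (0, 1):
        for y in (0, 1):
            A = [w for w in Omega if X(w) <= x]
            B = [w for w in Omega if Y(w) <= y]
            AB = [w for w in A if w in B]
            if mu(AB) != mu(A) * mu(B):
                return False
    return True

print(independent(X1, X2))  # True
print(independent(X1, X3))  # False
```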

M W
  • @M W Would it be possible to include a simple example? I'm still a little confused. You don't need the joint distribution for $X$ and $Y$ to determine independence if you regard these quantities as measurable functions on an underlying sample space? – sleepingrabbit Sep 18 '23 at 10:35
  • @sleepingrabbit hopefully the example I added clarifies things a bit. In some sense you DO need the joint distribution for two random variables to determine independence, but the point is the joint distribution is well defined as the push forward of the measure $\mu$ from the sample space along the product map $(X_1,X_2)\colon \Omega\to \mathbb R^2$. So the random variables uniquely determine the joint distribution, whereas the separate distributions do not. – M W Sep 18 '23 at 20:59

There are basically two different questions here:

  1. Why do we use random variables?
  2. Why do we define random variables as functions on probability spaces?

In principle, we could have done everything in terms of sums over probabilities or integrals over probability densities. However, which of these two sides is easier to read? $$ \text{E}[XY]-\text{E}[X]\,\text{E}[Y] =\iint xy\,f(x,y)\,dx\,dy - \iint x\,f(x,y)\,dx\,dy\cdot\iint y\,f(x,y)\,dx\,dy $$ Or try writing out $\text{Var}[X] = \text{Var}[\text{E}[X|Y]]+\text{E}[\text{Var}[X|Y]]$ in integral form.
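For a concrete sanity check of the last identity, here is a minimal Python sketch (entirely my own illustration, on a small discrete joint pmf of my own choosing) verifying the law of total variance exactly with rational arithmetic:

```python
from fractions import Fraction as F

# Discrete check of Var[X] = Var[E[X|Y]] + E[Var[X|Y]]:
# Y is a fair coin; given Y=0, X is uniform on {0,1,2}; given Y=1, on {10,11}.
joint = {}  # joint pmf of (X, Y)
for x in (0, 1, 2):
    joint[(x, 0)] = F(1, 2) * F(1, 3)
for x in (10, 11):
    joint[(x, 1)] = F(1, 2) * F(1, 2)

E = lambda g: sum(g(x, y) * p for (x, y), p in joint.items())
EX = E(lambda x, y: x)
var_X = E(lambda x, y: (x - EX) ** 2)

def cond(y0):
    """Conditional mean and variance of X given Y = y0."""
    pe = sum(p for (x, y), p in joint.items() if y == y0)
    m = sum(x * p for (x, y), p in joint.items() if y == y0) / pe
    v = sum((x - m) ** 2 * p for (x, y), p in joint.items() if y == y0) / pe
    return m, v

m0, v0 = cond(0)
m1, v1 = cond(1)
mean_EXgY = F(1, 2) * m0 + F(1, 2) * m1
var_EXgY = F(1, 2) * (m0 - mean_EXgY) ** 2 + F(1, 2) * (m1 - mean_EXgY) ** 2
E_varXgY = F(1, 2) * v0 + F(1, 2) * v1
print(var_X == var_EXgY + E_varXgY)  # True
```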

Even if we hadn't been using the probability space formalism, we would still want to introduce the random variable notation, if nothing else as a shorthand for the full integrals.

This is particularly true once we start doing computations on random variables: e.g., $\hat\mu=(X_1+\cdots+X_n)/n$ is in turn a random variable representing the $n$-sample mean.

For $n$ variables like above, you'd normally use $\Omega=\mathbb{R}^n$, but even in this case, $\hat\mu$ becomes a function $\Omega\rightarrow\mathbb{R}$, which is exactly the same as the probability space definition of a random variable. So random variables expressed as functions $\Omega\rightarrow\mathbb{R}$ arise naturally.
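A rough numerical sketch of this (entirely my own illustration, taking a standard normal product measure on $\Omega=\mathbb{R}^n$) treats $\hat\mu$ literally as a function $\Omega\rightarrow\mathbb{R}$ and samples it:

```python
import random
import statistics

# Sketch: Omega = R^n with the product of n standard normal measures;
# the sample mean is then just a measurable function Omega -> R.
def mu_hat(omega):  # omega is a point (x_1, ..., x_n) of Omega
    return sum(omega) / len(omega)

random.seed(0)
n, trials = 100, 2000
# Draw points of Omega according to P and push them through mu_hat:
samples = [mu_hat([random.gauss(0, 1) for _ in range(n)]) for _ in range(trials)]
# mu_hat is itself a random variable; its variance should be about 1/n = 0.01.
print(round(statistics.variance(samples), 2))
```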

Einar Rødland
    Which of the two sides is easier to read? It's easy to introduce a third side - you can write $\Bbb E[\mu]$ where $\mu$ is a distribution. No problem – FShrike Sep 17 '23 at 12:51
  • @FShrike The problem with this is that you cannot formulate the question of independence of two random variables with distributions $\mu$ and $\nu$ only in terms of $\mu$ and $\nu$. – M W Sep 17 '23 at 19:04
  • @MW why not? $\pi$ is said to be an independent coupling of $\mu,\nu$ if $\pi=\mu\otimes\nu$ – Andrew Sep 18 '23 at 21:43
  • @Andrew right, but that means you have to define $\pi$ as well as $\mu$ and $\nu$. The statement "$\mu$ and $\nu$ are independent" is not meaningful by itself. In contrast, if $X$ and $Y$ are random variables on $\Omega$, then "$X$ and $Y$ are independent" is a well-defined statement. – M W Sep 18 '23 at 21:56