14

For a probability space $(\Omega, \mathcal F, P)$, I'm trying to construct a discrete random variable $X$ (one for which $P(X\in K) = 1$ for some countable set $K$) for which

  1. $\text{im}(X)$ is uncountable.
  2. $\text{im}(X_*P)$ is uncountable, where $X_*P$ is the pushforward measure of $X$ w.r.t. $P$.
  3. $\text{im}(F_X)$ is uncountable where $F_X$ is the CDF of $X$.

Is the below construction correct?

Let our probability triple be $(\Omega, \mathcal F, P) = ([0,1],\mathcal{B},P)$ where $\mathcal{B}$ is the Borel $\sigma$-algebra on $[0,1]$ and $P$ is given by

$$P:S \in \mathcal{B} \mapsto \sum_{q_i\in S} 2^{-i}.$$

for some enumeration $\{q_i\}$ of the rationals in $[0,1]$. Letting $X$ be the identity function, we verify that it is discrete: for $K=\mathbb{Q}\cap [0,1]$ we have $$ P(X\in K) = P(K) = \sum_{i=1}^\infty 2^{-i} = 1.$$ Finally, we have that

  1. $\text{im}(X) = [0,1]$.
  2. $\text{im}(X_*P) = [0,1]$ because any $x\in[0,1]$ can be written in binary as $$x = \sum_{i=1}^\infty c_i2^{-i}$$ for $c_i \in \{0,1\}$. Therefore $$X_*P\bigg(\{q_i : c_i = 1\}\bigg) = x.$$
  3. $\text{im}(F_X)$ is uncountable, as for any distinct $x,y\in[0,1]$ there is a rational between them, so that, assuming WLOG that $x<y$, we have $$F_X(x) < F_X(y)$$ and $\text{im}(F_X)$ has as many values as numbers in $[0,1]$.
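
The construction above can be checked on a finite truncation. The following is a minimal sketch, not part of the original argument: the enumeration by increasing denominator and the helper names (`enumerate_rationals`, `set_for_binary_digits`, `P`, `cdf`) are assumptions made for illustration, and only the first 20 rationals are listed. It builds the set $\{q_i : c_i = 1\}$ for $x = 5/8 = 0.101_2$ and compares the truncated CDF at two points.

```python
from fractions import Fraction
from math import gcd

def enumerate_rationals(n):
    """First n rationals in [0,1], listed by increasing denominator (one possible enumeration)."""
    out = [Fraction(0), Fraction(1)]
    d = 2
    while len(out) < n:
        out.extend(Fraction(a, d) for a in range(1, d) if gcd(a, d) == 1)
        d += 1
    return out[:n]

def set_for_binary_digits(x, n, q):
    """Return the pairs (i, q_i) with c_i = 1, using the first n binary digits c_i of x in [0, 1)."""
    chosen = []
    for i in range(1, n + 1):
        x *= 2
        if x >= 1:  # binary digit c_i equals 1
            chosen.append((i, q[i - 1]))
            x -= 1
    return chosen

def P(indexed_set):
    """P(S) = sum of 2^{-i} over the q_i in S."""
    return sum((Fraction(1, 2 ** i) for i, _ in indexed_set), Fraction(0))

def cdf(x, q):
    """Truncated F_X(x): total mass 2^{-i} of the listed rationals q_i <= x."""
    return sum((Fraction(1, 2 ** (i + 1)) for i, qi in enumerate(q) if qi <= x), Fraction(0))

q = enumerate_rationals(20)
S = set_for_binary_digits(Fraction(5, 8), 20, q)  # 5/8 = 0.101 in binary
print(S, P(S))
print(cdf(Fraction(1, 3), q) < cdf(Fraction(2, 5), q))
```

In the truncated version the CDF is strictly increasing between two points only when one of the listed rationals lies between them; the full enumeration removes that restriction.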

Motivation:

There are multiple definitions of a discrete random variable floating around. I have seen 4.

  1. Definition 1 - This post explains that a discrete random variable is defined as one for which there is a countable set $K$ such that $P(X\in K) = 1$.

  2. Definition 2 - Wikipedia, citing Yates' The Practice of Statistics, offers a different definition:

When the image (or range) of $X$ is finitely or infinitely countable, the random variable is called a discrete random variable.

Elsewhere I've seen the suggestion that a discrete random variable is one

  3. Definition 3 - for which $\text{im}(X_*P)$ is countable, or

  4. Definition 4 - for which $\text{im}(F_X)$ is countable.

The current post is merely me slapping myself into understanding that Definition 1 is not equivalent to Definitions 2, 3 or 4.

BCLC
Sam
  • This is a great observation. It follows that your $Image(F_X)$ set (equivalently, the set $F_X(\mathbb{R})$) has a "Cantor-like" property: It is an uncountably infinite subset of $[0,1]$ that has measure zero! It works for any positive probabilities $p_i$ that sum to 1 (with $p_i=P[X=q_i]$). The disjoint open intervals not in $F_X(\mathbb{R})$ are $(F_X(q_i)-p_i, F_X(q_i))$ for $i \in \{1, 2, 3, ...\}$, which have length $p_i$ and hence sum to 1. – Michael Aug 10 '24 at 14:24
  • A related post on similar functions (strictly increasing with discontinuity only on a countable set), also including the one in the OP. Another related post on distributions on rational numbers in $[0,1]$. – Amir Aug 10 '24 at 18:49
  • @BCLC The term almost surely countable is not used for deterministic sets. It can be used for random sets in a slightly different way; for example, almost surely, the set of times that the Brownian Motion attains a local maximum is countable. I think you want to describe the image of a discrete random variable (which is always the union of a countable set and a null set as discussed in my answer) as an almost surely countable set, which is not technically correct. I am not sure if there is a name for such sets. – Amir Aug 14 '24 at 08:22

3 Answers

5
  1. $\text{im}(X)$ is uncountable.

True. While the Wikipedia definition, $X$ having a countable range, does imply that it's discrete, you've correctly shown that it's not a necessary condition. That said, the distinction between the two definitions doesn't matter for the most part. If $K\subseteq\mathbb R$ is a countable subset with probability $1$, then the composition $X^{-1}(K)\hookrightarrow\Omega\to\mathbb R$ has countable range and induces the same measure as the map $\Omega\to\mathbb R$. Since we generally only care about the induced measure and not the underlying probability space, the two are equivalent in that sense.

  2. $\text{im}(X_*P)$ is uncountable, where $X_*P$ is the pushforward measure of $X$ w.r.t. $P$.

True. Requiring $\operatorname{im}(X_*P)$ to be countable is too restrictive and actually is equivalent to $X$ having finite support.
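
The finite-support direction of this equivalence can be checked by brute force. This is a minimal sketch under stated assumptions: the pmf below and the helper `measure_image` are hypothetical, chosen only for illustration. With $n$ atoms carrying all the mass, the pushforward measure takes at most $2^n$ distinct values, one per subset of the support.

```python
from fractions import Fraction
from itertools import combinations

def measure_image(pmf):
    """All values P(X in A) as A ranges over subsets of a finite support."""
    atoms = list(pmf.values())
    values = set()
    for r in range(len(atoms) + 1):
        for subset in combinations(atoms, r):
            values.add(sum(subset, Fraction(0)))
    return values

# Hypothetical example pmf on three points, with masses summing to 1.
pmf = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
image = measure_image(pmf)
print(sorted(image))
```

The code does not address the converse direction (infinitely many atoms giving an uncountable image); that part relies on the subset-sum argument from the question.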

  3. $\text{im}(F_X)$ is uncountable where $F_X$ is the CDF of $X$.

True. I believe that requiring $\operatorname{im}(F_X)$ to be countable is equivalent to having a set $K$ with $P(X\in K)=1$ and the closure of $K$ countable, i.e. the support of $X$ is countable. This is a strictly stronger statement than the standard definition.

Carmeister
  • 'Since we generally only care about the induced measure and not the underlying probability space, the two are equivalent in that sense.' --> so it's like 'almost surely countable' ? – BCLC Aug 12 '24 at 11:12
  • +1 Great answer. Could you elaborate on what "the composition $X^{-1}(K) \to \Omega\to \mathbb{R}$" is, i.e. what functions exactly are at play here? – Sam Aug 12 '24 at 19:17
5

For a discrete random variable $X$, if the sample space is not finite, then

  1. $\text{im}(X)$ can be uncountable.
  2. $\text{im}(X_*P)$ can be uncountable.
  3. $\text{im}(F_X)$ can be uncountable.

If the sample space is finite, all of the three images are finite.

Hence, we cannot define a discrete random variable based on the countability properties of the images $\text{im}(X)$, $\text{im}(X_*P)$, and $\text{im}(F_X)$. Instead, a discrete random variable can be rigorously defined in two equivalent ways as follows (see here for a detailed comparison with the definitions of continuous and mixed random variables).

First definition:

A random variable $X$ is discrete iff $\sum_{x \in S_0}\mathbb P(X=x)=1$ for a countable set $S_0$ of points, where $\mathbb P(X=x)>0$ for $x\in S_0$ and $\mathbb P(X=x)=0$ for $x\notin S_0$ (equivalently, for some countable set $K$, $\mathbb P(X\in K)=1$).

The set $S_0=\{ x \in \mathbb R: \mathbb P(X=x)>0 \}$ is a countable subset of the image of $X$, denoted by $\text{im}(X)$. The set $S_0$ is nothing but the support of $X$ (see this answer [2] for two equivalent definitions of the support of a random variable; the definition given in Wikipedia [2] is not precise for all types of random variables). In fact, the image of a discrete random variable is the union of a countable set (its support $S_0$) and a null set, and thus the image $\text{im}(X)$ can be countable or uncountable (note that $K$ is not unique and can be any countable superset of $S_0$, which is the smallest one).

Second definition:

A random variable $X$ is discrete iff its cdf $F_X(x)=\mathbb P(X \le x)$ is discontinuous only on a countable set of points in $\mathbb R$ whose jump values sum to $1$ (equivalently, the image of the cdf is a null set).

Indeed, the cdf has jumps only at the points in $S_0$, and is continuous on $\mathbb R \setminus S_0$, where $S_0$ is the set that appeared in the first definition. The equivalent condition given in the parentheses follows from the fact that any increasing function is discontinuous at most on a countable set, and thus $F_X$ has a null image iff its jump values sum to $1$.
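
As a small illustration of the second definition, the jump of the cdf at each atom can be computed as $F_X(a) - F_X(a^-)$ and the jumps summed. This is only a sketch: the finitely supported pmf and the helper names `cdf` and `left_limit` are assumptions for the example, not part of the answer.

```python
from fractions import Fraction

def cdf(pmf, x):
    """F_X(x) = P(X <= x) for a finitely supported pmf."""
    return sum((p for a, p in pmf.items() if a <= x), Fraction(0))

def left_limit(pmf, x):
    """F_X(x-) = P(X < x), the limit of the cdf from the left at x."""
    return sum((p for a, p in pmf.items() if a < x), Fraction(0))

# Hypothetical pmf: atoms 0, 1/2, 1 with masses 1/2, 1/4, 1/4.
pmf = {Fraction(0): Fraction(1, 2), Fraction(1, 2): Fraction(1, 4), Fraction(1): Fraction(1, 4)}

# Jump of the cdf at each atom; for a discrete X these recover the point masses.
jumps = {a: cdf(pmf, a) - left_limit(pmf, a) for a in pmf}
total_jump = sum(jumps.values(), Fraction(0))
print(jumps, total_jump)
```

Here the jumps coincide with the point masses and add up to 1, which is the condition in the second definition.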

Amir
  • 1 - But $S_0$ is the smallest possible $K$? 2 - What's the difference between $S_0$ and $\{q \in \mathbb R \mid P(\{X = q\}) > 0 \}$? – BCLC Aug 12 '24 at 13:50
  • @BCLC Yes exactly! 1- $S_0$ is the smallest possible $K$. 2- They are the same. – Amir Aug 12 '24 at 14:30
  • Amir should your paragraph before 'second definition' be instead something like the following? The set $S_0 = \{q \in \mathbb R \mid P(\{X = q\}) > 0 \}$ is a countable subset of the image $\text{im}(X)$ of $X$ called the support of the discrete random variable $X$. The image of discrete random variable $X$ is the union of its countable support and a null set, which can be countable or uncountable. Note that $K$ is not unique and can be any countable superset of $S_0$. – BCLC Aug 27 '24 at 16:54
  • "For a discrete random variable X, if the sample space is not finite, then..." - But what if the sample space is countable? Related to that, I don't think your definition of $\mathrm{im}(X)$ is the one that is widely accepted. – paperskilltrees Jun 28 '25 at 16:40
5

This is a great observation. Just a few thoughts:

  1. Your set $Image(F_X)$, which we can also call $F_X(\mathbb{R})$, has a "Cantor-set-like" property because it is an uncountably infinite subset of $[0,1]$ that has measure 0.

  2. As a formality, your proof of part 3 shows that your function $F_X:[0,1]\rightarrow[0,1]$, which can also be viewed as a function $h:[0,1]\rightarrow F_X(\mathbb{R})$, is strictly increasing and hence injective. So $|[0,1]|\leq|F_X(\mathbb{R})|\leq |[0,1]|$, meaning that $|[0,1]|=|F_X(\mathbb{R})|$ by the Cantor-Schroeder-Bernstein theorem.

  3. A simple example of a random variable $X:\Omega\rightarrow \mathbb{R}$ that has $X(\Omega)=\mathbb{R}$ but $P[X=0]=1$ is the identity random variable $X(\omega)=\omega$ for all $\omega \in \mathbb{R}$ for the probability space $(\mathbb{R}, Pow(\mathbb{R}), P)$ with $$P[A] = \left\{\begin{array}{cc} 1 & \mbox{if $0 \in A$} \\ 0 & \mbox{else} \end{array}\right.\quad \forall A \subseteq \mathbb{R}$$ Some people would call $X$ a discrete random variable since it has the same distribution as a random variable that can take at most countably many values (in fact it has the same distribution as the always-zero random variable); others would say it is not a discrete random variable because it can take uncountably many values. As the Carmeister answer points out, these distinctions are minor.


Details on point 1 (the "Cantor-like" property):

This works for any sequence $\{p_i\}_{i=1}^{\infty}$ of positive numbers that sum to 1. Let $\{q_i\}_{i=1}^{\infty}$ list the rationals in $[0,1]$. Let $X:\Omega\rightarrow \cup_{i=1}^{\infty} \{q_i\}$ be a random variable with $P[X=q_i]=p_i$ for all $i$. Define $F_X(\mathbb{R}) = \{F_X(x) : x \in \mathbb{R}\}$. The OP already shows that $F_X(\mathbb{R})$ is an uncountably infinite subset of $[0,1]$. We show it has measure 0.

For each $i \in \{1, 2, 3, ...\}$ define the interval $$V_i=[F_X(q_i)-p_i, F_X(q_i))$$ Let $\lambda(\cdot)$ denote the Lebesgue measure. It is clear that $V_i\subseteq [0,1]$ and $\lambda(V_i)=p_i$ for all $i \in \{1, 2, 3, ...\}$.

Observe that:

i. $V_i\cap F_X(\mathbb{R}) = \emptyset$ for all $i \in \{1, 2, 3, ...\}$.

ii. The intervals $V_i$ are disjoint for $i \in \{1, 2, 3, ...\}$

iii. $\lambda(\cup_{i=1}^{\infty} V_i)=\sum_{i=1}^{\infty} \lambda(V_i)=1$.

iv. $\lambda([0,1]\setminus \cup_{i=1}^{\infty} V_i)=0$.

Therefore, $F_X(\mathbb{R})$ is a subset of $[0,1]\setminus \cup_{i=1}^{\infty} V_i$. Since $\cup_{i=1}^{\infty}V_i$ is Borel measurable, the set $F_X(\mathbb{R})$ is a subset of a Borel set of measure 0, so $F_X(\mathbb{R})$ is Lebesgue measurable and $\lambda(F_X(\mathbb{R}))=0$.


Proof of (i): Suppose not. Then there is a $t \in [0,1]$ and $i \in \{1, 2, 3, ...\}$ such that $F_X(t)\in V_i$, so $$ F_X(q_i)-p_i \leq F_X(t)< F_X(q_i) \quad (Eq. *) $$ Since $F_X(x)$ is strictly increasing over $x\in[0,1]$ we know $t<q_i$. Then there is a rational $w$ that satisfies $t<w<q_i$ and so (since $F_X(t)<F_X(w)$): $$ F_X(t) + p_i < F_X(w)+p_i = F_X(w)+P[X=q_i]\leq F_X(q_i)$$ and so $F_X(t)< F_X(q_i)-p_i$, which contradicts (Eq. *). $\Box$

Proof of (ii): The $V_i$ intervals are disjoint because if $0\leq q_i<q_j\leq 1$, there is a rational $w$ such that $q_i<w<q_j$ and so $$ F_X(q_i) + P[X=q_j] < F_X(w) + P[X=q_j]\leq F_X(q_j)$$ where the strict inequality holds because we already know $F_X(x)$ is strictly increasing over $x \in [0,1]$. Thus $$ F_X(q_i) < F_X(q_j)-p_j$$ so the right endpoint of $V_i$ ends before the left endpoint of $V_j$. $\Box$

Proof of (iii): This immediately follows by the fact that $V_i$ are disjoint and $\lambda(V_i)=p_i$ for all $i$. $\Box$

Proof of (iv): We have $$ [0,1] = \left(\cup_{i=1}^{\infty} V_i\right)\cup \left([0,1]\setminus \cup_{i=1}^{\infty} V_i\right)$$ so $$\underbrace{\lambda([0,1])}_1 = \underbrace{\lambda(\cup_{i=1}^{\infty}V_i)}_{1}+ \lambda\left([0,1]\setminus \cup_{i=1}^{\infty} V_i \right) $$ $\Box$
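
The gap intervals $V_i=[F_X(q_i)-p_i, F_X(q_i))$ can be computed explicitly for a finite stand-in. This is a minimal sketch under loud assumptions: four atoms with hypothetical weights summing to 1 replace the full enumeration of rationals, so it only illustrates observations (ii) and (iii), not the uncountability of $F_X(\mathbb{R})$.

```python
from fractions import Fraction

# Hypothetical finite stand-in for the rationals: four atoms, weights summing to 1.
atoms = [Fraction(0), Fraction(1), Fraction(1, 2), Fraction(1, 3)]
weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]

def cdf(x):
    """F_X(x) = P(X <= x) for the finite stand-in distribution."""
    return sum((p for a, p in zip(atoms, weights) if a <= x), Fraction(0))

# V_i = [F_X(q_i) - p_i, F_X(q_i)), stored as (left, right) pairs and sorted by left endpoint.
gaps = sorted((cdf(a) - p, cdf(a)) for a, p in zip(atoms, weights))

# Half-open intervals are disjoint when each right endpoint is <= the next left endpoint.
disjoint = all(gaps[k][1] <= gaps[k + 1][0] for k in range(len(gaps) - 1))
total_length = sum((right - left for left, right in gaps), Fraction(0))
print(gaps, disjoint, total_length)
```

In this finite case consecutive intervals touch, so they tile $[0,1)$ exactly. With atoms at all rationals, as in the proof of (ii), the endpoints are strictly separated instead.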


While we know $F_X(\mathbb{R})$ is a Lebesgue measurable subset of $[0,1]$, with some more work it can be shown that $F_X(\mathbb{R})$ is Borel measurable because of the following claim:

Claim: We have $F_X(\mathbb{R})=[0,1]\setminus \cup_{i=1}^{\infty} V_i$. In other words, the set $F_X(\mathbb{R})$ consists precisely of those points in $[0,1]$ that are not in $\cup_{i=1}^{\infty} V_i$.

Proof of Claim: We already know $F_X(\mathbb{R}) \subseteq [0,1]\setminus \cup_{i=1}^{\infty} V_i$. It suffices to prove $$F_X(\mathbb{R})\supseteq [0,1]\setminus \cup_{i=1}^{\infty} V_i$$ Fix $y \in [0,1]\setminus \cup_{i=1}^{\infty} V_i$. We want to show $y \in F_X(\mathbb{R})$. If $y=1$ we are done because $F_X(1)=1$. Note that $y$ cannot be $0$ because there is some integer $j$ with $0=q_j$ so $0\in V_{j}$. So WLOG we assume $0< y<1$. Define $$z=\inf\{F_X(x) : F_X(x)>y, x\in [0,1]\}$$ This is an infimum over a nonempty set because $F_X(1)=1>y$. It is clear that $$y\leq z \leq 1 \quad (Eq. 2)$$ By definition of $z$ and the fact that $F_X$ is strictly increasing over $x\in[0,1]$, there is a nonincreasing sequence of real numbers $x_k \in [0,1]$ that satisfy $F_X(x_k)>y$ for all $k$, and $$ \lim_{k\rightarrow\infty} F_X(x_k)=z$$ Define $$ t= \lim_{k\rightarrow\infty} x_k$$ where the limit exists because $\{x_k\}_{k=1}^{\infty}$ is nonincreasing in $k$. Then $0\leq t\leq 1$. By the right-continuity property of $F_X$ we know $$F_X(t) = \lim_{k\rightarrow\infty} F_X(x_k) = z$$ Thus $$F_X(t)= z \geq y$$ where we have used (Eq. 2). If $z=y$ then $F_X(t)=y$, so $y \in F_X(\mathbb{R})$ and we are done.

Consider the remaining case $z>y$ (we reach a contradiction). Since $F_X(t)=z$ and there is a gap between $y$ and $z$, the definition of $z$ means there are no real numbers $x$ that satisfy $F_X(x) \in (y,z)$. Because $F_X$ is strictly increasing on $[0,1]$, every $x<t$ satisfies $F_X(x)<z$ and hence $F_X(x)\leq y$, so the left limit of $F_X$ at $t$ is at most $y<z=F_X(t)$. So $F_X$ has a jump discontinuity at $t$. Thus, $t$ is rational, that is $t=q_m$ for some positive integer $m$. We reach a contradiction by showing $y \in V_m$. That is, we want to show $$ F_X(q_m)-p_m\leq y < F_X(q_m) \quad (Eq. Want)$$ We have $$y< z=F_X(t)=F_X(q_m)$$ so the second inequality of (Eq. Want) is satisfied.

It remains to show $y\geq F_X(q_m)-p_m$. Suppose not (we reach a contradiction). Then $y<F_X(q_m)-p_m = \lim_{\delta\searrow 0} F_X(q_m-\delta)$. Choose a number $v \in [0, q_m)$ such that $$ y < F_X(v) \leq F_X(q_m)-p_m $$ Then $F_X(v) \in \{F_X(x): F_X(x)>y, x \in [0,1]\}$ and the definition of $z$ means $z \leq F_X(v)$. On the other hand we know $$F_X(v) \leq F_X(q_m)-p_m<F_X(q_m)=z$$ which contradicts $z\leq F_X(v)$. $\Box$

Michael
  • Yes, I believe I can prove $[0,1]\setminus F_X(\mathbb{R}) = \cup_{i=1}^{\infty} [F_X(q_i)-p_i, F_X(q_i))$, so indeed $F_X(\mathbb{R})$ is Borel, but the proof is a bit longer than I want to type up. – Michael Aug 10 '24 at 18:16
  • +1: Nice answer! I hope you can complete the proof that the image of the cdf is Borel when you can spare some time. I think this happens only if the subset of those elements $x$ in the image of $X$ with $P(X=x)>0$ is a densely ordered countable subset of real numbers. It may be interesting that if instead of $X(\omega)=\omega$ used in the OP, we define $Y(\omega)=\omega$ for any $\omega \not \in \mathbb Q$ and $Y(\omega)=i$ for $\omega=q_i \in \mathbb Q$, the image of $F_Y$ is a countable set (the image of $Y$ can be ordered w.r.t. the usual order). – Amir Aug 10 '24 at 21:12
  • @Amir : Thanks. I have posted a proof that $F_X(\mathbb{R})$ is Borel in my new answer. Since it is somewhat lengthy, I have separated it as a separate answer from this one. – Michael Aug 11 '24 at 23:40
  • re 'it is an uncountably infinite subset of [0,1] that has measure 0.' - so it's like 'almost surely uncountable' ? – BCLC Aug 12 '24 at 13:41
  • @BCLC : The set $F_X(\mathbb{R})$ is not a random set. It is (surely) uncountable. When I say it has measure zero, I mean its Lebesgue measure within the unit interval $[0,1]$ is zero, that is $\lambda(F_X(\mathbb{R}))=0$. So "almost all points in $[0,1]$ are in the complement of $F_X(\mathbb{R})$." – Michael Aug 12 '24 at 15:09
  • The generalized question is here: https://math.stackexchange.com/questions/4957672/image-of-an-increasing-function-with-discontinuities-on-a-dense-countable-set-s/4957723?noredirect=1#comment10608613_4957723 – Michael Aug 13 '24 at 16:58
  • I combined my other answer into this one to show $F_X(\mathbb{R})=[0,1]\setminus \cup_{i=1}^{\infty}V_i$. – Michael Aug 17 '24 at 19:30