Is a random subset of $\mathbb{R}^2$ connected?

Question

Create a set $S$ by adding each point of $\mathbb{R}^2$ with 50% probability (independently).

What is the probability that $S$ is connected? (and is this even a valid thing to ask?)

What do you mean by adding each point with $1/2$ probability? — rubikscube09, Sep 21 '21 at 17:14
Create a bunch of i.i.d. Bernoulli random variables, one for each point. Let $S$ be the set of points where the corresponding random variable is equal to 1. Is this well-defined? — Thomas Browning, Sep 21 '21 at 17:17
I don't think you can really define this with an uncountable number of variables. If you can, though, you should be able to apply some 0-1 law and conclude that the probability is 0. — Trebor, Sep 21 '21 at 17:19
There are enough random choices here that things are difficult to formalize. I think the language of measure, rather than probability, makes this clearer: we have the set $\mathcal{P}(\mathbb{R}^2)$ of all sets in the plane and we want to know the "measure" in some appropriate sense of the subset $C$ consisting of all connected sets. Both $C$ and $\mathcal{P}(\mathbb{R}^2)$ are perfectly nicely defined; the issue is that $\mathcal{P}(\mathbb{R}^2)$ is so big that we don't have an obvious measure on it which is appropriate to the problem. — Noah Schweber, Sep 21 '21 at 17:23
Weird. So apparently you cannot construct uncountably many independent random variables. And I agree that there isn't an obvious measure to work with here. — Thomas Browning, Sep 21 '21 at 17:29
I am pretty sure you absolutely can define the product probability measure on $2^{\mathbb{R}^2}$ as you wanted. See e.g. here. My guess is that the collection of connected subsets will not be measurable. — Cronus, Sep 21 '21 at 17:52
If we consider the hyperspace of compact sets $K(\Bbb R^2)$ with the Vietoris topology then the connected sets are a closed nowhere dense subspace, so at least topologically most compact sets are disconnected. This suggests that for reasonable ways to make the question precise the connected sets should have measure 0 — Alessandro Codenotti, Sep 21 '21 at 18:09
@AlessandroCodenotti Interesting. However, my mental picture of a random subset of $\mathbb{R}^2$ is a dense set of points, which is very far from being compact, and seems like it should be connected. — Thomas Browning, Sep 21 '21 at 18:53
I was going to mention what @AlessandroCodenotti mentioned about the topological category of connected sets. As an addition, however, category and measure are generally orthogonal notions (see https://mathoverflow.net/a/43502). So the preference of topological category is somewhat subjective (though it is my preference). — C. Caruvana, Sep 21 '21 at 20:21
@ThomasBrowning I like the "picture a random subset" approach. Unfortunately in my head it's completely unclear whether the random dense subset of $\mathbb R^2$ is connected or totally disconnected — jackson, Sep 22 '21 at 21:19

Mark Saving · Answer 1 · 2021-09-21T18:25:43.760

To really answer this question, we have to go back to the Kolmogorov axioms.

The Kolmogorov axioms state that a probability space is a triple $(A, E, P)$ such that

$A$ is a set
$E$ is a collection of subsets of $A$ - that is, $E \subseteq \mathcal{P}(A)$
$P$ is a function $P : E \to [0, 1]$

such that the following rules hold:

$\sigma$-algebra axioms:

$A \in E$
If $B, C \in E$ then $B \setminus C \in E$
If $B_i$ is in $E$ for all $i \in \mathbb{N}$ then $\bigcup\limits_{i \in \mathbb{N}} B_i \in E$.

Measure axioms:

$P(\emptyset) = 0$
If $B_i$ is in $E$ for all $i \in \mathbb{N}$, and if $B_i \cap B_j = \emptyset$ for all $i \neq j$, then $P(\bigcup\limits_{i \in \mathbb{N}} B_i) = \sum\limits_{i \in \mathbb{N}} P(B_i)$

Probability axiom:

$P(A) = 1$

$A$ is known as "the sample space". $E$ is known as "the set of events", and elements of $E$ are called "events". And $P$ is the probability function.

Let's say we want to discuss the outcome of a Bernoulli experiment with $n$ independent identically distributed indicator variables $X_1, ..., X_n$, each of which takes value $1$ with probability $p$ and value $0$ with probability $1 - p$.

The obvious probability space for this discussion would be $A = \{0, 1\}^n$, the set of all $n$-tuples of either $0$ or $1$. We would take $E = \mathcal{P}(A)$ and define $P(S) = \sum\limits_{s \in S} \prod\limits_{i = 1}^n \bigl((1 - p) + (2p - 1) s_i\bigr)$. The factor $(1 - p) + (2p - 1) s_i$ looks a bit weird, but it's just designed to be $p$ when $s_i = 1$ and $1 - p$ when $s_i = 0$.

It's easy to verify that this is in fact a probability distribution, that the random variable $X_i(s) = s_i$ takes value $0$ with probability $1 - p$ and value $1$ with probability $p$, and that the $X_i$ are mutually independent.
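For small $n$ these three claims can be checked by brute force. The following is a minimal sketch, not part of the original answer: the helper names `point_mass` and `prob` and the choices $n = 3$, $p = 1/3$ are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def point_mass(s, p):
    """Probability of the single outcome s, a tuple of 0s and 1s."""
    mass = Fraction(1)
    for s_i in s:
        # The factor equals p when s_i == 1 and 1 - p when s_i == 0.
        mass *= (1 - p) + (2 * p - 1) * s_i
    return mass

def prob(event, p):
    """P(event) for an event given as a collection of outcomes."""
    return sum((point_mass(s, p) for s in event), Fraction(0))

n = 3                      # illustrative choice
p = Fraction(1, 3)         # illustrative choice
A = list(product((0, 1), repeat=n))   # sample space {0, 1}^n

total = prob(A, p)                                          # should be 1
marginal = prob([s for s in A if s[0] == 1], p)             # P(X_1 = 1)
joint = prob([s for s in A if s[0] == 1 and s[1] == 1], p)  # P(X_1 = 1, X_2 = 1)
```

Exact rational arithmetic is used so that the equalities `total == 1`, `marginal == p` and `joint == p * p` hold exactly rather than up to rounding.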

What if we want to do infinitely many variables? It turns out that this is still possible. I won't go into the exact details of how it's done, but we can come up with a probability space built from the sample space $\{0, 1\}^S$, where $S$ is some possibly infinite set, using something called the "Borel $\sigma$-algebra" as our event space. Basically, we only allow events that can be "built up" from the basic events of $X_i = 0$ and $X_i = 1$ using the processes of countable union and complementation. We can then define the probability measure $P$ using Caratheodory's Criterion and an outer measure. This is all rather technical and would require a good course in measure theory to introduce, but it can be done perfectly well.

So it's perfectly valid to take $|\mathbb{R}^2|$ different random variables and form a probability distribution out of them.

The problem here is that you would need to prove that $\{s \in \{0, 1\}^{\mathbb{R}^2} \mid \{x \in \mathbb{R}^2 \mid s_x = 1\} \text{ is connected}\}$ is actually part of the $\sigma$-algebra of events. Only events can have their probability taken.

I strongly suspect (but do not yet have a proof) that it will turn out this is not a measurable set. Therefore, we will be unable to ask the question of its probability.

Edit: if we're using the Borel $\sigma$-algebra, then I do in fact have a proof.

Theorem: Let $S$ be a set in the Borel $\sigma$-algebra on $\{0, 1\}^B$ where $B$ is a set. There must be some countable set $V \subseteq B$ such that for all $y \in S$, for all $z \in \{0, 1\}^B$, if for all $v \in V$, $y_v = z_v$, then $z \in S$.

Proof: we proceed by induction on the definition of the Borel $\sigma$-algebra.

Base case 1: $\{0, 1\}^B$. This one is immediate - simply take $V = \emptyset$.

Base case 2: $\{x \in \{0, 1\}^B \mid x_b = q\}$. This one is also immediate: take $V = \{b\}$.

Inductive step 1: Suppose $C, D$ are Borel sets satisfying the property. Pick $V_C$ and $V_D$ respectively. Then $V_C \cup V_D$ is countable and works for $C \setminus D$.

Inductive step 2: Suppose $C_i$ is a Borel set satisfying the property for all $i \in \mathbb{N}$. Then for each $i$, take $V_{C_i}$ which works for $C_i$. Then $\bigcup\limits_{i \in \mathbb{N}} V_{C_i}$ is countable and works for $\bigcup\limits_{i \in \mathbb{N}} C_i$.

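So the proof is complete. Now observe that no such $V$ works for the set of connected sets. Indeed, let $V \subseteq \mathbb{R}^2$ be countable, let $y$ be the indicator of $\mathbb{R}^2$ (which is connected), and let $z$ be the indicator of $V \cup \{a, b\}$ for two distinct points $a, b \notin V$. Then $y_v = z_v = 1$ for all $v \in V$, but $V \cup \{a, b\}$ is countable with at least two points and hence disconnected: some vertical or horizontal line misses it while leaving points of it on both sides. So the set of connected sets is not in the Borel $\sigma$-algebra.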

jackson · Answer 2 · 2021-12-16T21:58:16.710

I'll take a bit of a different approach to Mark's answer, using group invariance on the measure space. I think it's obvious that the probability of producing a connected set should be $0$, so I'll argue that there is at least one way of making $\mathcal{P}(\mathbb R^2)$ a probability measure space such that every point independently has an equal probability of being included. Unfortunately this approach doesn't clarify whether each point's probability is 50%.

Consider the group $\mathop{Sym}(\mathbb{R}^2)$ of self-bijections $\mathbb{R}^2 \to \mathbb R^2$. We can phrase the requirement that each point's inclusion is equally likely and independent of all other points in terms of the action of $\mathop{Sym}(\mathbb{R}^2)$ on $\mathcal{P}(\mathbb{R}^2)$ and $\mathcal{P}(\mathcal{P}(\mathbb{R}^2))$. That is, if $\phi$ is a bijection on $\mathbb{R}^2$, then $\phi$ acts on $\mathcal{P}(\mathbb{R}^2)$ via $\phi(A) = \{\phi(x) \mid x \in A\}$. Furthermore, given any $\mathfrak{A} \subset \mathcal{P}(\mathbb{R}^2)$, we can let $\phi(\mathfrak{A}) = \{\phi(A) \mid A \in \mathfrak{A}\}$.

What does it mean for each point to have equal and independent probability of inclusion? It means that if we have a probability measure space $(\mathcal{P}(\mathbb{R}^2), \Sigma, \mu)$, then for all $\mathfrak{A} \in \Sigma$, and any bijection $\phi \in \mathop{Sym}(\mathbb{R}^2)$, we have $\phi(\mathfrak{A}) \in \Sigma$, and $\mu(\phi(\mathfrak{A})) = \mu(\mathfrak{A})$. So the question is, if $\mathfrak{C} = \{A \subset \mathbb{R}^2 \mid A \text{ is connected}\}$, is there a $\mathop{Sym}(\mathbb R^2)$-invariant probability measure on $\mathcal P (\mathbb R^2)$ such that $\mathfrak{C}$ is measurable? If so, what is $\mu(\mathfrak C)$?

For any cardinality $\kappa < 2^{2^{\aleph_0}}$, the co-$\kappa$ probability measure on $\mathcal{P}(\mathbb{R}^2)$ is $\mathop{Sym}(\mathbb{R}^2)$-invariant (in fact it's $\mathop{Sym}(\mathcal P(\mathbb R^2))$-invariant), so such measures certainly exist.

And there are as many disconnected as connected subsets of $\mathbb{R}^2$. That is, $$ |\mathfrak{C}| = |\mathcal{P}(\mathbb{R}^2) \setminus \mathfrak{C}| = |\mathcal P(\mathbb R)| = 2^{2^{\aleph_0}}. $$ Indeed, let $A \subset \mathbb{R}$. For all such $A$, we can construct a distinct connected $C_A \subset \mathbb{R}^2$ and disconnected $D_A \subset \mathbb R^2$. Given $A \subset \mathbb{R}$, let $A'$ be the subset of $\mathbb R$ where all nonnegative elements are shifted up by $1$: $$ A' = (A \cap (-\infty, 0)) \cup ((A \cap [0, \infty)) + 1). $$ The map $A \mapsto A'$ is an injective map $\mathcal P(\mathbb R) \to \mathcal P(\mathbb R)$, and $A'$ never meets $[0, 1)$. Now let $E_A = \{(x, y) \mid x \in A'\}$. Then $D_A := E_A \cup \{(\frac{1}{2}, \frac{1}{2}), (\frac{1}{2}, -\frac{1}{2})\}$ is disconnected, and $C_A := E_A \cup \{(x, y) \mid y=0\}$ is connected.

So $\mathfrak{C}$ is not co-$\kappa$ for any $\kappa < 2^{2^{\aleph_0}}$. In particular, the co-$2^{\aleph_0}$ probability measure is $\mathop{Sym}(\mathbb{R}^2)$-invariant and finds that $\mu(\mathfrak{C}) = 0$*.

Unfortunately, I doubt there are any $\mathop{Sym}(\mathbb{R}^2)$-invariant probability measures on $\mathcal{P}(\mathbb{R}^2)$ besides co-$\kappa$ measures. Besides, $\mathop{Sym}(\mathbb R^2)$-invariance is a ridiculously strong requirement; for instance, I believe the Lebesgue completion of a $\mathop{Sym}(X)$-invariant measure on $\mathcal P (X)$ always has $\mathcal P(\mathcal P(X))$ as its $\sigma$-algebra. So I am very sympathetic to the answer "$\mathfrak C$ should not be measurable." But there is at least one way it can be measurable, and in that way its measure is $0$.

Edit: I believe I missed a detail about the co-$\kappa$ measure. I was assuming that for $A \subset \mathcal P (\mathbb R^2)$, if $A$ is not co-$\kappa$, then $\mu(A) = 0$. But this is not a measure at all, as any partition of $\mathcal P (\mathbb R^2)$ into $2$ disjoint and equal-cardinality subsets violates the disjoint-sum axiom of a measure. To fix this, we can only consider sets that are measurable in the co-$\kappa$ topology's Borel sigma algebra. Then $\mathfrak C$ is unmeasurable, and $\mu(\mathfrak C)$ is undefined.

I find this answer very interesting, but I have some trouble following the details. What is the co-$\kappa$ probability measure? (I couldn't find anything about it in my measure theory textbooks, and on the internet I only got completely unrelated search results.) And can you clarify why you think that "the point inclusions are independent and equally likely" is equivalent to "$\mu$ is permutation-invariant"? (I think that examples such as this one show that dependent random variables can also be permutation-invariant?) — Josse van Dobben de Bruyn, Dec 16 '21 at 19:55
I'm not certain if the phrase "co-$\kappa$" is standard but it's modeled off of "cofinite," as in cofinite topology and cofinite measure. The specific measure I'm talking about is one that finds $\mu(A) = 1$ if $A$ is co-$\kappa$ in $X$, meaning $|X \setminus A| \le \kappa$, $\mu(A) = 0$ if $|A| \le \kappa$, and $A$ is unmeasurable otherwise. Looking at the definition here (I would call this cocountable rather than cofinite) https://math.stackexchange.com/questions/1464801/cofinite-sigma-algebra-not-a-measure-space it seems possible I'm using it wrong and my conclusion doesn't hold at all... — jackson, Dec 16 '21 at 21:27
Thank you for clarifying! It seems that the point probabilities $\mathbb{P}[p \in S]$ are not even defined (let alone independent) in your model, so this does not strictly answer the literal question. Nevertheless, your broader symmetry-invariant approach is an interesting addition, and provides another example of a “reasonable” probability measure on $\mathcal{P}(\mathbb{R}^2)$, so it does (partially) answer the question in spirit. — Josse van Dobben de Bruyn, Dec 19 '21 at 20:26

Josse van Dobben de Bruyn · Accepted Answer · 2021-12-23T21:12:02.570

Short answer. The probability is undefined, because the probability space is underspecified. In fact, there are models for your problem where the event $\{S\ \text{is connected}\}$ is measurable with arbitrary probability.

Long answer (proof). Consider the following equivalent formulation of your problem:

Question 1. Let $(\mathcal P(\mathbb{R}^2),\mathscr{A},\mathbb{P})$ be a probability space on $\mathcal P(\mathbb{R}^2)$. Suppose that for each $p \in \mathbb{R}^2$ the event $\{p \in S\}$ is measurable with $\mathbb{P}[p \in S] = \tfrac{1}{2}$, and suppose that the family of random variables $\{\{p \in S\} \, \mid \, p\in\mathbb{R}^2\}$ is independent. Does this determine the probability of the event $\{S\ \text{is connected}\}$?

A probability space meeting these requirements is guaranteed to exist. Indeed, for each $p \in \mathbb{R}^2$, let $(\Omega_p,\mathscr{A}_p,\mu_p)$ be the probability space with ambient space $\Omega_p = \{0,1\}$, $\sigma$-algebra $\mathscr{A}_p = \mathcal P(\{0,1\})$, and probability measure $\mu_p(\{0\}) = \mu_p(\{1\}) = \tfrac{1}{2}$ (a Bernoulli random variable with probability $\frac{1}{2}$), and let $(\Omega,\mathscr{A},\mu) = \prod_{p\in\mathbb{R}^2} (\Omega_p,\mathscr{A}_p,\mu_p)$ be the product of these probability spaces (see this answer). Then $\Omega$ can be identified with $\mathcal P(\mathbb{R}^2)$, and the indicators of the point events $\{\{p \in S\} \, \mid \, p\in\mathbb{R}^2\}$ are i.i.d. $\text{Bernoulli}(\frac{1}{2})$ random variables, so $(\Omega,\mathscr{A},\mu)$ meets the requirements of Question 1.

The issue is that the set $\{S \ \text{is connected}\}$ is not measurable in this measure space (I think this is also what Mark Saving's answer tried to show). I will go one step further and show that it is not measurable in the completion $(\Omega,\mathscr{A}^\mu,\mu)$ of $(\Omega,\mathscr{A},\mu)$, and that for every $\xi \in [0,1]$ there exists an extension of $(\Omega,\mathscr{A}^\mu,\mu)$ in which the event $\{S\ \text{is connected}\}$ is measurable with probability $\xi$.

Lemma 2. Let $(\Omega,\mathscr{A},\mu) = \prod_{p\in\mathbb{R}^2} (\Omega_p,\mathscr{A}_p,\mu_p)$ be the infinite product defined above, and let $A \in \mathscr{A}$. Then there is a countable set $M \subseteq \mathbb{R}^2$ such that for every $U \in A$ and every $V \subseteq \mathbb{R}^2$ with $U \cap M = V \cap M$ one has $V \in A$.

Proof sketch. Every event in an uncountable product $\prod_{i\in I} (\Omega_i,\mathscr{A}_i,\mu_i)$ of probability spaces belongs to some countable sub-product; see e.g. Lemma 3.5.2 in [Bog07].

Lemma 3. Let $M \subseteq \mathbb{R}^2$ be countable with $|M| \geq 2$. Then $M$ is disconnected.

Proof. Assume first that the points of $M$ do not all have the same $x$-coordinate. Write $\alpha = \inf\{x \, \mid \, (x,y) \in M\} \in [-\infty,+\infty)$ and $\omega = \sup\{x \, \mid \, (x,y) \in M\} \in (-\infty,+\infty]$. Then $\alpha < \omega$, and for every $\mu \in (\alpha,\omega)$ the set $M$ contains points on either side of the vertical line $x = \mu$. Since $M$ is countable and $(\alpha,\omega)$ is uncountable, we may choose some $\mu \in (\alpha,\omega)$ such that $x \neq \mu$ for all $(x,y) \in M$. But then $M = (M \cap \{x < \mu\}) \cup (M \cap \{x > \mu\})$, so $M$ is disconnected.

Assume now that the points of $M$ all have the same $x$-coordinate. Then, since $|M| \geq 2$, the points of $M$ do not all have the same $y$-coordinate, so an analogous argument shows that $M$ is disconnected. $\quad\Box$
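The choice of the separating line in the first case can be illustrated on a finite point set. This is only a sketch under that assumption: for finite $M$ a free value of $\mu$ is found by taking a midpoint, whereas the proof above needs the countability argument. The function name `separating_vertical_line` and the sample points are hypothetical.

```python
from fractions import Fraction

def separating_vertical_line(points):
    """Return mu such that no point has x == mu and points lie on both sides.

    Returns None when all points share the same x-coordinate
    (the second case of the proof, handled with horizontal lines).
    Assumes `points` is a finite collection of (x, y) pairs.
    """
    xs = sorted({x for x, _ in points})
    if len(xs) < 2:
        return None
    # Strictly between the two smallest distinct x-coordinates.
    return (xs[0] + xs[1]) / 2

# Hypothetical sample set with exact rational coordinates.
M = [(Fraction(0), Fraction(1)), (Fraction(2), Fraction(-3)),
     (Fraction(2), Fraction(5)), (Fraction(7), Fraction(0))]

mu = separating_vertical_line(M)
left = [pt for pt in M if pt[0] < mu]
right = [pt for pt in M if pt[0] > mu]
```

Here `left` and `right` are the two non-empty pieces $M \cap \{x < \mu\}$ and $M \cap \{x > \mu\}$, each relatively open in $M$.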

Lemma 4. Let $M \subseteq \mathbb{R}^2$ be countable. Then $\mathbb{R}^2 \setminus M$ is connected.

Proof. It is sufficient to prove that $\mathbb{R}^2 \setminus M$ is path connected, since every path connected space is connected. Let $x,y \in \mathbb{R}^2 \setminus M$ be distinct. Since $M$ is countable, there are uncountably many lines through $x$ (resp. $y$) which contain no points from $M$. Choose a line $\ell_x$ through $x$ and a line $\ell_y$ through $y$ such that $\ell_x$ and $\ell_y$ are not parallel and $\ell_x \cap M = \ell_y \cap M = \varnothing$. Then $\ell_x$ and $\ell_y$ intersect, so we can form a path in $\mathbb{R}^2 \setminus M$ from $x$ to $y$ via $\ell_x$ and $\ell_y$. $\quad\Box$
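The two-line path can be made concrete when $M$ is finite. The sketch below assumes a finite $M$ that contains neither endpoint and, as a simplification, only searches non-negative integer slopes; the names `on_line`, `free_slope` and `two_segment_path` are illustrative and not from the original answer.

```python
from fractions import Fraction
from itertools import count

def on_line(m, base, slope):
    """True if m lies on the line through `base` with direction (1, slope)."""
    return (m[1] - base[1]) == slope * (m[0] - base[0])

def free_slope(base, M, forbidden=None):
    """Smallest integer slope k >= 0 (k != forbidden) whose line through
    `base` misses M. Each point of M rules out at most one slope, so the
    search terminates for finite M not containing `base`."""
    for k in count():
        if k != forbidden and not any(on_line(m, base, k) for m in M):
            return k

def two_segment_path(x, y, M):
    """Return (q, kx, ky): the corner q where the M-free lines through x
    (slope kx) and y (slope ky) intersect. The path x -> q -> y avoids M."""
    kx = free_slope(x, M)
    ky = free_slope(y, M, forbidden=kx)   # different slope, so not parallel
    # Solve x + t * (1, kx) = y + u * (1, ky) for t.
    t = Fraction((y[1] - x[1]) + (x[0] - y[0]) * ky, kx - ky)
    q = (x[0] + t, x[1] + t * kx)
    return q, kx, ky

# Hypothetical example: M blocks the slopes 0 and 1 through x.
M = [(1, 0), (1, 1), (2, 2), (0, 1)]
x, y = (0, 0), (3, 3)
q, kx, ky = two_segment_path(x, y, M)
```

In this example the line through `x` ends up with slope 2 and the line through `y` with slope 0, meeting at the corner $(3/2, 3)$.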

Proposition 5. Let $(\Omega,\mathscr{A},\mu) = \prod_{p\in\mathbb{R}^2} (\Omega_p,\mathscr{A}_p,\mu_p)$ be the infinite product defined above, and let $A \in \mathscr{A}$ be non-empty. Then $A$ contains both a connected and a disconnected subset.

Proof. By Lemma 2, we may choose a countable set $M \subseteq \mathbb{R}^2$ such that for every $U \in A$ and every $V \subseteq \mathbb{R}^2$ with $U \cap M = V \cap M$ we have $V \in A$. Since $A$ is non-empty, we may choose some $U_0 \in A$. Choose two distinct points $x_1,x_2 \in \mathbb{R}^2 \setminus M$, and define $V_0,V_1 \subseteq \mathbb{R}^2$ by $V_0 = \{x_1,x_2\} \cup (U_0 \cap M)$ and $V_1 = U_0 \cup (\mathbb{R}^2 \setminus M)$. Then $V_0 \cap M = V_1 \cap M = U_0 \cap M$, so we have $V_0,V_1 \in A$. Furthermore, $V_0$ is disconnected (by Lemma 3) and $V_1$ is connected (by Lemma 4). $\quad\Box$

Corollary 6. Let $(\Omega,\mathscr{A},\mu) = \prod_{p\in\mathbb{R}^2} (\Omega_p,\mathscr{A}_p,\mu_p)$ be the infinite product defined above, and let $A,B \in \mathscr{A}$ such that $$ A \subseteq \{S \ \text{is connected}\} \subseteq B. $$ Then $A = \varnothing$ and $B = \Omega$.

Proof. If $A \in \mathscr{A}$ is non-empty, then $A$ contains a disconnected subset, so $A \not\subseteq \{S \ \text{is connected}\}$. If $B \in \mathscr{A}$ with $B \neq \Omega$, then $\Omega \setminus B \in \mathscr{A}$ is non-empty and hence contains a connected subset, so $\{S \ \text{is connected}\} \not\subseteq B$. $\quad\Box$

Let $\mu_*,\mu^* : \mathcal P(\Omega) \to [0,1]$ denote the inner and outer measures associated with $\mu$; that is: \begin{align*} \mu_*(B) &= \sup\{\mu(A) \, \mid \, A \in \mathscr{A}, \ A \subseteq B\};\\[1ex] \mu^*(B) &= \inf\{\mu(A) \, \mid \, A \in \mathscr{A}, \ B \subseteq A\}. \end{align*} It follows from Corollary 6 that $\mu_*(\{S\ \text{is connected}\}) = 0$ and $\mu^*(\{S\ \text{is connected}\}) = 1$. Therefore:

the set $\{S\ \text{is connected}\}$ is not measurable in the infinite product $(\Omega,\mathscr{A},\mu)$ or in its completion $(\Omega,\mathscr{A}^\mu,\mu)$;
by this answer, for every $\xi \in [0,1]$ there exists an extension $(\Omega,\mathscr{A}',\mu')$ of $(\Omega,\mathscr{A}^\mu,\mu)$ in which $\{S\ \text{is connected}\}$ is measurable and has probability $\xi$.
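A finite toy analogue may help to see how the inner measure can be $0$ while the outer measure is $1$. The sketch below is an assumption-laden illustration, not the construction used above: it takes $\Omega = \{0,1\}^3$ with the uniform measure and a $\sigma$-algebra that only sees the coordinates listed in `visible`, which plays the role of the countable set $M$ from Lemma 2. Unlike in the real setting, the visible set is fixed once and for all here.

```python
from fractions import Fraction
from itertools import product

n = 3
visible = (0, 1)     # coordinates the toy sigma-algebra can see
omega = list(product((0, 1), repeat=n))
weight = Fraction(1, 2 ** n)   # uniform measure: fair, independent bits

def atoms():
    """Atoms of the sigma-algebra generated by the visible coordinates."""
    for v in product((0, 1), repeat=len(visible)):
        yield {s for s in omega if all(s[i] == b for i, b in zip(visible, v))}

def inner_outer(B):
    """Inner and outer measure of B with respect to the toy sigma-algebra.

    Measurable sets are unions of atoms, so the largest measurable subset
    of B is the union of atoms inside B, and the smallest measurable
    superset is the union of atoms meeting B."""
    B = set(B)
    inner = sum((weight * len(a) for a in atoms() if a <= B), Fraction(0))
    outer = sum((weight * len(a) for a in atoms() if a & B), Fraction(0))
    return inner, outer

hidden_event = [s for s in omega if s[2] == 1]    # depends on an unseen bit
visible_event = [s for s in omega if s[0] == 1]   # depends on a seen bit

hidden_bounds = inner_outer(hidden_event)
visible_bounds = inner_outer(visible_event)
```

The event that depends only on the hidden coordinate gets inner measure $0$ and outer measure $1$, just like $\{S\ \text{is connected}\}$ above, while the event on a visible coordinate gets the single value $\tfrac{1}{2}$.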

This shows that the problem is underspecified. In other words, we need to specify more than just the individual point probabilities if we want the probability of $\{S \ \text{is connected}\}$ to be uniquely defined. I suggest you try to find another model for your problem, for instance by requiring some kind of invariance (as in Jackson's answer) instead of individual point probabilities. In general, it seems that uncountable products of probability spaces only allow us to say things about events which depend only on a countable amount of data.

Closing remarks:

I had initially suspected that the probability would be $0$ or $1$, due to Kolmogorov's zero-one law (thanks to Trebor's comment to the original question). Indeed, the probability does not depend on any finite portion of the data, so it looks like a tail event. Therefore I actually expected to find that either $\{S\ \text{is connected}\}$ or its complement would be a null set (that is, contained in a measurable set with probability $0$), even if it would not be measurable in its own right. My answer shows that this is not the case. I guess this shows that zero-one laws only apply to tail events, and not to tail non-events.
Note that my answer remains valid (and reaches the same conclusion) if we change the individual point probabilities. We can give each point a different probability of being included (independently of the other points), and these probabilities can be anything. Even if we specify that all points occur with probability $0$, the probability of $S$ being connected can still be arbitrary! The problem remains underspecified, and my solution still constructs a measure space where every point occurs with probability $0$ but the event $\{S\ \text{is connected}\}$ has arbitrary probability. (Very strange!) The individual point probabilities simply do not say anything about events which depend on an uncountable amount of data.
Likewise, every part of my solution is still valid if we replace “connected” by “path connected”.
There used to be a remark here about my intuitive beliefs of what the answer should be, but I removed it because I think it was missing the point. The key takeaway is that an uncountable product of probability spaces is not a good model for this type of problem.
The key takeaway from the preceding remark makes it seem all the more incredible to me that continuous-time stochastic processes, such as the Wiener process (i.e. Brownian motion), are well-defined. Maybe the problem can be modelled as a stochastic process somehow? We should probably look into continuum percolation.

References.

[Bog07] V.I. Bogachev, Measure Theory, Volume I, Springer, 2007.

Nice answer! I agree with your intuitive beliefs regarding the probability of connected and path connected, although I have trouble convincing myself whether an uncountable closed set disjoint from S should be likely or unlikely. — Thomas Browning, Dec 18 '21 at 02:48
@ThomasBrowning ah yes, I see now that you were the one who put the picture of a dense point set in my mind (in the comments below the question). In any case, I ended up removing my remarks about the intuitive interpretation, as they seemed to undermine the point I was trying to make. I added some other comments: (a) strangely, the probability is still undefined if we take all points independently with probability $0$ (?!); (b) we should maybe look into continuum percolation theory. — Josse van Dobben de Bruyn, Dec 19 '21 at 20:50
