
Suppose I have a contiguous set of integers $\{1\ldots n\}$, from which I sample each element uniformly with replacement to produce two finite multisets $S$ and $T$, with sizes $|S|$ and $|T|$ fixed in advance. What is the probability that $S$ and $T$ are disjoint?

Proof

Let $s_i$ be the event that $i\in S$, and let $t_i$ be the event that $i\in T$. Then we have

\begin{align} p(S\cap T=\emptyset)&=1-p((s_1\land t_1)\lor\ldots\lor(s_n\land t_n))\\[6pt] &=1-\sum_ip(s_i\land t_i)+\sum_{i< j}p((s_i\land t_i)\land(s_j\land t_j))-\ldots+(-1)^np((s_1\land t_1)\land\ldots \land(s_n\land t_n))\\ &=1-\sum_ip(s_i\land t_i)+\sum_{i< j}p(\color{blue}{(s_i\land s_j)}\land\color{blue}{(t_i\land t_j)})-\ldots+(-1)^n p(\color{blue}{(s_1\land\ldots\land s_n)}\land\color{blue}{(t_1\land\ldots\land t_n)})\\ &=1-\sum_ip(s_i)p(t_i)+\sum_{i< j}p(s_i\land s_j)p(t_i\land t_j)-\ldots+(-1)^n p(s_1\land\ldots\land s_n)p(t_1\land\ldots\land t_n) \end{align}

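where the second equality follows by inclusion-exclusion (the Addition Law), the third by the associativity and commutativity of conjunction, and the final equality from the fact that $S$ and $T$ are sampled independently, so any event defined in terms of the $s_i$ is independent of any event defined in terms of the $t_j$.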

Now, since the $s_i$ are exchangeable, and similarly for $t_i$, we can write this as,

$$1-np_{1_s}p_{1_t}+\binom{n}{2}p_{2_s}p_{2_t}-\ldots+(-1)^np_{n_s}p_{n_t}=\sum_{k=0}^n(-1)^k\binom{n}{k}p_{k_s}p_{k_t}$$

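where $p_{k_s}=p(s_1\land\ldots\land s_k)$ and $p_{k_t}=p(t_1\land\ldots\land t_k)$, with the convention $p_{0_s}=p_{0_t}=1$ for the empty conjunction.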

Applying inclusion-exclusion once more, this time to the complementary events $\neg s_i$, we have,

\begin{align} p_{k_s}=p(s_1\land\ldots\land s_k)&=1-p(\neg s_1\lor\ldots\lor\neg s_k)\\[6pt] &=1-\sum_ip(\neg s_i)+\sum_{i<j}p(\neg s_i\land\neg s_j)-\ldots+(-1)^kp(\neg s_1\land\ldots\land\neg s_k)\\ &=1-k\Big(\frac{n-1}{n}\Big)^{|S|}+\binom{k}{2}\Big(\frac{n-2}{n}\Big)^{|S|}-\ldots+(-1)^k\Big(\frac{n-k}{n}\Big)^{|S|}\\[3pt] &=\sum_{l=0}^k(-1)^l\binom{k}{l}\Big(\frac{n-l}{n}\Big)^{|S|} \end{align}
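As a sanity check on this step, here is a minimal Python sketch that compares the closed form for $p_{k_s}$ with a direct enumeration of all $n^{|S|}$ equally likely draw sequences. The function names `p_k` and `p_k_brute` are made up for this illustration, and the enumeration is only practical for small parameters.

```python
from fractions import Fraction
from itertools import product
from math import comb


def p_k(n, k, size):
    """Inclusion-exclusion value of p(s_1 and ... and s_k) for `size` draws from {1..n}."""
    return sum(
        (-1) ** l * comb(k, l) * Fraction(n - l, n) ** size
        for l in range(k + 1)
    )


def p_k_brute(n, k, size):
    """Same probability, counted over all n**size equally likely draw sequences."""
    required = set(range(1, k + 1))
    hits = sum(
        1
        for draws in product(range(1, n + 1), repeat=size)
        if required <= set(draws)  # every one of 1..k was drawn at least once
    )
    return Fraction(hits, n ** size)


if __name__ == "__main__":
    for n, k, size in [(3, 1, 2), (4, 2, 3), (4, 3, 4)]:
        print((n, k, size), p_k(n, k, size), p_k_brute(n, k, size))
```

Exact rational arithmetic via `Fraction` is used so that agreement can be tested with `==` rather than a floating-point tolerance.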

Folding one expression into the other, we obtain,

$$p(S\cap T=\emptyset)=\sum_{k=0}^n(-1)^k\binom{n}{k}\Big(\sum_{l=0}^k(-1)^l\binom{k}{l}\Big(\frac{n-l}{n}\Big)^{|S|}\Big)\Big(\sum_{l=0}^k(-1)^l\binom{k}{l}\Big(\frac{n-l}{n}\Big)^{|T|}\Big)$$
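The nested sum can be tested numerically. The following is a rough sketch, not a proof: it evaluates the formula exactly with `Fraction` and compares it with a brute-force count over all pairs of draw sequences. The names `p_disjoint_formula` and `p_disjoint_brute` are hypothetical, and `s`, `t` stand for $|S|$, $|T|$.

```python
from fractions import Fraction
from itertools import product
from math import comb


def p_k(n, k, size):
    """p(s_1 and ... and s_k): elements 1..k all appear among `size` draws."""
    return sum(
        (-1) ** l * comb(k, l) * Fraction(n - l, n) ** size
        for l in range(k + 1)
    )


def p_disjoint_formula(n, s, t):
    """Nested inclusion-exclusion formula for p(S and T are disjoint)."""
    return sum(
        (-1) ** k * comb(n, k) * p_k(n, k, s) * p_k(n, k, t)
        for k in range(n + 1)
    )


def p_disjoint_brute(n, s, t):
    """Count disjoint pairs among all n**(s+t) equally likely pairs of draw sequences."""
    values = range(1, n + 1)
    count = 0
    for S in product(values, repeat=s):
        seen = set(S)
        for T in product(values, repeat=t):
            if seen.isdisjoint(T):
                count += 1
    return Fraction(count, n ** (s + t))


if __name__ == "__main__":
    for n, s, t in [(2, 1, 1), (3, 2, 2), (4, 3, 2)]:
        print((n, s, t), p_disjoint_formula(n, s, t), p_disjoint_brute(n, s, t))
```

The brute-force loop grows as $n^{|S|+|T|}$, so it is only suitable for small test cases.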

Question

Is this correct, and if so, is there a simpler derivation?

Set
  • Earlier you wrote that $S$ and $T$ are multisets. If indeed $S$ is uniformly distributed among multisets of cardinality $s$, and likewise for $T$, then with this sampling method, the probability $\mathbb P(S\cap T = \emptyset)$ is different from what you and ploosu2 have computed. More precisely, $$\mathbb P(S\cap T = \emptyset) = \sum_{i=1}^s \sum_{j=1}^t \frac{\binom{s-1}{i-1}\binom{t-1}{j-1} }{\binom{s+n-1}{s} \binom{t+n-1}{t}}\binom{n}{i} \binom{n-i}{j}.$$ In the special case $(n,s,t)=(7,5,4)$, the probability is $0.168182$, instead of $\approx 0.061$ with the other sampling method. – Gabriel Romon Jan 11 '25 at 12:12
  • @GabrielRomon Your formula does not agree empirically when tested on $(7,5,4)$; you may have made an error somewhere in your calculation. – Set Jan 11 '25 at 20:31
  • I did simulations with uniform distributions over multisets, and confirmed that my formula is correct. Note that if you sample with replacement, the distribution induced on multisets is not uniform. This is obvious: all $5!$ permutations of the list $(1,2,3,4,5)$ result in the same multiset, while the multiset $\{1,1,1,1,1\}$ can only be obtained from the list $(1,1,1,1,1)$. – Gabriel Romon Jan 11 '25 at 20:41
  • @GabrielRomon ah yes I see, so your solution is for a slightly different question than the one I was trying to ask. For my question, the uniformity corresponds to the sampling of each element from the set. Clearly I need to further clarify the wording in my question. – Set Jan 11 '25 at 20:46
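
The uniform-over-multisets variant discussed in these comments can be checked with a short Python sketch. It evaluates the formula from the comment and, for small parameters, compares it with a direct enumeration of multisets via `itertools.combinations_with_replacement`. The function names are invented for this illustration, and the summation ranges are clipped to avoid binomial coefficients with a negative upper argument.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import comb


def p_disjoint_uniform_multisets(n, s, t):
    """Formula from the comment: S and T uniform over multisets of sizes s and t."""
    total = comb(s + n - 1, s) * comb(t + n - 1, t)
    numerator = sum(
        comb(s - 1, i - 1) * comb(t - 1, j - 1) * comb(n, i) * comb(n - i, j)
        for i in range(1, min(s, n) + 1)          # i distinct values in S
        for j in range(1, min(t, n - i) + 1)      # j distinct values in T
    )
    return Fraction(numerator, total)


def p_disjoint_uniform_multisets_brute(n, s, t):
    """Enumerate every pair of multisets, each pair weighted equally."""
    values = range(1, n + 1)
    multisets_t = list(combinations_with_replacement(values, t))
    count = 0
    total = 0
    for S in combinations_with_replacement(values, s):
        seen = set(S)
        for T in multisets_t:
            total += 1
            if seen.isdisjoint(T):
                count += 1
    return Fraction(count, total)


if __name__ == "__main__":
    print(float(p_disjoint_uniform_multisets(7, 5, 4)))
```

For $(n,s,t)=(7,5,4)$ the formula evaluates to $16317/97020$, which matches the value $0.168182$ quoted in the comment.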

1 Answer

Write $s=|S|$ and $t=|T|$. Condition on the number of distinct elements that appear in $S$, and use the fact that the number of surjections from $[s]$ to $[k]$ is given by $\sum_{j=0}^k (-1)^j \binom{k}{j} (k-j)^s$, to get

$$\begin{aligned} p := \mathbb P (S\cap T = \emptyset) &= \sum_{k=0}^n \mathbb P( S \text{ has } k \text{ distinct letters})\, \mathbb P(S\cap T = \emptyset \mid S \text{ has } k \text{ distinct letters}) \\ &= \sum_{k=0}^n \binom{n}{k}\, \mathbb P( S \text{ has exactly the letters } 1,2,\ldots,k)\, \mathbb P(T \text{ has only letters } > k) \\ &= \frac{1}{n^{s+t}}\sum_{k=0}^n \binom{n}{k} \Big(\sum_{j=0}^k (-1)^j \binom{k}{j} (k-j)^s\Big) (n-k)^t \end{aligned}$$

I have yet to see how your formula simplifies to this, but the two appear to agree.
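
The agreement can be checked exactly for small parameters. This is a minimal sketch with made-up function names (`p_disjoint_surjections` for the formula in this answer, `p_disjoint_nested` for the one in the question). It gives numerical evidence only, not an algebraic simplification.

```python
from fractions import Fraction
from math import comb


def surjections(s, k):
    """Number of surjections from an s-element set onto a k-element set."""
    return sum((-1) ** j * comb(k, j) * (k - j) ** s for j in range(k + 1))


def p_disjoint_surjections(n, s, t):
    """Formula from this answer, conditioning on the distinct values in S."""
    total = sum(
        comb(n, k) * surjections(s, k) * (n - k) ** t
        for k in range(n + 1)
    )
    return Fraction(total, n ** (s + t))


def p_k(n, k, size):
    """p(s_1 and ... and s_k) from the question's inclusion-exclusion step."""
    return sum(
        (-1) ** l * comb(k, l) * Fraction(n - l, n) ** size
        for l in range(k + 1)
    )


def p_disjoint_nested(n, s, t):
    """Nested formula from the question."""
    return sum(
        (-1) ** k * comb(n, k) * p_k(n, k, s) * p_k(n, k, t)
        for k in range(n + 1)
    )


if __name__ == "__main__":
    for n, s, t in [(2, 1, 1), (5, 3, 4), (7, 5, 4)]:
        a = p_disjoint_surjections(n, s, t)
        b = p_disjoint_nested(n, s, t)
        print((n, s, t), a, float(a), a == b)
```

For $(n,s,t)=(7,5,4)$ the surjection formula gives $2467542/7^9\approx 0.0611$.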

ploosu2
  • Very slick. Indeed, because the number of surjections is zero when $k > s$, your formula can be simplified slightly further, by replacing the upper bound on the outer sum with $\min(n,s)$. – Set Jan 11 '25 at 20:55
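
The truncation suggested in this comment can be illustrated with a small Python sketch. The optional `upper` parameter is a hypothetical addition for this example. It caps the outer sum, and the check confirms that stopping at $\min(n,s)$ leaves the value unchanged because the surjection count vanishes for $k>s$.

```python
from fractions import Fraction
from math import comb


def surjections(s, k):
    """Number of surjections from an s-element set onto a k-element set."""
    return sum((-1) ** j * comb(k, j) * (k - j) ** s for j in range(k + 1))


def p_disjoint(n, s, t, upper=None):
    """Surjection-based formula; `upper` optionally caps the outer summation index."""
    if upper is None:
        upper = n
    total = sum(
        comb(n, k) * surjections(s, k) * (n - k) ** t
        for k in range(upper + 1)
    )
    return Fraction(total, n ** (s + t))


if __name__ == "__main__":
    n, s, t = 7, 5, 4
    print(p_disjoint(n, s, t) == p_disjoint(n, s, t, upper=min(n, s)))
```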