
I am considering the following probabilistic balls-into-bins model. There are $n$ bins and two types of balls. For each type, there are $\rho$ balls. Each ball independently lands in bin $i$ with probability $p_i$, where $\sum_{i=1}^{n} p_i = 1$.

Define the following random variables:

  • Let $X_i$ be the number of type-1 balls in bin $i$.
  • Let $Y_i$ be the number of type-2 balls in bin $i$.
  • A collision in bin $i$ occurs if there is at least one ball of each type in bin $i$, i.e., if $X_i \geq 1$ and $Y_i \geq 1$.
  • Let $C_i$ be the indicator random variable for a collision in bin $i$, so $C_i = \mathbf{1}_{X_i \geq 1, Y_i \geq 1}$.

I am interested in bounding the probability that there is no collision anywhere, i.e., $$ P\left( \sum_{i=1}^{n} C_i = 0 \right) = P(\forall i,\ C_i = 0), $$ and I want an upper bound of the form $O(n^{-c})$ in the regime $\rho = O(\sqrt{n \log n})$ (the constant $c$ depends on the constant hidden in the bound on $\rho$).
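For concreteness, the probability in question can be estimated by direct Monte Carlo simulation; here is a minimal sketch (the function name and defaults are my own, purely illustrative):

```python
import random

def p_no_collision_mc(p, rho, trials=20000, seed=0):
    """Monte Carlo estimate of P(no bin holds a ball of both types).

    p: bin probabilities (must sum to 1); rho: balls per type.
    """
    rng = random.Random(seed)
    n = len(p)
    hits = 0
    for _ in range(trials):
        bins1 = [0] * n
        bins2 = [0] * n
        # each ball lands independently in bin i with probability p[i]
        for i in rng.choices(range(n), weights=p, k=rho):
            bins1[i] += 1
        for i in rng.choices(range(n), weights=p, k=rho):
            bins2[i] += 1
        # no collision: every bin misses at least one of the two types
        if all(bins1[i] == 0 or bins2[i] == 0 for i in range(n)):
            hits += 1
    return hits / trials
```

This is only a sanity check for whatever analytic bound one derives, not part of the argument.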

Challenges:

A natural approach would be to treat the bins as independent and multiply per-bin probabilities. However, the random variables $C_i$ are not independent, since the bin counts are coupled through the fixed total number of balls of each type. This prevents a straightforward product bound over the bins.

A common technique for approximating such problems is the Poisson approximation (see e.g., Mitzenmacher & Upfal, Probability and Computing), which often gives a good upper bound up to a factor of 2. My goal is to use this approximation to obtain a tight upper bound, possibly followed by Jensen’s inequality or other methods.

Roadblock:

The usual Poisson approximation applies to a single balls-into-bins process, while here we have two dependent processes. One idea I had was to define an alternative single-process model:

  • Consider $2\rho$ balls of one type.
  • Instead of $n$ bins, use $2n$ bins, indexed by $\{1, \dots, n, n+1, \dots, 2n\}$.
  • Each ball lands in bin $i$ with probability $\tfrac12 p_i$ and in bin $i+n$ with probability $\tfrac12 p_i$, so the probability that the pair $\{i, i+n\}$ receives a given ball is still $p_i$.
  • I define a collision in this model as the event that, for some $i$, bins $i$ and $i+n$ both contain at least one ball.

I thought that this new process may upper bound the original one, and I was hoping to use the Poisson approximation on it. However, I am unsure how to formally argue that this new model indeed provides the desired upper bound for the original process.
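Whether the split-bin process actually dominates the original one is exactly what I cannot prove, but the two can at least be compared numerically. A hedged simulation sketch of both models (all names are my own, purely illustrative):

```python
import random

def sim_original(p, rho, rng):
    """One run of the two-type model; True iff no bin gets both types."""
    n = len(p)
    b1 = [0] * n
    b2 = [0] * n
    for i in rng.choices(range(n), weights=p, k=rho):
        b1[i] += 1
    for i in rng.choices(range(n), weights=p, k=rho):
        b2[i] += 1
    return all(b1[i] == 0 or b2[i] == 0 for i in range(n))

def sim_split(p, rho, rng):
    """One run of the proposed single-type model: 2*rho balls over 2*n
    bins, each ball landing in bin i or bin i+n with probability p_i/2;
    True iff no pair (i, i+n) has both bins occupied."""
    n = len(p)
    w = [pi / 2 for pi in p] * 2  # weights for bins 0..n-1 and n..2n-1
    occ = [0] * (2 * n)
    for i in rng.choices(range(2 * n), weights=w, k=2 * rho):
        occ[i] += 1
    return all(occ[i] == 0 or occ[i + n] == 0 for i in range(n))

def estimate(sim, p, rho, trials=20000, seed=1):
    """Monte Carlo estimate of the no-collision probability."""
    rng = random.Random(seed)
    return sum(sim(p, rho, rng) for _ in range(trials)) / trials
```

Comparing `estimate(sim_original, ...)` against `estimate(sim_split, ...)` for a few choices of $p$ and $\rho$ would at least indicate whether the hoped-for domination is plausible; it is no substitute for a proof.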

Question:

How can I rigorously bound $P(\forall i, C_i = 0)$ using the Poisson approximation or another technique? Specifically, is my proposed transformation into a single-process model valid for upper bounding the original probability? If not, is there another way to effectively apply the Poisson approximation in this setting?

  • My end goal is showing that this event occurs "with low probability" (at most n^(-c) for some constant c). I don't mind how I get this bound. Regarding an explicit answer, since this is an arbitrary probability p, I would have to optimize it somewhere (jensen, lagrange optimization, etc). I suspect the worst case would occur for uniform distribution (but obviously I would have to prove that after I get my bound). – Idra Apr 03 '25 at 10:29
  • Anyway, would love any suggestion for how to go about it. – Idra Apr 03 '25 at 10:30
  • The main tiring step is in calculating $P(C_{1}=1,C_{2}=1)$, which involves calculating the probability for many (16, I think) cases in which a collision fails to occur in at least one of the two boxes among $1$ and $2$. Once you do that, you can bound by Chebyshev's inequality as $P(C=0)\leq \frac{Var(C)}{E(C)^{2}}$. Well, $E(C)$ is easy to calculate as it's just $n \left(1-(\frac{n-1}{n})^{\rho}\right)^{2}$ (i.e. $nP(C_{1}=1)$). It is for the variance (or $E(C^{2})$) that you'll get the cross term $n(n-1)P(C_{1}=1,C_{2}=1)$. – Mr. Gandalf Sauron Apr 04 '25 at 09:36
  • The guiding heuristic that you should keep in mind is that $E(C^{2})\sim (E(C))^{2}$ and hence, by Chebyshev's inequality or the second moment method, $P(C=0)\to 0$. Note that this type of reasoning only works when $\rho$ does not grow very fast with $n$. The exact thresholds can be found by an argument similar to the coupon collector problem. I would have written an answer explaining the above steps but for calculating the tiring $P(C_{1}=1,C_{2}=1)$. If you find the time to calculate that, then I believe you'll find the conclusion you are looking for. – Mr. Gandalf Sauron Apr 04 '25 at 09:47
  • I'm not sure chebyshev will give a tight enough bound (see updated post). Also, the probability is not uniform, but p_i for bin i. – Idra Apr 04 '25 at 11:50
  • I think I've found an answer. I'll just use the poisson approximation (with factor error 2) twice + the law of total probability twice, and get a poisson approximation with error factor 4. The rest can be done using jensen's inequality. Will write an answer on sunday probably, but if anyone has another idea, would love to hear it. – Idra Apr 04 '25 at 16:19
  • Probably too wide to be useful, but the probability a bin $i$ has no collision is $2(1-p_i)^\rho-(1-p_i)^{2\rho}$,

    so the expected number of bins with no collision is $\sum\limits_i \left(2(1-p_i)^\rho-(1-p_i)^{2\rho}\right)$,

    so the probability that all $n$ bins are collision-free must be between $1-n+\sum\limits_i \left(2(1-p_i)^\rho-(1-p_i)^{2\rho}\right)$ and $\frac1n \sum\limits_i \left(2(1-p_i)^\rho-(1-p_i)^{2\rho}\right)$. Unfortunately the first of these can easily be negative.

    – Henry Apr 04 '25 at 16:54

1 Answer


Instead of forcing the situation to fit the existing single-process balls-into-bins theorem described in the post (and found in standard references), we prove a new theorem tailored to the two-process case.

First, we show that the Poisson approximation holds for the general multinomial distribution $(p_1, \dots, p_n)$, and not just for the uniform case ($p_i = \frac{1}{n}$), which is the case typically handled in the literature.


Theorem 1 (Poisson Approximation for a Single Process)

Let $X = (X_1, \dots, X_n) \sim \mathrm{Multinomial}(m, p_1, \dots, p_n)$, and let $Y = (Y_1, \dots, Y_n)$, where the $Y_i$ are independent and $Y_i \sim \mathrm{Poisson}(m p_i)$. Let $f : \mathbb{N}^n \to \mathbb{R}_{\geq 0}$ be any non-negative function such that $\mathbb{E}[f(X_1, \dots, X_n)]$ is either monotonically increasing or decreasing in $m$. Then:

$$ \mathbb{E}[f(X_1, \dots, X_n)] \leq 2 \cdot \mathbb{E}[f(Y_1, \dots, Y_n)]. $$

In particular, if $ e $ is an event (depending only on the vector of bin occupancies) that is monotone in the number of balls $ m $, then: $$ \Pr[e(X)] \leq 2 \cdot \Pr[e(Y)]. $$


Proof Sketch:

Let us assume without loss of generality that $ f $ is increasing in $ m $ (the decreasing case is analogous).

Let $ Z = \sum_{i=1}^n Y_i $. Note that $ Z \sim \mathrm{Poisson}(m) $, and conditioned on $ Z = m $, the vector $ (Y_1, \dots, Y_n) \sim \mathrm{Multinomial}(m, p_1, \dots, p_n) $—this is a standard fact in Poisson approximation (see e.g., Mitzenmacher and Upfal for the multinomial uniform case; the general multinomial case is similar).

We use the law of total expectation:

$$ \mathbb{E}[f(Y)] \geq \mathbb{E}[f(Y) \mid Z \geq m] \cdot \Pr(Z \geq m). $$

Since $ \mathbb{E}[f(Y) \mid Z = k] $ is the expectation of $ f $ under $ \mathrm{Multinomial}(k, p_1, \dots, p_n) $, which is increasing in $ k $ by assumption, conditioning on $ Z \geq m $ can only increase the expectation. In particular, $ \mathbb{E}[f(Y) \mid Z = m] = \mathbb{E}[f(X)] $, so:

$$ \mathbb{E}[f(Y) \mid Z \geq m] \geq \mathbb{E}[f(Y) \mid Z = m] = \mathbb{E}[f(X)]. $$

It is a known result (Mitzenmacher and Upfal, Exercise 5.11) that if $ Z \sim \mathrm{Poisson}(\mu) $ with $ \mu \geq 1 $ an integer, then:

$$ \Pr(Z \geq \mu) \geq 1/2 \quad \text{and} \quad \Pr(Z \leq \mu) \geq 1/2. $$

Therefore:

$$ \mathbb{E}[f(Y)] \geq \mathbb{E}[f(X)] \cdot \Pr(Z \geq m) \geq \frac{1}{2} \cdot \mathbb{E}[f(X)]. $$

(For the decreasing case we just use $ \Pr(Z \leq \mu) \geq 1/2 $ from Exercise 5.11.)
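As a side note, the median fact quoted from Exercise 5.11 is easy to check numerically for moderate integer means; a stdlib-only sketch (direct pmf summation, fine for small $\mu$ but numerically naive for very large ones):

```python
from math import exp, factorial

def poisson_cdf(k, mu):
    """P(Z <= k) for Z ~ Poisson(mu), by direct pmf summation."""
    return sum(exp(-mu) * mu ** j / factorial(j) for j in range(k + 1))

# Check Pr(Z <= mu) >= 1/2 and Pr(Z >= mu) >= 1/2 for integer means.
for mu in range(1, 50):
    assert poisson_cdf(mu, mu) >= 0.5          # P(Z <= mu)
    assert 1 - poisson_cdf(mu - 1, mu) >= 0.5  # P(Z >= mu)
```

(Of course this is only a spot check; the exercise gives the fact for all integer $\mu \geq 1$.)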


Theorem 2 (Poisson Approximation for Two Independent Processes)

Let $ A = (A_1, \dots, A_n) \sim \mathrm{Multinomial}(m, p_1, \dots, p_n) $, and independently $ B = (B_1, \dots, B_n) \sim \mathrm{Multinomial}(m, q_1, \dots, q_n) $. Let $ C = (C_1, \dots, C_n) $, $ D = (D_1, \dots, D_n) $ be independent Poisson vectors where $ C_i \sim \mathrm{Poisson}(m p_i)$ and $D_i \sim \mathrm{Poisson}(m q_i) $.

Let $ f : \mathbb{N}^{2n} \to \mathbb{R}_{\geq 0} $ be a non-negative function such that $ \mathbb{E}[f(A_1, \dots, A_n, B_1, \dots, B_n)] $ is either monotonically increasing or decreasing in $ m $. Then:

$$ \mathbb{E}[f(A, B)] \leq 4 \cdot \mathbb{E}[f(C, D)]. $$

That is, the Poisson approximation is a 4-approximation for monotonic events over two independent multinomial processes with general bin probabilities.


Proof:

We apply Theorem 1 twice using the law of total expectation. Again assume without loss of generality that $ f $ is increasing in $ m $.

First, condition on $ B $. For each fixed value $ B = b $, the expectation $ \mathbb{E}[f(A, b)] $ is still increasing in the number $ m $ of type-1 balls, so Theorem 1 applies conditionally:

$$ \mathbb{E}[f(A, B)] = \mathbb{E}_B\left[ \mathbb{E}[f(A, B) \mid B] \right] \leq \mathbb{E}_B\left[ 2 \cdot \mathbb{E}[f(C, B) \mid B] \right] = 2 \cdot \mathbb{E}[f(C, B)]. $$

Now we apply the law of total expectation again, this time conditioning on $ C $. For each fixed $ C = c $, the expectation $ \mathbb{E}[f(c, B)] $ is increasing in the number of type-2 balls, so Theorem 1 applies again:

$$ \mathbb{E}[f(C, B)] = \mathbb{E}_C\left[ \mathbb{E}[f(C, B) \mid C] \right] \leq \mathbb{E}_C\left[ 2 \cdot \mathbb{E}[f(C, D) \mid C] \right] = 2 \cdot \mathbb{E}[f(C, D)]. $$

Putting it all together:

$$ \mathbb{E}[f(A, B)] \leq 2 \cdot \mathbb{E}[f(C, B)] \leq 4 \cdot \mathbb{E}[f(C, D)]. $$


This gives us a general Poisson approximation result for any monotonic event over two balls-into-bins processes. In particular, if $ e $ is any monotone event, we have:

$$ \Pr[e(A, B)] \leq 4 \cdot \Pr[e(C, D)]. $$


From here, I suppose one can use Jensen's inequality, Lagrange-multiplier optimization, or Taylor-series bounds (upper-bounding the positive $e^x$ terms and lower-bounding the negative ones) to reach the desired upper bound of the form $n^{-c'}$. I do not include this as part of the answer since I have not worked it out thoroughly (I discovered my case is even harder than I thought, with two different distributions $p$ and $q$), but my quick attempts suggest these approaches work fine.
