- Let $H \sim Hyp(N,K,n) $, where $Hyp$ denotes the hypergeometric distribution, $N$ the number of objects, $K$ the number of "good" objects, and $n$ the number of draws.
- I am interested in a particular bound for $\mathbb{P}(H \leq x)$.
- Let $B_x \sim Bi\left(n, \frac{K-x}{N-x}\right)$, where $Bi$ denotes the Binomial distribution.
Intuitively, if no more than $x$ of the $n$ draws associated with the Hypergeometric distribution are successful (i.e., result in drawing a "good" object), the probability of a success in each of these draws never falls below $\frac{K-x}{N-x}$. Therefore, the following inequality might seem like a reasonable conjecture:
(1)$\qquad$ $\mathbb{P}(H\leq x) \leq \mathbb{P}(B_x \leq x)$, $\qquad$ for all $x \leq K$.
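To spell out the per-draw bound behind this intuition (a sketch of the heuristic only, not a proof of (1); the indices $i$ and $j$ are notation introduced here): suppose $i$ draws have been made so far, $j$ of them successful, with $j \leq x \leq K$, $x < N$, and $j \leq i < n$. The next draw is then a success with probability

$$\frac{K-j}{N-i} \;\geq\; \frac{K-j}{N-j} \;\geq\; \frac{K-x}{N-x}.$$

The first inequality uses $i \geq j$, and the second uses that $t \mapsto \frac{K-t}{N-t} = 1 - \frac{N-K}{N-t}$ is nonincreasing in $t$ on $[0, N)$.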
I've looked online a little bit and couldn't find any reference to (1). Maybe this inequality is easy to prove or disprove, but I haven't been able to.
Besides the intuitive "argument" above, here is some (admittedly very limited) evidence suggesting that (1) might be true.
Example 1
Suppose that $N =4$, $K=2$, and $n = 2$. Then,
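- $\mathbb{P}(H \leq 0) = \frac{1}{2} \cdot \frac{1}{3} = \frac{1}{6}$
- $\mathbb{P}(B_0 \leq 0) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$

Also,
- $\mathbb{P}(H \leq 1) = \frac{1}{2} \cdot \frac{1}{3} + \frac{1}{2} \cdot \frac{2}{3} + \frac{1}{2} \cdot \frac{2}{3} = \frac{5}{6}$
- $\mathbb{P}(B_1 \leq 1) = \frac{2}{3} \cdot \frac{2}{3} + \frac{1}{3} \cdot \frac{2}{3} + \frac{2}{3} \cdot \frac{1}{3} = \frac{8}{9}$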
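The arithmetic in Example 1 can be double-checked with exact rational arithmetic. The following is a minimal Python sketch; the helper names `hyp_cdf` and `binom_cdf` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from math import comb

def hyp_cdf(x, N, K, n):
    """Exact P(H <= x) for H ~ Hyp(N, K, n)."""
    total = comb(N, n)
    return sum(Fraction(comb(K, j) * comb(N - K, n - j), total)
               for j in range(min(x, n) + 1))

def binom_cdf(x, n, p):
    """Exact P(B <= x) for B ~ Bi(n, p), with p given as a Fraction."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(min(x, n) + 1))

# Parameters of Example 1.
N, K, n = 4, 2, 2
for x in (0, 1):
    p = Fraction(K - x, N - x)  # success probability of B_x
    print(x, hyp_cdf(x, N, K, n), binom_cdf(x, n, p))
```

This reproduces the values $1/6$ versus $1/4$ for $x = 0$ and $5/6$ versus $8/9$ for $x = 1$.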
Example 2
In Mathematica, plotting the difference between the two CDFs for a few values of the parameters and $x$ (here `k` plays the role of $x$, with $N = 100$, $K = 50$, and $n \in \{10, 20, 50\}$)

    DiscretePlot[
      Table[CDF[HypergeometricDistribution[n, 50, 100], k], {n, {10, 20, 50}}] -
        Table[CDF[BinomialDistribution[n, (50 - k)/(100 - k)], k], {n, {10, 20, 50}}] // Evaluate,
      {k, 0, 32}, PlotRange -> All]

yields a plot of the three CDF differences (figure not reproduced here).
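For readers without Mathematica, the same comparison can be sketched in Python with exact fractions. This is only a rough counterpart of the plot, not a replacement for it; the function name `cdf_gap` is hypothetical, and the parameter values are the ones used in the Mathematica call.

```python
from fractions import Fraction
from math import comb

def cdf_gap(x, N, K, n):
    """Exact value of P(H <= x) - P(B_x <= x)."""
    p = Fraction(K - x, N - x)  # success probability of B_x
    top = min(x, n)
    hyp = sum(Fraction(comb(K, j) * comb(N - K, n - j), comb(N, n))
              for j in range(top + 1))
    bino = sum(comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(top + 1))
    return hyp - bino

N, K = 100, 50
# Largest gap over k = 0, ..., 32 for each n; (1) predicts a value <= 0.
worst = {n: max(cdf_gap(k, N, K, n) for k in range(33)) for n in (10, 20, 50)}
for n, gap in worst.items():
    print(n, float(gap))
```

For $n = 10$ and $n = 20$ the largest gap is exactly $0$, since both CDFs equal $1$ once $k \geq n$; the informative part is that no gap in the sweep is positive.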
Some things I found difficult when trying to prove the inequality:
- As far as I know, there is no convenient closed-form expression for the CDFs of the binomial and (even less so) the hypergeometric distributions.
- If the inequality holds, it certainly does not hold "pointwise", in the sense that we don't have $\mathbb{P}(H = y) \leq \mathbb{P}(B_x = y)$ for all $y \in \{0,\dots, x\}$. So if the inequality holds, it really has to do with the whole sum of the PMFs from $0$ to $x$, which I find hard to play with.
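The failure of the pointwise comparison can already be seen with the parameters of Example 1 ($N = 4$, $K = 2$, $n = 2$) and $x = 1$, so that $B_1 \sim Bi(2, 1/3)$:

$$\mathbb{P}(H = 0) = \tfrac{1}{6} \leq \tfrac{4}{9} = \mathbb{P}(B_1 = 0), \qquad \mathbb{P}(H = 1) = \tfrac{2}{3} > \tfrac{4}{9} = \mathbb{P}(B_1 = 1).$$

Only the sums compare in the direction of (1): $\tfrac{1}{6} + \tfrac{2}{3} = \tfrac{5}{6} \leq \tfrac{8}{9} = \tfrac{4}{9} + \tfrac{4}{9}$.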
My questions:
- Can someone provide a counterexample to (1), or a proof of it?
- If (1) is true, is there a good reference for it that I could cite?
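In case it helps anyone looking for a counterexample, here is a minimal brute-force sketch in Python that checks (1) exactly over all small parameter values. The function names and the search bound `max_N` are hypothetical choices made for this illustration, and the extra restriction $x \leq \min(n, N-1)$ is an assumption added to keep $\frac{K-x}{N-x}$ well defined and to skip the trivial cases where both CDFs equal $1$.

```python
from fractions import Fraction
from math import comb

def hyp_cdf(x, N, K, n):
    """Exact P(H <= x) for H ~ Hyp(N, K, n)."""
    return sum(Fraction(comb(K, j) * comb(N - K, n - j), comb(N, n))
               for j in range(min(x, n) + 1))

def binom_cdf(x, n, p):
    """Exact P(B <= x) for B ~ Bi(n, p), with p given as a Fraction."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(min(x, n) + 1))

def find_counterexamples(max_N):
    """Return every (N, K, n, x) with N <= max_N that violates (1)."""
    found = []
    for N in range(1, max_N + 1):
        for K in range(N + 1):
            for n in range(1, N + 1):
                # x <= K as in (1); x <= n and x < N are assumptions (see above).
                for x in range(min(K, n, N - 1) + 1):
                    p = Fraction(K - x, N - x)
                    if hyp_cdf(x, N, K, n) > binom_cdf(x, n, p):
                        found.append((N, K, n, x))
    return found

print(find_counterexamples(12))
```

An empty list would be consistent with (1) on that range, though it is of course not a proof.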
