
Imagine we have $M$ elements, where some of them ($y$) are positive and the rest, $z=M-y$, are negative. The probability of drawing any of them is given by a distribution. Let's call $p_1$, $p_2$, ..., $p_y$ the probabilities for each of the positives and let us call $q_1$, $q_2$, ..., $q_z$ the probabilities for the other elements. Let $P=\sum_{i=1}^y p_i$ and $Q=\sum_{i=1}^z q_i$, so that $P+Q=1$.

I'm only interested in the number of elements $N \leq M$ I need to draw to get one positive or, more precisely, the number of elements I need to draw to have a probability at least 0.5 of getting one positive. Of course, if the probabilities were uniform, this would be $N= \frac{M+1}{y+1}$, which equals $\frac{M+1}{MP+1}$ in this uniform case. (see Pulling cards from a deck without replacement to reach a goal: average draws needed?)
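As a sanity check, the uniform-case value $\frac{M+1}{y+1}$ for the average number of draws (the quantity computed in the linked question) can be verified exactly by summing over the position of the first positive. A minimal Python sketch (the function name is my own):

```python
def expected_draws_uniform(M, y):
    """Exact expected number of uniform draws without replacement
    until the first positive, with y positives among M elements."""
    z = M - y
    exp, prob_all_neg = 0.0, 1.0  # prob_all_neg = P(first k-1 draws all negative)
    for k in range(1, z + 2):     # the first positive appears at draw 1 .. z+1
        p_first_pos_at_k = prob_all_neg * y / (M - (k - 1))
        exp += k * p_first_pos_at_k
        prob_all_neg *= (z - (k - 1)) / (M - (k - 1))
    return exp
```

For example, `expected_draws_uniform(52, 4)` agrees with $(52+1)/(4+1) = 10.6$.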

But what about non-uniform probabilities? I guess that in general this is bounded by $\frac{M+1}{MP+1}$, but there are extreme cases very far from this value (imagine that one $q_i = 0.99$ and the rest of the $q_i$ are very close to 0, with $P$ close to 0.01; then $N = 2$). But with the same value of $P$ or $Q$ we can get very different values of $N$, as it depends on how uneven the distribution is. So there is no expression in terms of $P$ and $Q$ alone, as discussed in: Probability to choose specific item in a "weighted sampling without replacement" experiment
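The unevenness effect is easy to see numerically. Here is a Monte Carlo sketch (function names, trial counts, and the example distributions are my own choices) that estimates the smallest $N$ giving at least 0.5 probability of a positive within $N$ draws:

```python
import random

def draws_to_first_positive(pos_probs, neg_probs, rng):
    """One run of weighted sampling without replacement; returns the
    number of draws until a positive element appears."""
    items = [(p, True) for p in pos_probs] + [(q, False) for q in neg_probs]
    n = 0
    while True:
        n += 1
        weights = [w for w, _ in items]
        i = rng.choices(range(len(items)), weights=weights)[0]
        if items[i][1]:
            return n
        items.pop(i)  # without replacement: remove the drawn negative

def median_draws(pos_probs, neg_probs, trials=4000, seed=0):
    """Smallest N with empirical P(first positive within N draws) >= 0.5."""
    rng = random.Random(seed)
    draws = sorted(draws_to_first_positive(pos_probs, neg_probs, rng)
                   for _ in range(trials))
    return draws[trials // 2 - 1]
```

With one dominant negative ($q_1 = 0.989$, ten negatives of $0.0001$, $P = 0.01$), $N$ comes out as 2; with the same $P = 0.01$ spread over 99 uniform negatives, $N$ is near 50.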

However, the question is not solved there. I would be happy with an approximation (the case with replacement is very easy, but I'm not interested in it, as $N$ can go to infinity even for a finite set $M$, while without replacement $N$ is always bounded by $M$), as I guess that the exact result is not going to be in closed form. For instance, an approximation in terms of the maximum and minimum probabilities for the positive and negative cases, or in terms of the whole distribution itself.

Any ideas? Thank you in advance!

Giacomo
  • A trivial observation is that the individual $p_i$s don't matter, since we stop drawing as soon as we get any positive element. So we might as well assume there's only one positive element with $p_1 = P$. – Ilmari Karonen Sep 14 '13 at 11:33

1 Answer


This is nowhere near a complete answer, but...

One possibly useful way to look at drawing without replacement is to imagine that we're drawing elements with replacement, but we only count elements that we haven't drawn before.

In particular, in the limit where $\max q_i \to 0$, the probability of drawing the same negative element twice goes to zero, and so drawing with or without replacement become the same process, with the expected number of draws needed to get a positive element equal to $1 + Q + Q^2 + Q^3 + \dotsb = 1 / (1 - Q) = 1 / P$.

This is also an upper bound for the expected number of draws without replacement needed to get a positive element for a given $Q$. For the corresponding lower bound, we can consider the case where $z = 1$ and $q_1 = Q$, i.e. where no draws of a negative element count except for the first one. In this case, the expected number of draws needed to get a positive element is simply $1 + Q$.
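These two bounds ($1+Q$ from below, $1/P$ from above) can be checked exactly for small $z$ with a memoized recursion over which negatives remain, collapsing all positives into a single element of mass $P$ as noted in the comment above. A sketch (names are mine):

```python
from functools import lru_cache

def expected_draws(P, q):
    """Exact E[draws to first positive], weighted, without replacement.
    P = total positive mass (all positives collapsed into one element);
    q = list of negative masses, with P + sum(q) == 1."""
    n = len(q)

    @lru_cache(maxsize=None)
    def E(mask):  # mask: bitmask of the negatives still in the urn
        total = P + sum(q[j] for j in range(n) if mask >> j & 1)
        exp = 1.0  # this draw is always spent
        for j in range(n):
            if mask >> j & 1:
                # with prob q_j/total we draw negative j and continue
                exp += (q[j] / total) * E(mask & ~(1 << j))
        return exp

    return E((1 << n) - 1)
```

With a single negative, `expected_draws(P, [Q])` returns exactly $1+Q$; as the $q_i$ get small and numerous, the value approaches $1/P$, and it always lies between the two.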

Of course, these may not be very useful bounds in practice, but the examples above show that they're the best bounds one can get by considering only $P$ and $Q$. For better bounds, one also needs to consider $z$ and $q_i$. (In particular, $z+1$ is obviously an upper bound.)

  • Thank you, but I don't follow your first derivation. In fact, it must be wrong, as that result is inversely proportional to the total positive probability. In the case with replacement, we have $(1-P)^N \leq 1/2$, which leads to $N \geq \frac{1}{-\log_2(1-P)} + 1$. – Giacomo Sep 14 '13 at 18:09
  • Ignore comment above. The first derivation is for the expected number of draws to get a positive (which is not exactly the minimum number of draws to have 1/2 probability, but very close). In fact, in the case with replacement, we have $(1-P)^N \leq 1/2$, which leads to $N \geq \frac{1}{-\log_2(1-P)} + 1$, very close to $1/P$. This is an upper bound for the non-replacement case, but only occasionally (and slightly) lower than $\frac{M+1}{MP+1}$, so it is of almost no use as an alternative. Your comment about considering just one positive case (so $y$ really doesn't matter) is useful, thank you. – Giacomo Sep 14 '13 at 18:30