
Suppose we have $n$ elements with weights $w_i$ for $i \in [n]$, where $\sum_i w_i = 1$, and we want to sample $k < n$ elements without replacement.

When we sample with replacement, it is clear that if $w'_i > w_i$, then regardless of the other weights, the probability of element $i$ being sampled in $k$ trials increases, since that probability is simply $$1 - (1 - w_i)^k.$$ However, we do not have a closed-form counterpart for sampling without replacement. Intuitively, the same monotonicity should hold, but I have not been able to find a rigorous proof. See Probability to choose specific item in a "weighted sampling without replacement" experiment. See also Probability of Choosing an Item in Weighted Random Sampling Without Replacement.

Any idea would be highly appreciated.


In fact, this is not true in general.

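Counterexample: Consider $n = 3$, $k = 2$, $W = (0.1, 0.89, 0.01)$ and $W' = (0.2, 0.4, 0.4)$. Here $0.1 = w_1 < w'_1 = 0.2$, but under sequential weighted sampling without replacement the probability of element $1$ being sampled decreases from about $0.9101$ to about $0.4667$. (With $k = 2$ and $n = 3$, element $1$ is missed exactly when the other two elements are drawn, which has probability $w_2 w_3 \left(\frac{1}{1 - w_2} + \frac{1}{1 - w_3}\right)$.)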
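The counterexample can be checked by brute force. The following is a minimal sketch, assuming the usual sequential scheme in which each draw picks one of the remaining elements with probability proportional to its weight; the function name `inclusion_probability` is made up for this illustration. It enumerates all ordered draws of length $k$ and adds up those that contain the target element.

```python
from itertools import permutations

def inclusion_probability(weights, k, target):
    """P(target is among the k draws), assuming sequential weighted sampling
    without replacement (each draw is proportional to the remaining weights)."""
    total_weight = sum(weights)
    prob = 0.0
    for order in permutations(range(len(weights)), k):
        if target not in order:
            continue
        p, remaining = 1.0, total_weight
        for idx in order:
            p *= weights[idx] / remaining  # chance of drawing idx next
            remaining -= weights[idx]      # idx is removed from the pool
        prob += p
    return prob

W = (0.1, 0.89, 0.01)
W_new = (0.2, 0.4, 0.4)
p_before = inclusion_probability(W, 2, 0)      # element 1 has index 0
p_after = inclusion_probability(W_new, 2, 0)
print(p_before, p_after, p_after < p_before)
```

The enumeration is exponential in $k$, so it is only meant for small sanity checks like this one.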

Are there additional conditions under which this does hold?

Vezen BU
  • 2,320

1 Answer


One rather simple, but restrictive, sufficient condition is that $w_j' \le w_j$ for all $j \ne i$. The probability of drawing $i_1, \dots, i_k$ in that order is given by $$ \prod_{l=1}^k w_{i_l}\left(1-\sum_{r=1}^{l-1} w_{i_r}\right)^{-1} $$ and it is easy to see that this is strictly increasing in $w_{i_l}$ for each $l=1,\dots, k$. So no outcome without $i$ becomes more likely when $w_j' \le w_j$ for all $j \ne i$, and thus the outcomes containing $i$ become collectively more likely (as long as $0<k<n$).
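A quick numerical sanity check of this first condition is sketched below. It is only an illustration, not a proof: it assumes the sequential scheme described by the product formula above, and the helper names `inclusion_probability` and `shift_mass_to_target` are made up for the example. Each trial shrinks every weight other than the target's by a random factor, gives the freed mass to the target, and counts the cases where the target's inclusion probability goes down.

```python
import random
from itertools import permutations

def inclusion_probability(weights, k, target):
    """Exact P(target in sample) by enumerating ordered draws of length k."""
    total_weight = sum(weights)
    prob = 0.0
    for order in permutations(range(len(weights)), k):
        if target not in order:
            continue
        p, remaining = 1.0, total_weight
        for idx in order:
            p *= weights[idx] / remaining
            remaining -= weights[idx]
        prob += p
    return prob

def shift_mass_to_target(weights, target, rng):
    """Shrink every other weight (w_j' <= w_j) and give the freed mass to target."""
    new = [w if j == target else w * rng.uniform(0.5, 1.0)
           for j, w in enumerate(weights)]
    new[target] = 0.0
    new[target] = 1.0 - sum(new)
    return new

rng = random.Random(0)
n, violations = 5, 0
for _ in range(200):
    k = rng.randint(1, n - 1)
    raw = [rng.random() + 0.01 for _ in range(n)]
    w = [x / sum(raw) for x in raw]
    w_new = shift_mass_to_target(w, 0, rng)
    # Small tolerance guards against floating-point noise.
    if inclusion_probability(w_new, k, 0) < inclusion_probability(w, k, 0) - 1e-12:
        violations += 1
print("violations:", violations)
```

If the argument above is right, no violations should be reported for any seed.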

A second observation is that for $j,j'\ne i$ with $w_j > w_{j'}$, widening the gap by setting $w'_j := w_j + x$, $w'_{j'} := w_{j'} - x$ for some $x > 0$ only makes the probability bigger: the probability above is not only increasing in $w_{i_l}$, it is also convex in $w_{i_l}$, as can be seen e.g. by differentiating. Looking at the probability of drawing either $j$ or $j'$, we see that widening the gap makes the probability of drawing both $j$ and $j'$ smaller. Let $X$ be the drawn set; then $$ P(i\in X) = P(j,j'\in X)\,P(i\in X \mid j,j'\in X) + P(|\{j,j'\}\cap X|=1)\,P\bigl(i\in X \,\big|\, |\{j,j'\}\cap X|=1\bigr) + P(i\in X;\ j,j'\notin X). $$ Note that the conditional probabilities do not depend on $x$, only on the sum $w_j+w_{j'}$, because they are just the probabilities of drawing $i$ in $k-2$, $k-1$, or $k$ draws, respectively, with the weights of $j$ and $j'$ set to zero. Since fewer draws give fewer chances to pick $i$, this also means that $P(i\in X \mid j,j'\in X)$ is smaller than $P\bigl(i\in X \,\big|\, |\{j,j'\}\cap X|=1\bigr)$, so increasing $x$ increases the probability of drawing $i$.
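The effect of widening the gap can be seen on a small instance. The sketch below is an illustration under assumptions, not part of the argument: the weights $(0.2, 0.4, 0.3, 0.1)$ with $k = 2$ are chosen arbitrarily, the target is the first element, and mass $x$ is moved from the third weight to the second. The helper name `inclusion_probability` is hypothetical.

```python
from itertools import permutations

def inclusion_probability(weights, k, target):
    """Exact P(target in sample) by enumerating ordered draws of length k."""
    total_weight = sum(weights)
    prob = 0.0
    for order in permutations(range(len(weights)), k):
        if target not in order:
            continue
        p, remaining = 1.0, total_weight
        for idx in order:
            p *= weights[idx] / remaining
            remaining -= weights[idx]
        prob += p
    return prob

base = (0.2, 0.4, 0.3, 0.1)   # target i is index 0; j is index 1, j' is index 2
probs = []
for x in (0.0, 0.1, 0.2):
    # Widen the gap between w_j and w_j' while keeping their sum fixed.
    w = (base[0], base[1] + x, base[2] - x, base[3])
    probs.append(inclusion_probability(w, 2, 0))
print(probs)
```

For $k = 2$ this behaviour can also be read off directly, since $P(i \in X) = w_i\bigl(1 + \sum_{m \ne i} \frac{w_m}{1 - w_m}\bigr)$ and $w \mapsto \frac{w}{1-w}$ is convex on $[0, 1)$.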

Conversely, making two weights more similar decreases the probability of drawing $i$, as seen in your counterexample, where you moved the weights of the other elements closer together, from 0.89 and 0.01 to 0.4 and 0.4.

Combining these two facts gives a sufficient condition: for all $l=1,\dots,n-1$, the sum of the $l$ smallest weights $w'_j$, $j\ne i$, must be smaller than the sum of the $l$ smallest weights $w_j$, $j\ne i$.
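The combined condition is easy to test mechanically. The following sketch uses a made-up function name, `satisfies_prefix_condition`, and makes one assumption that differs from the wording above: the comparison is taken as non-strict (with a small floating-point tolerance), in line with the $\le$ used in the first condition.

```python
from itertools import accumulate

def satisfies_prefix_condition(old, new, i, tol=1e-12):
    """Check that, for every l, the sum of the l smallest new weights (excluding i)
    does not exceed the sum of the l smallest old weights (excluding i).
    Assumption: non-strict comparison with tolerance tol."""
    old_rest = sorted(w for j, w in enumerate(old) if j != i)
    new_rest = sorted(w for j, w in enumerate(new) if j != i)
    return all(s_new <= s_old + tol
               for s_new, s_old in zip(accumulate(new_rest), accumulate(old_rest)))

# The counterexample from the question fails the condition (0.4 > 0.01 at l = 1).
print(satisfies_prefix_condition((0.1, 0.89, 0.01), (0.2, 0.4, 0.4), 0))
# Lowering both other weights satisfies it.
print(satisfies_prefix_condition((0.1, 0.89, 0.01), (0.2, 0.795, 0.005), 0))
```

Sorting each weight vector separately matters here: the $l$ smallest weights before and after the change need not belong to the same elements.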

Dodezv
  • 742