2

Inspired by this SO post, I'd like to generate random bitstrings of a length $n$, having at most $k$ bits set for some $k$. Bitstrings should be selected uniformly at random amongst all possible bitstrings. (The original question also asks them to be unique, but that's easily handled with e.g. a set to catch duplicates).

Currently, I have a few approaches, none of which is ideal:

  • For large $k$ ($\approx n/2$), just perform rejection sampling across all bitstrings of length $k$.
  • For very small $k$, we can explicitly calculate the distribution of 1-bit counts via the binomial coefficients, then randomly select a target count from this distribution. (Generating bitstrings with exactly k bits set is much easier)
  • For general use, choose a sequence of length $n$ out of a pool of $n$ 0s and $k$ 1s. While efficient, this approach is biased in favour of 1-bit counts near $k/2$.

Is there an efficient and correct algorithm for selecting random bitstrings with up to $k$ bits set?

nneonneo
  • 226
  • 1
  • 6

1 Answers1

0

Yes. Let $f(n,k)$ denote the number of such bitstrings (i.e., number of $n$-bit strings containing $\le k$ ones). Then it is easy to sample from this set. In particular, use the following recursive algorithm:

Sample the first bit of such a string by choosing 1 with probability $p$ and 0 with probability $1-p$, where $p=f(n-1,k-1)/f(n,k)$. If the first bit is 1, pick the remaining $n-1$ bits by recursively sampling from $n-1$-bit strings containing $\le k-1$ ones. If the first bit is 0, pick the remaining $n-1$ bits by recursively sampling from $n-1$-bit strings containing $\le k$ ones.

So all that remains is to describe an algorithm to compute $f(n,k)$. And this is easy to compute using dynamic programming, by taking advantage of the recurrence

$$f(n,k) = f(n-1,k-1) + f(n-1,k),$$

with base cases $f(n,0)=1$.

D.W.
  • 167,959
  • 22
  • 232
  • 500