Say I have a 256-bit string, and I know that it contains exactly 16 ones and 240 zeros. This bit string must be compressible, because only 256 choose 16 strings satisfy this condition, which is about 2^83.06.

So I should be able to compress this bit string down to 84 bits, but I don't know how to go about doing it. I tried to build a lookup table, but it is far too big.
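
As a quick sanity check on the arithmetic above, here is a minimal sketch in Python (the language is chosen only for illustration, and the variable names are mine):

```python
from math import comb, log2

# Number of 256-bit strings with exactly 16 ones.
count = comb(256, 16)

# Bits needed to store any index in the range 0 .. count - 1.
bits_needed = (count - 1).bit_length()

print(log2(count))   # roughly 83.06
print(bits_needed)   # 84
```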

Is there a way to quickly perform this compression/decompression?

1 Answer

The operation of translating between combinatorial objects and their indices in some enumeration is known as ranking/unranking (ranking converts an object to a number, unranking does the opposite). In your case, you are interested in ranking/unranking subsets.

Suppose that you are given a subset $S$ of $\{1,\ldots,n\}$ of size $k$, and want to convert it to an integer in $\{0,\ldots,\binom{n}{k}-1\}$. If $k = 0$ or $k = n$ then there is only one such subset, so you can just output $0$. Otherwise, you want to put the $\binom{n-1}{k}$ subsets not containing $n$ before the $\binom{n-1}{k-1}$ subsets containing $n$. Therefore:

  • If $S$ doesn't contain $n$ then you just output the rank of $S$ as a subset of $\{1,\ldots,n-1\}$ of size $k$.
  • If $S$ does contain $n$, you compute the rank of $S \setminus \{n\}$ as a subset of $\{1,\ldots,n-1\}$ of size $k-1$, and add to it $\binom{n-1}{k}$.
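
The ranking recursion above can be unrolled into a loop that walks from $n$ down to $1$. The following is a minimal Python sketch under that reading; the function name `rank_subset` and its signature are my own choices for illustration, and `math.comb` stands in for whatever binomial routine you use.

```python
from math import comb

def rank_subset(subset, n):
    """Rank a subset of {1, ..., n} among all subsets of the same size.

    Subsets not containing the largest element come first, as described above.
    """
    elements = set(subset)
    if any(not 1 <= x <= n for x in elements):
        raise ValueError("subset must be contained in {1, ..., n}")
    k = len(elements)
    index = 0
    for m in range(n, 0, -1):
        # Base cases: only one subset of size 0 or of size m exists.
        if k == 0 or k == m:
            break
        if m in elements:
            # Skip over the C(m-1, k) subsets that do not contain m.
            index += comb(m - 1, k)
            k -= 1
    return index
```

For example, with $n = 5$ this gives `rank_subset({1, 2}, 5) == 0` and `rank_subset({4, 5}, 5) == 9`, the last index among the $\binom{5}{2} = 10$ subsets.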

The decoding procedure is very similar. If $k = 0$ you output $\emptyset$, if $k = n$ you output $\{1,\ldots,n\}$, and otherwise:

  • If the index is less than $\binom{n-1}{k}$, decode it as encoding a subset of $\{1,\ldots,n-1\}$ of size $k$.
  • Otherwise, subtract $\binom{n-1}{k}$ from the index, decode it as encoding a subset of $\{1,\ldots,n-1\}$ of size $k-1$, and add the element $n$ to the resulting subset.
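
The decoding steps can be sketched the same way. This is a minimal Python illustration, not a tuned implementation; the name `unrank_subset` and the range check on the index are assumptions of mine.

```python
from math import comb

def unrank_subset(index, n, k):
    """Return the size-k subset of {1, ..., n} with the given index.

    Uses the ordering described above: subsets not containing the
    largest element come before those containing it.
    """
    if not 0 <= index < comb(n, k):
        raise ValueError("index out of range")
    subset = set()
    for m in range(n, 0, -1):
        if k == 0:
            break                             # nothing left to choose
        if k == m:
            subset.update(range(1, m + 1))    # must take everything that remains
            break
        threshold = comb(m - 1, k)
        if index >= threshold:
            # The index falls among the subsets that contain m.
            index -= threshold
            subset.add(m)
            k -= 1
    return subset
```

For example, `unrank_subset(0, 5, 2)` returns `{1, 2}` and `unrank_subset(9, 5, 2)` returns `{4, 5}`.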

You can precompute all relevant binomial coefficients (that is, $\binom{m}{j}$ for $m \le n$ and $j \le k$) to speed things up. In your case, the index is longer than 64 bits, so you will have to work with 128-bit arithmetic somehow, but the resulting algorithm should be reasonably fast.
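
Putting the pieces together for the 256-bit case, here is a hedged sketch in Python with a precomputed table of binomial coefficients. The names `BINOM`, `compress` and `decompress` are hypothetical, and the convention that bit $i$ of the integer (counting from 0) stands for element $i+1$ is my own assumption. Python integers have arbitrary precision, so the 128-bit issue does not arise here; in a language with fixed-width integers you would need a 128-bit type or a big-integer library.

```python
N, K = 256, 16

# Pascal's triangle: BINOM[m][j] = C(m, j) for 0 <= m <= N and 0 <= j <= K.
BINOM = [[0] * (K + 1) for _ in range(N + 1)]
for m in range(N + 1):
    BINOM[m][0] = 1
    for j in range(1, min(m, K) + 1):
        BINOM[m][j] = BINOM[m - 1][j - 1] + BINOM[m - 1][j]

def compress(bits):
    """Map an N-bit integer with exactly K one-bits to its index."""
    if not 0 <= bits < (1 << N) or bin(bits).count("1") != K:
        raise ValueError("expected an N-bit value with exactly K ones")
    index, k = 0, K
    for m in range(N, 0, -1):
        if k == 0:
            break
        if (bits >> (m - 1)) & 1:
            index += BINOM[m - 1][k]   # skip strings without element m
            k -= 1
    return index

def decompress(index):
    """Inverse of compress: rebuild the N-bit integer from its index."""
    if not 0 <= index < BINOM[N][K]:
        raise ValueError("index out of range")
    bits, k = 0, K
    for m in range(N, 0, -1):
        if k == 0:
            break
        threshold = BINOM[m - 1][k]
        if index >= threshold:
            index -= threshold
            bits |= 1 << (m - 1)       # element m is present
            k -= 1
    return bits

# Round trip on a sample string with ones at positions 0, 16, ..., 240.
sample = sum(1 << (16 * i) for i in range(16))
packed = compress(sample)
assert packed.bit_length() <= 84
assert decompress(packed) == sample
```

Each call does at most 256 table lookups and big-integer additions or comparisons, so the table (257 rows of 17 entries) is the only precomputation needed.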

Yuval Filmus