Binary subset rank and unrank

Question

Suppose we have N bits.

We want to rank and unrank a specific subset of bit strings, defined by the following criterion:

The bit string must contain a run of at least k consecutive 0s (k < N).

How can we rank (and unrank) using only the given bit string, without iterating over the full range of 2^N strings? This matters for efficiency when N is large.

I tried the following approach (example with N=5 and k=3):

Starting from the LSB (position 1), count, for the lowest p bits, the number of combinations that contain >=3 consecutive 0s:

Position 1: 0 combinations with >=3 consecutive 0s
Position 2: 0 combinations with >=3 consecutive 0s
Position 3: 1 combination with >=3 consecutive 0s
Position 4: 3 combinations with >=3 consecutive 0s
Position 5: 8 combinations with >=3 consecutive 0s
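The counts listed above (0, 0, 1, 3, 8) can be reproduced without enumerating all strings. The following is a minimal sketch, not a definitive implementation: it counts the strings that avoid a run of k zeros and subtracts that from 2^n. The function name `count_with_zero_run` is made up for this illustration.

```python
def count_with_zero_run(n: int, k: int) -> int:
    """Number of n-bit strings that contain at least k consecutive 0s (k >= 1)."""
    # avoid[j] = number of strings of the current length that contain no run
    # of k zeros and end in exactly j zeros (0 <= j < k).
    avoid = [1] + [0] * (k - 1)  # the empty string ends in zero 0s
    for _ in range(n):
        nxt = [0] * k
        nxt[0] = sum(avoid)          # append a 1: the trailing run resets
        for j in range(k - 1):
            nxt[j + 1] = avoid[j]    # append a 0: the trailing run grows
        avoid = nxt
    return 2 ** n - sum(avoid)


print([count_with_zero_run(n, 3) for n in range(1, 6)])  # [0, 0, 1, 3, 8]
```

The work is O(n·k) additions, so it stays cheap for large N.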

How can this information be used, along with bit-shifting patterns, to rank and unrank a given bit string?

There is another question where the ranking is based on "<=2" consecutive 0s; however, what I want now is the opposite case, i.e. >=3 consecutive 0s.

Order in a subset

Example -

N: 5 bits
k: >=3 consecutive 0s
00000 - k>=3 - rank - 00001 - unrank - 00000
00001 - k>=3 - rank - 00010 - unrank - 00001
00010 - k>=3 - rank - 00011 - unrank - 00010
00011 - k>=3 - rank - 00100 - unrank - 00011
00100
00101
00110
00111
01000 - k>=3 - rank - 00101 - unrank - 01000
01001
01010
01011
01100
01101
01110
01111
10000 - k>=3 - rank - 00110 - unrank - 10000
10001 - k>=3 - rank - 00111 - unrank - 10001
10010
10011
10100
10101
10110
10111
11000 - k>=3 - rank - 01000 - unrank - 11000
11001
11010
11011
11100
11101
11110
11111
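For small N, the table above can be reproduced by brute force. This is only a reference check for testing a faster method, since it iterates over all 2^N values, which is exactly what the question wants to avoid. The helper name `rank_table` is an assumption made for this sketch.

```python
def rank_table(n: int, k: int) -> dict[str, int]:
    """Map each valid n-bit string to its 1-based rank, by full enumeration."""
    table = {}
    for value in range(2 ** n):
        bits = format(value, f"0{n}b")
        if "0" * k in bits:          # contains a run of at least k zeros
            table[bits] = len(table) + 1
    return table


for bits, r in rank_table(5, 3).items():
    print(bits, r)
# 00000 1, 00001 2, 00010 3, 00011 4, 01000 5, 10000 6, 10001 7, 11000 8
```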

Thanks!

score 1 · Accepted Answer · answered May 19 '24 at 02:01

Conceptually, you can apply the standard generic approach. Ranking: to compute the rank of a string $x$, you compute the number of valid strings $s$ such that $s \preceq x$, i.e., $s$ lexicographically precedes $x$ (or is equal to $x$).

Unranking: to construct the string of rank $r$, you use binary search over the strings. Given any string $s$, you can compute its rank and compare it with $r$: if the rank of $s$ is $<r$, the target is lexicographically later than $s$; otherwise the target is $s$ itself or a lexicographically earlier string. Halving the candidate range at each step in this way finds the smallest string whose rank is $\ge r$, which is the valid string of rank $r$. This gives an efficient procedure for unranking, once you know how to do ranking.
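As a rough sketch of the binary search, assuming a rank function that counts the valid strings up to and including a given value: the string of rank $r$ is the smallest value whose rank reaches $r$. The names `unrank` and `rank_bruteforce` are made up for this illustration, and the brute-force rank is only a stand-in so the example runs on its own; in practice an efficient rank would be passed in.

```python
def rank_bruteforce(value: int, n: int, k: int) -> int:
    """Stand-in rank: count valid n-bit strings <= value by enumeration."""
    return sum(1 for v in range(value + 1) if "0" * k in format(v, f"0{n}b"))


def unrank(r: int, n: int, k: int, rank=rank_bruteforce) -> str:
    """Return the n-bit string with 1-based rank r among the valid strings."""
    lo, hi = 0, 2 ** n - 1
    if r < 1 or rank(hi, n, k) < r:
        raise ValueError("rank out of range")
    while lo < hi:
        mid = (lo + hi) // 2
        if rank(mid, n, k) < r:
            lo = mid + 1   # too few valid strings so far: look later
        else:
            hi = mid       # mid already reaches rank r: look at mid or earlier
    return format(lo, f"0{n}b")


print(unrank(5, 5, 3))  # 01000
```

The loop makes about $N$ calls to the rank function, so the overall cost is dominated by how fast ranking is.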

So all that remains is to apply this to your particular case, and specifically, to come up with a ranking algorithm. This is conceptually easy. Build a deterministic finite-state machine $M$ that recognizes strings that follow your desired pattern. Build another $M'$ that recognizes strings $s$ of length $N$ with $s \preceq x$. Use the product construction to compute a deterministic finite-state machine $M''$ that accepts the intersection of those two languages, i.e., $L(M'')=L(M) \cap L(M')$. Finally, count the number of strings accepted by $M''$; see Why isn't it simple to count the number of words in a regular language? and https://cstheory.stackexchange.com/q/8200/5038 for an algorithm to do that.
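One way this could look in code is sketched below. It is a specialised shortcut rather than a literal product construction: the automaton for "contains $k$ consecutive 0s" has states $0..k$ (length of the current trailing run of 0s, with state $k$ accepting and absorbing), and the comparison with $x$ is handled by walking along $x$ and, at each 1 bit, counting the accepted strings that share the prefix but have a 0 there. The function name `rank` and the table `comp` are assumptions of this sketch.

```python
def rank(x: str, k: int) -> int:
    """Count the valid strings s of the same length as x with s <= x."""
    n = len(x)
    # comp[m][j]: number of ways to fill m remaining bits so that the whole
    # string is accepted, starting from automaton state j.
    comp = [[0] * (k + 1) for _ in range(n + 1)]
    for m in range(n + 1):
        comp[m][k] = 2 ** m                      # already accepted: any suffix works
    for m in range(1, n + 1):
        for j in range(k):
            # next bit 0 extends the run, next bit 1 resets it
            comp[m][j] = comp[m - 1][j + 1] + comp[m - 1][0]

    total, state = 0, 0
    for i, bit in enumerate(x):
        remaining = n - i - 1
        after_zero = min(state + 1, k)
        if bit == "1":
            # strings with the same prefix and a 0 here are strictly smaller
            total += comp[remaining][after_zero]
            state = k if state == k else 0
        else:
            state = after_zero
    return total + (1 if state == k else 0)      # count x itself if it is valid


print(rank("01000", 3), rank("11000", 3))  # 5 8
```

Building the table takes O(N·k) additions and the walk along $x$ is O(N), so nothing here depends on enumerating $2^N$ strings.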

If you work through the details of this, you will obtain an algorithm that solves your problem, and scales well for large $N$.
