
I need a non-uniform random number generator in which each n-bit output has a Hamming weight that follows a given binomial distribution.

For example, I would like a non-uniform PRNG that generates 32-bit outputs whose Hamming weight follows a binomial distribution with n=32, p=0.1. For instance, 0xFF should be output with significantly lower probability than 0x200, which in turn should have the same probability as 0x1.

Perhaps I can modify the output of a PRNG like xorshift or an LFSR to accommodate this? I thought about rejection sampling the output, but the distribution of Hamming weights for a uniform PRNG does not necessarily envelop a given binomial distribution with a variable parameter p, especially when p << 0.5.

I am not concerned about the cryptographic quality of the output. However, I am working on an 8-bit microcontroller with 2 KB of SRAM, so memory and speed are my primary concerns. In the most naive case, I would just generate an array of random numbers, convert each element to 0 or 1 given a threshold probability, and finally convert the resulting array of 0s and 1s to an integer. But I would really like to avoid the memory overhead of an n-element array.

Ollie

1 Answer


The obvious way to do this is to generate several random words and use logical operations to combine them into a single word, such that each bit of the output word is 1 with probability approximately 0.1 (and the individual bits are uncorrelated).

In the simplest case, you could generate 3 words, and just AND them together into a single one. In C, this would be:

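     /* A result bit is 1 only if it is 1 in all three words. */
     /* Note: rand() is only guaranteed to give 15 random bits. */
     r1 = rand();
     r2 = rand();
     r3 = rand();
     return r1 & r2 & r3;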

This sets each bit with probability $(1/2)^3 = 0.125$, which is close to 0.1.

If that's not quite close enough, you can get a closer approximation by using more words; for example, r1 & r2 & r3 & ~(r4 & r5) yields bits set with probability $3/32 = 0.09375$.
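To spell out where $3/32$ comes from, assuming every bit of every $r_i$ is independently 1 with probability $1/2$:

$$\Pr[\text{bit set}] = \Pr[r_1 \wedge r_2 \wedge r_3] \cdot \Pr[\neg(r_4 \wedge r_5)] = \frac{1}{8} \cdot \left(1 - \frac{1}{4}\right) = \frac{3}{32}$$

The expected Hamming weight of a 32-bit output is then $32 \cdot 3/32 = 3$, compared with $32 \cdot 0.1 = 3.2$ for the target distribution.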

With this technique, you use $n$ random words to generate bits set with probability $k 2^{-n}$ for some integer $k$; this can be made arbitrarily close to 0.1.
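One possible way to turn this into a routine (a sketch of one reading of the idea, not necessarily the exact construction intended above): walk the bits of $k$ from least to most significant, OR-ing in a fresh random word for a 1 bit and AND-ing it in for a 0 bit. An OR maps a bit probability $p$ to $(1+p)/2$ and an AND maps it to $p/2$, so after $n$ steps each bit is set with probability $k 2^{-n}$. The names `biased_word` and `xorshift32` are made up for illustration, and `xorshift32` is only a stand-in for whatever cheap PRNG you actually use.

```c
#include <stdint.h>

/* Stand-in PRNG (assumption): Marsaglia's 32-bit xorshift.
 * The state must never be zero. */
static uint32_t xs_state = 2463534242u;

static uint32_t xorshift32(void)
{
    uint32_t x = xs_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    xs_state = x;
    return x;
}

/* Hypothetical helper: returns a 32-bit word in which each bit is 1
 * with probability k / 2^n, consuming n random words.
 * Requires 1 <= n <= 32 and 0 <= k < 2^n. */
static uint32_t biased_word(uint32_t k, unsigned n)
{
    uint32_t acc = 0;                 /* bit probability starts at 0 */
    for (unsigned i = 0; i < n; i++) {
        uint32_t r = xorshift32();
        if ((k >> i) & 1u)
            acc |= r;                 /* p -> (1 + p) / 2 */
        else
            acc &= r;                 /* p -> p / 2 */
    }
    return acc;
}
```

With this sketch, `biased_word(1, 3)` reduces to the three-word AND shown earlier, and `biased_word(26, 8)` targets $26/256 \approx 0.1016$ at the cost of 8 random words per output.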

This obviously uses minimal memory; the computation time isn't too bad (assuming your rand implementation is cheap), unless you insist on a very close approximation to your target probability.

And, while I said 'words', your implementation would use whatever size it finds most convenient; on an 8-bit CPU, each word might be 8 bits (and you just repeat the process 4 times to generate the required 32 bits).
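As a rough illustration of the byte-at-a-time variant, here is a minimal sketch of the three-way AND done on 8-bit values and repeated four times. It assumes a byte source called `rand8`, which is a hypothetical name; here it is implemented as the top byte of a 32-bit xorshift purely so the example is self-contained, and on a real 8-bit target you would substitute whatever generator is cheapest there.

```c
#include <stdint.h>

/* Stand-in PRNG state (assumption); must never be zero. */
static uint32_t xs_state = 2463534242u;

/* Hypothetical byte source: top byte of a 32-bit xorshift step. */
static uint8_t rand8(void)
{
    uint32_t x = xs_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    xs_state = x;
    return (uint8_t)(x >> 24);
}

/* Builds a 32-bit output one byte at a time. Each byte is the AND of
 * three random bytes, so each bit is 1 with probability 1/8.
 * Only a single 8-bit accumulator is needed besides the result. */
static uint32_t sparse32_bytewise(void)
{
    uint32_t out = 0;
    for (uint8_t i = 0; i < 4; i++) {
        uint8_t b = rand8();
        b &= rand8();
        b &= rand8();
        out = (out << 8) | b;
    }
    return out;
}
```

Working memory stays at a few bytes regardless of the output width, which is the point of the approach compared with the n-element array from the question.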

poncho