
Suppose we have a random number generator badrand() that outputs 1 with probability $0.9$ and 0 with probability $0.1$.

How can we combine $n$ bits from badrand() to get one bit of betterrand()? What will be its probability of giving a 0 or 1?

I assume that this is possible since the entropy of the combined output increases with each call to badrand(). Please guide me if I am wrong.

Ritesh Singh

3 Answers


If the outputs of badrand are independent and identically distributed, you can use a randomness extractor with output length $m=1$ and min-entropy $k=-n\lg 0.9\approx 0.152\,n$ for $n$ input bits.

If you're prepared to have a non-deterministic waiting time for an output, the von Neumann extractor is a good way to get probability exactly 0.5 out of i.i.d. bit outputs. Sample pairs of bits until you get a pair with precisely one zero; if the pair is 01 return 0, and if it is 10 return 1. The i.i.d. property means that these two events are equally likely (in your example both have probability 0.09 and resampling has probability 0.82, so you should expect around 100/9 badrand outputs per betterrand output).
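
For concreteness, here is a minimal Python sketch of that procedure; badrand() is simulated with the stated bias, since no implementation was given:

```python
import random

def badrand():
    # Simulated biased source: returns 1 with probability 0.9, 0 with probability 0.1.
    return 1 if random.random() < 0.9 else 0

def betterrand():
    # von Neumann extractor: sample pairs until the two bits differ,
    # then output the first bit of the pair (01 -> 0, 10 -> 1).
    while True:
        a, b = badrand(), badrand()
        if a != b:
            return a
```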

Daniel S

This is pretty common and easy to fix, although that's a really poor RNG :-)

In the case of very small computational capabilities (e.g. Arduino Uno), we'd extract via the von Neumann technique as per the other answer [see note though]. For larger devices (e.g. ARM, Snapdragon and bigger) we leverage the Leftover Hash Lemma (LHL). That allows extraction in $O(1)$ time.

Given that there is no such thing as pure randomness (it's always $\epsilon$ away from unbiased), we know from the LHL that $\epsilon = 2^{-(sn-k)/2}$, where $s$ is the min-entropy per input bit, $n$ the number of input bits and $k$ the number of output bits. We then:-

  1. Choose an acceptable bias, say $\epsilon = 2^{-64}$ as per NIST recommendation.

  2. Select a hash function to act as the principal randomness extractor, say SHA-256 so that we can utilise CPU SHA extensions for uber speed.

  3. Solving the LHL formula for $n$ shows that the input to the hash function should be 2527 bits, given that your badrand() min-entropy ($H_{\infty}$) is 0.152 bits/bit.

  4. Therefore the final output entropy will be $(1-\frac{1}{2^{64}})$ bits/bit.

  5. Converting that to a probability bias, $P(x_i = 0) \approx P(x_i = 1) \approx 0.5 \pm 2^{-66}$.
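
A minimal Python sketch of steps 1-5, with badrand() simulated as in the question; the 2527-bit input length and the SHA-256 choice follow directly from the figures above. (Strictly, the LHL is proved for universal hash families; using SHA-256 in that role is the practical convention this answer relies on.)

```python
import hashlib
import random

def badrand():
    # Simulated biased source: returns 1 with probability 0.9, 0 with probability 0.1.
    return 1 if random.random() < 0.9 else 0

def lhl_extract(n_in=2527):
    # Collect n_in biased bits; at ~0.152 bits/bit of min-entropy this carries
    # roughly 384 bits of min-entropy, enough for a 256-bit output at eps = 2^-64.
    bits = [badrand() for _ in range(n_in)]
    # Pack the bits into bytes and hash them; SHA-256 plays the extractor role.
    raw = int(''.join(map(str, bits)), 2).to_bytes((n_in + 7) // 8, 'big')
    return hashlib.sha256(raw).digest()  # 32 near-uniform bytes
```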


Note: von Neumann absolutely requires that the input stream be uncorrelated. If it has auto-correlation (typically $R > 10^{-3}$), that extraction technique is inappropriate. LHL has no such restriction. It only requires a measurement of $H_{\infty}$, so it is more flexible, although it has a higher computational cost that is only appropriate for larger processors.

Paul Uszak

You could leverage hash functions built on the sponge construction to serve as both a randomness extractor & an entropy pool simultaneously. The standardized SHA3-512 hash function would be suitable, as it offers various strengths, including always producing bits that appear random & statistically independent of the input data's bit distribution. This shifts the focus from how to produce near-uniform output from a non-uniform source (a rigorously solved problem) to how to make the output as difficult to reproduce as you'd like.

Think of the SHA3-512 function as an object that maintains a 1600-bit entropy pool. The object can be updated with as many bits as desired, & a unique 512-bit pseudo-random output can be obtained after each update of the object. The pool starts off with zero bits of entropy, & each time the object is updated with bits from an entropic source, its entropy increases accordingly. But how many bits need to be fed into the object to achieve a sufficient amount of entropy? There's a way to work that out.

If $badrand()$ produces bits

1 with probability 0.9 and 0 with probability 0.1

then we can use the formula

$\sum_{i=0}^{1} \Pr(c_{i}) \cdot \log_2\!\left(\frac{1}{\Pr(c_{i})}\right)$

where $c_{0} = 0$, $c_{1} = 1$, and $Pr(x)$ gives the probability of $x$ occurring, to calculate

$\Pr(c_{0}) \cdot \log_2\!\left(\frac{1}{\Pr(c_{0})}\right) + \Pr(c_{1}) \cdot \log_2\!\left(\frac{1}{\Pr(c_{1})}\right) = 0.46899559358928133$

as the amount of entropy (in bits) each call to $badrand()$ can be expected to produce on average.

But, as the above answers describe, the minimum entropy

$H_{\min} = -\log_2(0.9) = 0.15200309344504997$

is the value that should be used for various reasons (page 10).

If $N$ is the amount of entropy (in bits) you wish to fill the pool with, and we use $B = 2^{H_{min}}$ as a logarithmic base, then the number of calls to $badrand()$ you'd need to initialize the object is

$N_B = \log_B(2^N) = \frac{N}{H_{\min}}$
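
A quick Python check of these figures, with $N = 512$ chosen purely as an illustrative target:

```python
import math

p0, p1 = 0.1, 0.9

# Shannon entropy per badrand() call (expected case): ~0.469 bits/bit.
H_shannon = p0 * math.log2(1 / p0) + p1 * math.log2(1 / p1)

# Min-entropy per call, from the most likely symbol: ~0.152 bits/bit.
H_min = -math.log2(max(p0, p1))

# Calls needed to accumulate N bits of min-entropy in the pool:
# N_B = log_B(2^N) with B = 2^H_min, which simplifies to N / H_min.
N = 512
N_B = math.ceil(N / H_min)

print(H_shannon, H_min, N_B)  # ~0.46900, ~0.15200, 3369
```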

Once the object is initialized, you then have a forward-secret pseudo-random number generator that produces bits in 512-bit chunks that are indistinguishable from random, contains entropy $\approx N$, & increases in entropy each time it's updated with additional entropic material. After you've consumed the first 512-bit output, feed at least another 576 bits (the SHA3-512 bitrate) from $badrand()$ into the object before reading each subsequent output. This causes the internal state to be permuted by its $f$-function.
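
A minimal sketch of that pool using Python's hashlib, with badrand() again simulated and the fill counts taken from the figures above (hashlib's sha3_512 object stands in for the 1600-bit sponge state):

```python
import hashlib
import random

def badrand():
    # Simulated biased source: returns 1 with probability 0.9, 0 with probability 0.1.
    return 1 if random.random() < 0.9 else 0

def feed(pool, nbits):
    # Pack nbits outputs of badrand() into bytes and absorb them into the pool.
    bits = [badrand() for _ in range(nbits)]
    data = int(''.join(map(str, bits)), 2).to_bytes((nbits + 7) // 8, 'big')
    pool.update(data)

pool = hashlib.sha3_512()

# Initialization: ~3369 calls puts roughly 512 bits of min-entropy in the pool
# (an illustrative target; pick N to taste and use N / H_min calls).
feed(pool, 3369)
out1 = pool.digest()  # first 512-bit pseudo-random output

# Absorb at least another 576 bits (the SHA3-512 bitrate) before each further read,
# so the internal state is permuted again between outputs.
feed(pool, 576)
out2 = pool.digest()
```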

aiootp