RNGs are subjected to a wide range of tests. The simplest, having a reasonable balance of $0$'s and $1$'s, would reject your first example. You can make this precise by computing the chance (over a much longer run than you show) that the imbalance is a random fluctuation. If you insist that the chance of a random fluctuation be less than $10^{-6}$, for example, you will reject only one in a million of the good RNGs. Out of $50$ bits (I didn't count yours) we would then insist on at least $8$ of each type. For a large number of bits (so we can use the normal approximation) the count of each type needs to be within about $5$ standard deviations of the mean. If we tried $10^6$ bits, the number of $1$'s would need to be within $\pm 2500$ of $500{,}000$.
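As a concrete illustration, here is a minimal Python sketch of this balance test. The function name, the bit-list input format, and the hard-coded $5\sigma$ cutoff are my own choices, not from any particular test suite:

```python
import math

def balance_test(bits, z=5.0):
    """Pass iff the number of 1's is within z standard deviations
    of n/2.  For fair bits the count of 1's is Binomial(n, 1/2),
    so the mean is n/2 and the standard deviation is sqrt(n)/2;
    z = 5 corresponds roughly to the 10^-6 rejection chance above."""
    n = len(bits)
    sigma = math.sqrt(n) / 2
    return abs(sum(bits) - n / 2) <= z * sigma

# With n = 10**6, sigma = 500, so this accepts counts of 1's
# within +/- 2500 of 500,000, as computed above.
```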
The next test is short-length correlations: are the numbers of $00, 01, 10, 11$ pairs about what you would expect? Then you check longer patterns. Your second example would fail this test. The third would also, but that is a bit of an accident: it has only two $00$'s and two $11$'s per ten bits, against three $01$'s and three $10$'s. If you run enough bits, this will fail the $10^{-6}$ test as well. You can find a run of $2^{10}=1024$ bits that contains every ten-bit pattern exactly once (a de Bruijn sequence). No pattern test that looked at fewer than $10$ bits at a time would reject an RNG that repeated this sequence forever.
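A sketch of the pair-counting version in Python, with the same $5\sigma$ cutoff. I count non-overlapping pairs so that the binomial reasoning stays exact (real serial tests typically use overlapping windows and a chi-squared statistic); the names and cutoff are mine:

```python
import math
from collections import Counter

def pair_test(bits, z=5.0):
    """Pass iff each of the pairs 00, 01, 10, 11 occurs about a
    quarter of the time among non-overlapping pairs.  Each count
    is then Binomial(m, 1/4): mean m/4, sd sqrt(3m)/4."""
    pairs = list(zip(bits[0::2], bits[1::2]))  # non-overlapping pairs
    m = len(pairs)
    counts = Counter(pairs)
    sigma = math.sqrt(3 * m) / 4
    return all(abs(counts[p] - m / 4) <= z * sigma
               for p in ((0, 0), (0, 1), (1, 0), (1, 1)))
```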
What you would really like to demand is that the Kolmogorov complexity of the RNG output be close to the number of bits output. Unfortunately, that is very hard. Suppose my RNG outputs the binary expansion of $\sqrt 2$ starting from the $234394349$th bit. That is a hard pattern to detect, but it is not "random". For many purposes it would be quite acceptable. I could write a short program to do that, and since RNGs tend to be short programs, there will always be a short description of the output string, so its Kolmogorov complexity is low. Basically, we put the RNG up against the tests people have thought of, then hope that one that passes them all is good enough for whatever it is used for.
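To make the point concrete, here is one such short program, a Python sketch using integer square roots. The function name is mine; for an offset as large as $234394349$ the big-integer arithmetic would be slow, but the idea is the same:

```python
from math import isqrt

def sqrt2_bits(offset, count):
    """Return `count` bits of the binary expansion of sqrt(2),
    starting `offset` places after the binary point.  Uses the
    identity floor(sqrt(2) * 2**k) = isqrt(2 * 4**k)."""
    k = offset + count
    x = isqrt(2 * 4**k)      # floor(sqrt(2) * 2**k)
    frac = x - (1 << k)      # strip the leading "1." of sqrt(2)
    return format(frac, f"0{k}b")[offset:]  # the requested window

print(sqrt2_bits(10, 20))  # 20 bits starting after the 10th fractional bit
```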