8

Suppose I receive a list of 1 million coinflips, and I want to know how likely it is that the list was randomly generated.

My first thought would be to count the number of heads and tails, which should be evenly distributed (around 500,000 each). But even if the distribution looks normal, it's still possible that the list contains patterns or repetitions. For example, the first half of the list may be all heads and the second half all tails. In truly random data, that would be highly unlikely.

So how do you calculate the 'randomness' of this list?

Maestro

2 Answers

7

You can never actually prove that it was generated randomly or pseudorandomly. You can only show, with high probability, that it wasn't. Counting the number of heads and tails is one test. Another is counting runs of consecutive heads or tails. NIST's FIPS 140-2 document includes a suite of statistical tests, which is a good place to start.
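As a rough illustration of those two checks, here is a minimal Python sketch of a frequency (heads/tails count) test and a runs test. The formulas follow my reading of the frequency and runs tests in NIST's statistical test suite (SP 800-22); the function names and the convention that `True` means heads are my own assumptions, not part of any NIST code.

```python
import math
import random


def monobit_p_value(flips):
    """Frequency test: p-value for the balance of heads (True) and tails (False)."""
    n = len(flips)
    # Map heads to +1 and tails to -1; a fair coin gives a sum near 0.
    s = sum(1 if f else -1 for f in flips)
    return math.erfc(abs(s) / math.sqrt(2 * n))


def runs_p_value(flips):
    """Runs test: p-value for the number of runs of consecutive equal flips."""
    n = len(flips)
    pi = sum(1 for f in flips if f) / n  # observed fraction of heads
    # If the counts are already badly unbalanced, the runs test is not
    # meaningful; report 0.0 (treated here as a failure).
    if abs(pi - 0.5) >= 2 / math.sqrt(n):
        return 0.0
    # Number of runs = 1 + number of positions where the value changes.
    runs = 1 + sum(1 for a, b in zip(flips, flips[1:]) if a != b)
    expected = 2 * n * pi * (1 - pi)
    return math.erfc(abs(runs - expected) / (2 * math.sqrt(2 * n) * pi * (1 - pi)))


if __name__ == "__main__":
    n = 10_000  # smaller than 1 million, just to keep the demo quick
    shuffled = [random.random() < 0.5 for _ in range(n)]
    halves = [True] * (n // 2) + [False] * (n // 2)  # all heads, then all tails

    for name, data in (("shuffled", shuffled), ("halves", halves)):
        print(name, monobit_p_value(data), runs_p_value(data))
```

The "halves" list from the question passes the frequency test with a p-value of 1.0, since the counts are perfectly balanced, but the runs test rejects it with a p-value that is essentially zero because it contains only two runs. A small p-value (a common threshold is 0.01) is evidence against randomness; a large one proves nothing.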

Having said that, for cryptographic purposes you really need to be sure that you are using a secure random number generator; no test you can apply to the data itself is enough to guarantee that it is secure.

Travis Mayberry
1

You can find the NIST statistical test suite for (P)RNGs, provided by the National Institute of Standards and Technology (formerly NBS, the National Bureau of Standards), here: http://csrc.nist.gov/groups/ST/toolkit/rng/index.html

The documentation is available from NIST (see above) or here: National Institute of Standards and Technology (formerly NBS, National Bureau of Standards)

ABri