I'm writing a tool for manually collecting entropy from sources like coins and dice, and I'm wondering how best to deal with the case where the desired number of bits of entropy is not representable with a whole number of inputs. For example, collecting 256 bits of entropy with a D6 requires log6(2^256) ≈ 99.034318652 rolls.

The number of bits being collected, and the base, are arbitrary. Power-of-two bases are special-cased to work directly with bits, as well as offering Von Neumann skew correction. Non-power-of-two bases are collected with a big integer: aggregate = aggregate * base + input, where input is a value from 0 to base-1 inclusive.
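For illustration, the big-integer aggregation could be sketched in Python roughly as follows. The names `accumulate` and `inputs_needed` are hypothetical, and `inputs_needed` rounds up (ceil) only as one of the options under discussion, not as a settled choice. It uses exact integer arithmetic rather than a floating-point logarithm.

```python
def accumulate(inputs, base):
    """Fold inputs (each in 0..base-1) into a single big integer."""
    aggregate = 0
    for value in inputs:
        if not 0 <= value < base:
            raise ValueError(f"input {value} is outside 0..{base - 1}")
        aggregate = aggregate * base + value
    return aggregate


def inputs_needed(bits, base):
    """Smallest n such that base**n >= 2**bits (the 'ceil' strategy)."""
    n, capacity, target = 0, 1, 1 << bits
    while capacity < target:
        capacity *= base
        n += 1
    return n
```

For a D6 and 256 bits, `inputs_needed(256, 6)` gives 100, the ceiling of the 99.034... figure above.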

The simple solution which springs to mind is rounding (ceil, floor or true rounding) the number of inputs, but I'm not sure which strategy would be best, or whether there is some better option to minimize at least the predictability of biased bits in the result (my instinct is 'doing something with a CHF', but I've no idea what).

Cheers!

Isaac

1 Answer

my instinct is 'doing something with a CHF', but I've no idea what

Your instinct to use a cryptographic hash function is in line with what I'd recommend.

Using SHA3 as a re-seedable PRNG for entropy collection is a modern approach with various benefits(0)(1)(2). I've written about considerations when initializing & extracting from a SHA3 object in this context before. However, my recommendation these days would be to use the SHAKE256 XOF from the SHA3 standard instead of SHA3-512 for improved convenience (& other technical reasons).

The simple solution which springs to mind is rounding (ceil, floor or true rounding) the number of inputs

If you're dealing with float representations of events, you don't want to round them. You'd be cutting off the least significant bits, which is the section of a float where the most variation can typically be found. If you don't have to deal with float representations, then I'd recommend you don't — they can be imprecise & cause information loss.

Aside from that, finding a representation for the random events is mostly a consideration of efficiency, since the hashing object can be relied on to produce uniform outputs. For instance, a coin flip can optimally be represented by a single bit. A single D6 event can be represented in 3 bits. If you collect the events as a concatenation of their bit-wise representations, then you can efficiently feed those bits/bytes into a SHAKE256 object without extraneous conversions.
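As a minimal sketch of that idea, assuming Python's standard `hashlib.shake_256` and D6 rolls recorded as integers 0 to 5, the events can be packed at 3 bits each and fed to the XOF. The helper names `pack_rolls` and `extract` are hypothetical, and a real tool would need its own decisions on domain separation and input framing.

```python
import hashlib


def pack_rolls(rolls, bits_per_roll=3):
    """Concatenate fixed-width encodings of the rolls into big-endian bytes."""
    packed = 0
    for roll in rolls:
        if not 0 <= roll < (1 << bits_per_roll):
            raise ValueError(f"roll {roll} does not fit in {bits_per_roll} bits")
        packed = (packed << bits_per_roll) | roll
    # Fixed byte length derived from the roll count, so leading zeros survive.
    n_bytes = (len(rolls) * bits_per_roll + 7) // 8
    return packed.to_bytes(n_bytes, "big")


def extract(rolls, out_bytes=32):
    """Hash the packed rolls with SHAKE256 and return out_bytes of output."""
    xof = hashlib.shake_256()
    xof.update(pack_rolls(rolls))
    return xof.digest(out_bytes)
```

The output cannot contain more entropy than the rolls supplied. Rounding up to 100 fair D6 rolls gives about 258.5 bits of input entropy, which covers a 32-byte (256-bit) output, so any excess input is simply absorbed by the hash.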

NIST SP 800-90 parts (A)(B)(C) may also contain useful recommendations for you.

aiootp