Can we use a Cryptographic hash function to generate infinite random numbers?

Question

I have seen that there are PRNGs that can only generate a specific number of random numbers. The Mersenne Twister, for example, can generate 2**19937 (if I'm not wrong), but can we use a cryptographic hash function to generate infinitely many random numbers from any seed? I could use text as the seed and do it like this:

seed set to "myseed"
counter set to 0
first random number generated = first 64 bits of "myseed0" hashed with sha512
second number = first 64 bits of "myseed1" hashed with sha512
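
A minimal Python sketch of the scheme described above. It assumes the counter is appended to the seed as decimal text and that "first 64 bits" means the first 8 bytes of the digest read as a big-endian integer; the question does not fix either encoding. The name `hash_counter_numbers` is made up for illustration.

```python
import hashlib
from itertools import islice

def hash_counter_numbers(seed: str):
    """Yield 64-bit integers from SHA-512(seed + counter), counter = 0, 1, 2, ..."""
    counter = 0
    while True:
        digest = hashlib.sha512(f"{seed}{counter}".encode("utf-8")).digest()
        # Keep only the first 64 bits (8 bytes); the other 448 bits are discarded here.
        yield int.from_bytes(digest[:8], "big")
        counter += 1

# Take the first two numbers, i.e. the ones derived from "myseed0" and "myseed1".
first_two = list(islice(hash_counter_numbers("myseed"), 2))
```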

Or we could save the remaining 448 bits and use them later. The main question: is it 100% secure to generate PRNs from a hash function? Why don't we do it?

score 28 · Accepted Answer · answered Dec 12 '19 at 12:29

This construction gives you cryptographic-quality pseudorandom output, but it isn't as secure as it can be for a random generator.

With commonly used hash functions $H$ (such as any of the SHA2 and SHA3 family), as far as we know, the bits of $H(\textrm{seed}, n)$ are unpredictable if you only know $n$ and $H(\textrm{seed}, m_i)$ for any number of values $m_i \ne n$, but you don't know $\textrm{seed}$. This makes $D(n) = H(\textrm{seed}, n)$ a good key derivation function: its output is essentially indistinguishable from random.

A good random generator must have the property that a bit in the output is not predictable even if the adversary knows all the other bits of the output, but not the seed. The construction $H(\textrm{seed}, \textrm{counter})$ has this property. But a good random generator also has an additional property: backtracking resistance. Backtracking resistance means that if the adversary compromises the hash state at some point, then they can't recover past outputs. (Of course the adversary will know every future output, at least until the random generator is reseeded.) Your construction does not have this property since the original seed remains a part of the hash state.
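
To make the backtracking problem concrete, here is a small Python sketch. It assumes the generator's state is the pair (seed, counter), that $H$ is SHA-256, and that the input is encoded as the seed bytes followed by an 8-byte big-endian counter. `CounterGenerator` and `recover_past_outputs` are hypothetical names, not part of any library.

```python
import hashlib

class CounterGenerator:
    """H(seed, counter) generator; the seed stays in the state forever."""

    def __init__(self, seed: bytes):
        self.seed = seed
        self.counter = 0

    def next_block(self) -> bytes:
        block = hashlib.sha256(self.seed + self.counter.to_bytes(8, "big")).digest()
        self.counter += 1
        return block

def recover_past_outputs(seed: bytes, counter: int) -> list:
    """What an adversary can compute after stealing the state (seed, counter)."""
    return [hashlib.sha256(seed + n.to_bytes(8, "big")).digest() for n in range(counter)]

gen = CounterGenerator(b"secret seed")
emitted = [gen.next_block() for _ in range(5)]
# State compromise: the adversary reads gen.seed and gen.counter ...
leaked = recover_past_outputs(gen.seed, gen.counter)
# ... and `leaked` now equals `emitted`, so every past output is exposed.
```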

A good random generator has a “ratcheting” step, which makes it impossible to recover the previous state from the current state when generating some output. It's easy to build ratcheting with a hash function: you basically just run the hash function on the hash state. Take a hash function with an $n$-bit output. Start with an $n$-bit secret seed; that's the original state of the random generator. To generate up to $n$ pseudorandom bits, calculate $H(0 || \textrm{state})$ and output that; also calculate $H(1 || \textrm{state})$ and use this as the next internal state. In pseudocode:

state = seed
while True:
    output(hash('0' + state))
    state = hash('1' + state)
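
A runnable version of the pseudocode might look like the following sketch. It assumes SHA-256 as $H$ (so $n = 256$) and single bytes `0x00` and `0x01` as the two prefixes; the class name `RatchetGenerator` is made up for illustration.

```python
import hashlib
import os

class RatchetGenerator:
    """Hash-based generator that ratchets its state on every output."""

    def __init__(self, seed: bytes):
        self._state = seed  # assumed to be a 32-byte secret

    def next_block(self) -> bytes:
        out = hashlib.sha256(b"\x00" + self._state).digest()
        # Ratchet: replace the state with a one-way image of itself before returning,
        # so the current state no longer determines earlier outputs.
        self._state = hashlib.sha256(b"\x01" + self._state).digest()
        return out

gen = RatchetGenerator(os.urandom(32))
blocks = [gen.next_block() for _ in range(3)]
```

Note that Python does not guarantee that the old state bytes are erased from memory, so this only illustrates the logic of the ratchet, not a hardened implementation.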

Hash_DRBG specified in NIST SP 800-90A is a popular pseudorandom generator construction based on this principle.

kelalaka · Answer 2 · 2019-12-12T19:45:03.063

Your scheme can be restated as follows: take a cryptographically secure hash function $\operatorname{H}$ and generate the sequence as:

1. Initialize the seed with $\text{seed} = \text{"myseed"}$ and set $\text{counter} = 0$.
2. $\text{hash} = \operatorname{H}(\text{seed} \mathbin\| \text{counter})$
3. $\text{counter} = \text{counter} + 1$
4. Output $\text{hash}_{|\text{required size}}$ (the hash trimmed to the required size).
5. Return to step 2 for more random output.

Consider first the case where the counter has a fixed size: the sequence will then be periodic, because the counter wraps back to 0, as in CTR mode.
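
The wrap-around is easy to observe with a deliberately tiny counter. The sketch below assumes SHA-256 and a 1-byte counter; `fixed_counter_block` is a hypothetical helper name.

```python
import hashlib

def fixed_counter_block(seed: bytes, counter: int, counter_bytes: int = 1) -> bytes:
    """H(seed || counter) with the counter reduced to a fixed width."""
    wrapped = counter % (1 << (8 * counter_bytes))  # the counter rolls over to 0
    return hashlib.sha256(seed + wrapped.to_bytes(counter_bytes, "big")).digest()

period = 1 << 8  # an 8-bit counter has 256 distinct values
# Every output reappears exactly `period` steps later.
repeats = all(
    fixed_counter_block(b"myseed", i) == fixed_counter_block(b"myseed", i + period)
    for i in range(period)
)
```

With a realistic counter width such as 64 or 128 bits the period is far too long to reach in practice, but it is still finite.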

If you let the counter grow for as long as required (no fixed size), the sequence is not expected to be periodic. If it somehow were, you would have found a weakness in the underlying hash function. Of course, after about $2^{\ell/2}$ outputs, where $\ell$ is the hash output size, you will start to see collisions due to the birthday paradox. Collisions are inevitable. If you use this construction to generate IVs that need fewer bits than the hash output, collisions appear much sooner than for the full hash output: after about $2^{64}$ outputs for a 128-bit IV.
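
The birthday effect on trimmed outputs can be observed at toy scale. The sketch below assumes SHA-256 with a decimal-text counter and trims each output to 16 bits, so the first repeat is expected after a few hundred outputs (on the order of $2^{8}$) and is guaranteed within $2^{16}+1$ outputs. `first_collision` is a made-up name.

```python
import hashlib

def first_collision(seed: bytes, bits: int) -> int:
    """Return how many trimmed outputs had been generated when the first repeat appeared."""
    seen = set()
    counter = 0
    while True:
        digest = hashlib.sha256(seed + str(counter).encode()).digest()
        trimmed = int.from_bytes(digest, "big") >> (256 - bits)  # keep the top `bits` bits
        counter += 1
        if trimmed in seen:
            return counter
        seen.add(trimmed)

count = first_collision(b"myseed", 16)
```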

Can we use a cryptographic hash function to generate infinite random numbers?

Infinite is not a good measure here. One can define a sequence that is infinite but predictable, such as 1010010010001... What we want is a cryptographically secure pseudorandom number generator (CSPRNG). Usually we need a random value of fixed size, for example a 128-bit random IV or a 128-, 192-, or 256-bit random key, or larger sizes as in RSA key generation and RSA signatures.

Your construction is similar to HASH_DRBG from NIST SP 800-90A, which is believed to be secure if the hash function behaves like a random oracle.

why don't we do it?

We do. HASH_DRBG and HMAC_DRBG are examples with a better design. However, NIST specifies that HASH_DRBG must be reseeded after $2^{48}$ requests (page 38). Therefore, we cannot run them indefinitely without reseeding.

Note: as Gilles points out in his comment and answer, your construction lacks backtracking resistance.

In conclusion, your scheme can run indefinitely but lacks backtracking resistance.

score 4 · Answer 3 · edited Dec 13 '19 at 04:19

Can we use a Cryptographic hash function to generate ...

Yes we can. Your example shows that we can, provided the hash function is secure, meaning that it is currently not known to be invertible. A slightly modified form is one of the older cryptographic Java RNGs, called SHA1PRNG.

$$ \left\{ \begin{aligned} \texttt{State}_{0} &= \texttt{SHA}_{1}\left(\texttt{SEED}\right) \\ \texttt{Output}_{i} &= \texttt{SHA}_{1}\left(\texttt{State}_{i-1}\right) \\ \texttt{State}_{i} &= \left(\texttt{State}_{i-1} + \texttt{Output}_{i} + 1\right) \bmod 2^{160} \end{aligned} \right. $$

Notice the modification: The output is fed back into the state. The 'counter' is simply a +1 in this case. Similar constructions are also key components of stream ciphers.
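
As a rough illustration, the three equations can be followed literally in Python. This is a sketch of the formulas as written here, not Java's actual SHA1PRNG implementation; reading digests as big-endian 160-bit integers is an assumption, and `sha1prng_style` is a made-up name.

```python
import hashlib

MOD = 1 << 160  # SHA-1 digests are 160 bits

def sha1prng_style(seed: bytes, n_blocks: int) -> list:
    """Produce n_blocks 20-byte outputs by iterating the state/output equations."""
    state = int.from_bytes(hashlib.sha1(seed).digest(), "big")      # State_0
    outputs = []
    for _ in range(n_blocks):
        out = hashlib.sha1(state.to_bytes(20, "big")).digest()      # Output_i
        state = (state + int.from_bytes(out, "big") + 1) % MOD      # State_i
        outputs.append(out)
    return outputs

blocks = sha1prng_style(b"myseed", 3)
```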

SHA-512 is fine but perhaps a bit over the top if you only need 64-bit outputs. Sticking with SHA-1 or SHA-256 lets you use Intel's hardware SHA extensions to make your RNG run faster.

...infinite...

Well, not absolutely. All RNGs have a finite internal state. As you've already said, there is a limit for the common Mersenne Twister: it has 19,937 bits of state and a period of $2^{19937}-1$. A similar limit will apply to whatever state variable you use for yours. When the limit is reached, the numbers will roll over and repeat. Good enough for practical purposes, but not truly infinite.

...random numbers?

And it's important to call these 'pseudo-random' numbers rather than just random numbers, as in the world of random number generation there is an alternative called 'true' random numbers.

score 1 · Answer 4 · edited Jun 17 '20 at 08:17

In short, no.

Or rather, yes, but you don't want to do that.

Note, by the way, that "can we do that" and "is it 100% secure" in your question are different, antipodal things. Much like "MT" and "secure" are antipodal. All generated pseudorandom numbers (including those coming from secure pseudorandom generators and those coming out of cryptographic hash functions) are deterministic, so they cannot be 100% secure anyway. Now, MT, which you gave as an example, is not in any way secure; it is very easily exploited. If by "secure" you meant "are we confident that the bits will look random", then that's a different question. But in that case, you most likely want to use a generator that is orders of magnitude faster than a cryptographic hash.

A cryptographic hash function can be used to generate (pseudo-) random bits of an apparent quality comparable to dedicated random number generators. I say "apparent" because although cryptographic hash functions are designed with some things in mind that are desirable features of random number generators as well (think e.g. avalanche), they are not designed to be random number generators.
So, they kinda work as such, mostly, but it's not their real purpose and you do not have a hard guarantee that they will pass all tests that a specifically designed high-quality random number generator will pass (MT doesn't even pass them all either, by the way, it's comparatively poor).

That you can use a cryptographic hash as a random generator is demonstrated by the fact that, for example, the secure random number generator in at least one free open source operating system is implemented in exactly this way.

Then why am I saying "no"?

A hash function (cryptographic or not) can be considered a sort of entropy extractor.

You input N bits and the function somehow produces M bits from these (and usually N >> M) in an obscure, hard to predict way such that you cannot easily find collisions, etc etc.

The M bits that the function outputs are (pseudo) random, or at least as good as. So you could say that the function extracts M bits of entropy from the message.

That is the exact reason why, for example, DJB recommended that you use a hash function after you did a curve25519 exchange and want to use the result as the encryption key for your block cipher. You have some curve point which is not totally random and has more bits than you actually need, but you also know that it only has slightly fewer than 128 bits of entropy somewhere inside, and you do not know where. Obviously you want to use all the entropy that you're given. What to do?! Which bits should you use?
Hashing the point extracts that entropy, and ensures you don't throw any of it away.
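
In code, the idea is just one hash call. The sketch below assumes plain SHA-256 and a placeholder byte string standing in for the raw result of the exchange; `derive_key` is a hypothetical helper, and real protocols often use a dedicated key derivation function instead.

```python
import hashlib

def derive_key(shared_secret: bytes, key_len: int = 32) -> bytes:
    """Hash the raw shared secret and keep key_len bytes as the symmetric key."""
    return hashlib.sha256(shared_secret).digest()[:key_len]

# Placeholder value standing in for the 32-byte output of a curve25519 exchange.
shared_secret = bytes(range(32))
key = derive_key(shared_secret)
```

Every input bit influences every key bit, so there is no need to guess which bits of the point carry the entropy.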

So, let's think about what happens in our random number generator. We seed it with a certain amount of entropy, and then we keep extracting entropy from it forever. Wait a moment, if we extract some, what about remaining entropy? Yep, you guessed right. Eventually, very soon, we run out of entropy. It's still a random-looking deterministic sequence, of course. However, it is a sequence about which we practically do not know anything (e.g. what is its period length?).

Doesn't any random number generator have the entropy problem? Well yes, output is deterministic, and there is a finite number of numbers in an integer, so necessarily, sooner or later, you get the same sequence of numbers again, but this is a known problem and it's something that is explicitly addressed in the design (not so in the design of a cryptographic hash!).
Good generators try to maximise the period length (and some other things).
That's why MT has such a ridiculously large state. This huge state exists only to turn a rather poor generator into one with a very long period with a very large k-distribution (by only ever updating a small part of a huge state independently, and iterating over it).

PCG or xoroshiro variants (which, too, are not cryptographically secure) achieve practical periods (and, except for k-distribution, better properties otherwise!) with much, much smaller state. I say "practical" because one needs to realize that there is absolutely no difference between a 2^256 and a 2^19937 period. Even in massively parallel applications, a 2^256 period which can be subdivided with skip-ahead into 2^128 independent, non-overlapping sequences, is way more than you can use in your lifetime, even with a utopian farm of impossibly fast supercomputers. So, that's "infinite" for all practical purposes, just like 2^19937 is only "infinite", too.
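
A quick back-of-the-envelope calculation shows the scale. The rate and machine count below are assumptions picked to be generous, not measurements; with them, a single 2^128 sub-sequence lasts on the order of 10^13 years.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
rate = 10**12      # assumed: outputs per second on one machine
machines = 10**6   # assumed: machines running in parallel

def years_to_exhaust(period_bits: int) -> int:
    """Whole years needed to consume 2**period_bits outputs at the assumed rate."""
    return (1 << period_bits) // (rate * machines * SECONDS_PER_YEAR)

years_128 = years_to_exhaust(128)  # one skip-ahead sub-sequence
years_256 = years_to_exhaust(256)  # the full period
```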

In the case of the previously mentioned secure random generator used in an operating system, running out of entropy isn't much of a problem because it is being re-seeded all the time. So it never (well, never is a lie... let's say rarely, in normal conditions) runs out of entropy.
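
Re-seeding can be sketched as hashing fresh entropy into the existing state. This is an illustration only, not how any particular operating system does it; SHA-256, the byte prefixes, and the class name `ReseedingGenerator` are assumptions, and `os.urandom` stands in for whatever entropy source is available.

```python
import hashlib
import os

class ReseedingGenerator:
    """Hash-based generator whose state can absorb fresh entropy at any time."""

    def __init__(self, seed: bytes):
        self._state = hashlib.sha256(seed).digest()

    def reseed(self, fresh_entropy: bytes) -> None:
        # Mix new entropy into the old state instead of replacing it.
        self._state = hashlib.sha256(self._state + fresh_entropy).digest()

    def next_block(self) -> bytes:
        out = hashlib.sha256(b"\x00" + self._state).digest()
        self._state = hashlib.sha256(b"\x01" + self._state).digest()
        return out

gen = ReseedingGenerator(os.urandom(32))
gen.reseed(os.urandom(32))  # e.g. called periodically as new entropy arrives
block = gen.next_block()
```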
