
Someone recently pointed out to me that a cryptographic hash function "is not designed as a bijective mapping from N bit input to N bit output".

So if I feed an N-bit cryptographic hash function with N bits of random input, there's a loss of entropy between the input and output of the hash function.

Considering the MD5 hash function, is there a way to estimate that loss of entropy? And is this loss cumulative, so that if I apply the hash function enough times, I end up with a 50% loss of entropy?

1 Answer


Actually, no. If it is a good hash, you should have roughly $N-k$ bits of output entropy for some $k$ of much lower order than $N$.

The problem arises when the input is much longer than $N$ bits.

One way to estimate the entropy loss of such a hash applied to $N$-bit inputs is to model it as a randomly chosen function on $N$ bits. This was first done by Odlyzko and Flajolet. There is a nice review with updated results here.
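The random-function model can be tried out at toy scale. The following is a minimal sketch, not an analysis of MD5 itself: it assumes a uniformly random lookup table on a hypothetical 16-bit domain stands in for the hash, and the helper name `random_function_stats` is made up for illustration. It measures the image size and the Shannon entropy of the output for a uniformly random input.

```python
import math
import random
from collections import Counter

def random_function_stats(n_bits, seed=0):
    """Sample a random function on n_bits-bit strings (an assumed stand-in
    for a hash) and return (image size, output entropy in bits)."""
    rng = random.Random(seed)
    size = 1 << n_bits
    # table[x] is the output of the random function on input x.
    table = [rng.randrange(size) for _ in range(size)]
    # counts[y] is the number of preimages of output y.
    counts = Counter(table)
    image_size = len(counts)
    # Shannon entropy of f(X) when X is uniform over all inputs.
    entropy = -sum((c / size) * math.log2(c / size) for c in counts.values())
    return image_size, entropy

n_bits = 16
image_size, entropy = random_function_stats(n_bits)
print("image fraction:", image_size / 2**n_bits)   # compare with 1 - 1/e
print("entropy loss (bits):", n_bits - entropy)     # compare with 0.8272
```

In this model the loss is a constant number of bits (under one bit) that does not grow with $N$, which is the sense in which $k$ is of much lower order than $N$.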

Let $\tau_m$ be the image size of the $m$th iterate of the function. The output entropy after $m$ iterations is at most $\log_2 \tau_m$, so the behaviour of $\tau_m$ bounds how much entropy can survive.

If the function is a permutation, $\tau_m=2^N$ for all $m\geq 1$ and there is no entropy loss.
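To see how $\tau_m$ behaves under iteration, here is a small sketch under the same assumptions as the random-function model: a random lookup table on a hypothetical 14-bit domain, contrasted with a random permutation. The helper `iterate_image_sizes` is an illustrative name. The printed comparison column uses the recurrence $a_m = 1 - e^{-a_{m-1}}$ with $a_0 = 1$ for the expected image fraction of a random function, which is an asymptotic estimate rather than an exact value.

```python
import math
import random

def iterate_image_sizes(table, iterations):
    """Return [tau_1, ..., tau_m]: image sizes of successive iterates
    of the function x -> table[x]."""
    image = set(range(len(table)))
    sizes = []
    for _ in range(iterations):
        image = {table[x] for x in image}
        sizes.append(len(image))
    return sizes

rng = random.Random(1)
size = 1 << 14  # toy domain size, an assumption for illustration

# A random function and a random permutation on the same domain.
random_table = [rng.randrange(size) for _ in range(size)]
perm_table = list(range(size))
rng.shuffle(perm_table)

random_sizes = iterate_image_sizes(random_table, 5)
perm_sizes = iterate_image_sizes(perm_table, 5)

fraction = 1.0
for m, tau in enumerate(random_sizes, start=1):
    fraction = 1.0 - math.exp(-fraction)  # asymptotic estimate a_m
    print(m, tau / size, round(fraction, 4))

print("permutation:", [t == size for t in perm_sizes])
```

The image of the random function keeps shrinking with each iteration, so the loss is cumulative, while the permutation keeps the full domain at every step.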

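Edit: See the comment and link by @fgrieu, which give an estimate for a single application of the function. He is saying that the output entropy is about $$ 128-0.8272\cdots $$ bits for $N=128.$ Strictly speaking this is the Shannon entropy of the output rather than $\log_2\tau_1$: for a random function the expected image size is $\tau_1\approx(1-1/e)\,2^{N}\approx 2^{N-0.6617}.$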
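The constant $0.8272\cdots$ can be reproduced numerically. This is a sketch under the random-function model, assuming the number of preimages of each output is approximately Poisson with mean 1; the expected Shannon entropy loss is then $e^{-1}\sum_{k\ge 2}\log_2(k)/(k-1)!$ bits. The function name `poisson_entropy_loss` is illustrative. The loss in $\log_2$ of the image size, $-\log_2(1-1/e)$, is printed alongside for comparison.

```python
import math

def poisson_entropy_loss(terms=60):
    """Expected entropy loss in bits, assuming Poisson(1) preimage counts:
    E[K * log2(K)] = exp(-1) * sum_{k>=2} log2(k) / (k-1)!"""
    total = 0.0
    for k in range(2, terms):
        total += math.log2(k) / math.factorial(k - 1)
    return total / math.e

loss = poisson_entropy_loss()
image_loss = -math.log2(1 - 1 / math.e)

print(round(loss, 4))        # entropy loss, approximately 0.8272
print(round(image_loss, 4))  # log2 image-size loss, approximately 0.6617
```

The two numbers differ because the outputs that are hit are not hit equally often, so the output entropy is somewhat lower than $\log_2$ of the image size.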

kodlu