
I came across this answer, which states (quoting just one line from it):

The input space is "infinite" and thus it has an infinite amount of values that will collide into a single hash

In the comments on that answer, @poncho writes:

Technically, the input space for the SHA-1 and SHA-2 hash functions is not "infinite"; SHA-1 and SHA-256 inputs are limited to $2^{64}-1$ bits; SHA-512 is limited to $2^{128}-1$ bits

In reply to that comment, the answerer accepts this fact, but I can't seem to understand why.

There are many answers, like this one, that loosely use the term "infinite" for the input space of hash functions. I tried to resolve the ambiguity via the Wikipedia pages for MD5 and SHA-1, which define them as follows:

In cryptography, SHA-1 is a cryptographic hash function which takes an input and produces a 160-bit hash

The MD5 message-digest algorithm is a widely used hash function producing a 128-bit hash value

Neither definition mentions the input size (in bits) of the algorithm.

QUESTION: Do hashing algorithms have an upper bound on their input space?

P.S.: If an upper bound does exist, can anyone tell me why the hashing algorithm isn't able to work on inputs larger than that bound?

Vasu Deo.S

2 Answers


Do hashing algorithms have an upper bound on their input space?

They can, but they don't have to and it depends on their specification.

  • All Merkle-Damgård based hash functions do have an upper limit, because appending the message length simplifies both the security proof and the backdoor-resistance argument for the function, and they usually use a fixed-length encoding of the length.
  • MD5 (Spec RFC), however, appears to more or less ignore this and does accept messages of arbitrary length. The encoded length in this case is the low 64 bits of the true length (so it's reduced $\bmod 2^{64}$; thanks to forest for making me actually check).
  • SHA-3 (Spec PDF) does not have such a limit as it's based on the sponge-construction.
  • Skein (Spec PDF) does have a limit of $2^{96}$ bytes.
  • Blake2b (Spec PDF) has a limit of $2^{128}$ bytes.
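
To get a feel for how large these bounds are, here is a small sketch that converts the figures from the list above into exbibytes. The dictionary name and labels are made up for illustration; the numbers are simply the ones quoted in the list, not independently checked against the specs.

```python
EIB = 2**60  # bytes in one exbibyte

# Limits as quoted in the list above, expressed in bytes.
# (Hypothetical names; the values are taken from the list, not from the specs.)
limits_in_bytes = {
    "MD5 (length field wraps)": 2**64 // 8,  # 2^64 bits = 2^61 bytes
    "Skein": 2**96,
    "BLAKE2b": 2**128,
}

for name, limit in limits_in_bytes.items():
    print(f"{name}: {limit // EIB:,} EiB")
```

Even the smallest of these, the point where MD5's length field wraps, corresponds to 2 EiB of input, so the bounds rarely matter in practice.
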
SEJPM

It depends on how exactly the hash function is defined.

Typically the first step of a hashing function is to convert the input from a series of bits or bytes to a series of blocks using a padding scheme. Many such padding schemes include the length of the input in the padding as it makes the security arguments easier. Typically the length is encoded as a fixed-size integer.
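
As a rough illustration of such a padding scheme, here is a minimal sketch in the style used by SHA-1 and SHA-256: a single 1 bit, zero bits, then the message length as a 64-bit big-endian integer. The function name `md_pad` and the constant `BLOCK_SIZE` are just names chosen for this example, and the sketch only handles whole-byte messages.

```python
BLOCK_SIZE = 64  # bytes per block, as in MD5, SHA-1 and SHA-256


def md_pad(message: bytes) -> bytes:
    """Pad `message` to a whole number of blocks, SHA-1/SHA-256 style."""
    bit_length = len(message) * 8
    # One 0x80 byte (a 1 bit followed by seven 0 bits), then as many zero
    # bytes as needed so that exactly 8 bytes remain in the final block.
    zero_count = (BLOCK_SIZE - 8 - (len(message) + 1)) % BLOCK_SIZE
    padding = b"\x80" + b"\x00" * zero_count
    # Fixed-size length field: to_bytes raises OverflowError if the
    # bit length does not fit into 8 bytes.
    return message + padding + bit_length.to_bytes(8, "big")


padded = md_pad(b"abc")
print(len(padded), padded[-8:].hex())
```

The fixed 8-byte length field at the end is what the rest of this answer is about.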

So the question becomes what happens when the range of that fixed-size integer is exceeded. There are two main possibilities.

  • The integer is allowed to wrap around (this seems to be the case for MD5)
  • The hash function is only defined for messages shorter than a threshold (this seems to be the case for SHA1 and SHA2)
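
The two possibilities can be sketched as two ways of encoding the length field. Both function names are hypothetical, and this only models the 64-bit length field, not a full hash function; the byte orders follow MD5 (little-endian) and SHA-1/SHA-256 (big-endian).

```python
def encode_length_wrapping(bit_length: int) -> bytes:
    """MD5-style: keep only the low 64 bits of the length (little-endian)."""
    return (bit_length % 2**64).to_bytes(8, "little")


def encode_length_strict(bit_length: int) -> bytes:
    """SHA-1/SHA-256-style: the hash is undefined for lengths of 2^64 bits or more."""
    if bit_length >= 2**64:
        raise ValueError("message too long: hash is not defined for this length")
    return bit_length.to_bytes(8, "big")


# With wrapping, a 1-byte message and one that is 2^64 bits longer
# end up with the same length field.
print(encode_length_wrapping(8) == encode_length_wrapping(8 + 2**64))
```

Under the wrapping policy the hash stays defined for any length, but the length field no longer identifies the length uniquely; under the strict policy the input space is finite.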

However, you are likely to find that practical implementations of the algorithm don't enforce this part of the specification. For example, the first result I found on Google for "sha1 reference implementation" ( https://github.com/clibs/sha1/blob/master/sha1.c ) did not enforce any limit on input size.

Peter Green