8

What I understand is this: we parse the message into 512-bit message blocks, then we expand the first message block into a 64-entry array and start the compression function.

What happens if the message is longer than 512 bits? In that case we get several message blocks, but where are these fed into the function?

Mike Edward Moras
Lizz4rd merge me

2 Answers

10

If the input message is longer than 512 bits, the input is chopped into “chunks” (read: pieces) of fitting length (512 bits), and those are successively fed to the hash compression function.

See, in layman’s terms, SHA-256 processes things like this:

  1. init SHA-256
  2. while there are input chunks,
    update hash compression function with the next input chunk (pad to 512 bits if needed)
  3. finalize SHA-256 and return the result (which is the hash output)
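The three steps above map directly onto the init/update/finalize interface that most hash libraries expose. As a minimal sketch using Python's standard `hashlib` module (the helper name `sha256_streaming` and the 200-byte test message are only illustrative assumptions), feeding the input piece by piece gives the same digest as hashing it in one go:

```python
import hashlib

def sha256_streaming(chunks):
    """Hash an iterable of byte strings piece by piece."""
    h = hashlib.sha256()       # 1. init SHA-256
    for chunk in chunks:       # 2. feed the input chunk by chunk
        h.update(chunk)
    return h.hexdigest()       # 3. finalize (padding is applied internally)

# A 200-byte (1600-bit) message, i.e. longer than one 512-bit block.
message = b"x" * 200
# Cut it into 64-byte (512-bit) pieces; the last piece is shorter.
pieces = [message[i:i + 64] for i in range(0, len(message), 64)]

print(sha256_streaming(pieces) == hashlib.sha256(message).hexdigest())
```

The pieces passed to `update` do not have to be 512 bits long; the library buffers the input and calls the compression function whenever a full block is available.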

As mentioned in “2”, the final chunk is padded so that all input chunks fed to the hash compression function are exactly 512 bits long. Note that padding is always applied, even when the message length is already a multiple of 512 bits; in that case a whole extra block of padding is appended. For details on the padding itself, see “How is input message for SHA-2 padded?” and “SHA256: Padding a 512 bits length message”.

A note aside: The maximum message size which can be handled by SHA-256 is $2^{64}-1$ bits, which means SHA-256’s compression function can handle up to $\lceil(2^{64}+64)/512\rceil = 2^{55}+1$ 512-bit input chunks. In the (unlikely) case that you try to feed more input than the maximum, the hash function should refuse output (read: fail with an error).
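To make the block arithmetic concrete, here is a small sketch that computes how many 512-bit blocks the compression function sees for a message of a given bit length. It assumes the standard SHA-256 padding (one mandatory 1 bit plus a 64-bit length field, then rounding up to a multiple of 512); the function name `sha256_block_count` is just an illustrative choice.

```python
def sha256_block_count(message_bits):
    """Number of 512-bit blocks processed for a message of message_bits bits."""
    if not 0 <= message_bits < 2**64:
        raise ValueError("SHA-256 only supports messages shorter than 2**64 bits")
    # message + one '1' bit + 64-bit length field, rounded up to a full block
    return (message_bits + 1 + 64 + 511) // 512

for bits in (0, 447, 448, 512, 2**64 - 1):
    print(bits, sha256_block_count(bits))
```

A 447-bit message is the longest one that still fits into a single block, because 447 + 1 + 64 = 512.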

Mike Edward Moras
9

Your question essentially is how message length is dealt with in Merkle-Damgård constructions. However, your description is flawed so let's clarify some things:

The whole point of using this kind of construction is to build a hash function that maps arbitrary-length inputs to fixed-length outputs, given a compression function (that is hopefully collision-resistant, among other things).

First the input message is split into $n$ blocks of equal size - following your example, let's assume this block size is 512 bits (as in MD5 and the smaller SHA-2 algorithms, SHA-224 and SHA-256). Then, the message is "padded" (extended) with a single 1 bit followed by zero bits until the total message length satisfies

$l_{msg} \equiv 448 \pmod{512}$

Lastly, a 64-bit representation of the original (unpadded) message length is appended to the last block. Now all $n$ blocks are exactly 512 bits long. Subsequently, the compression function is applied in a loop over all blocks, where in each iteration it takes as input the current block and the result of the previous iteration (a fixed initial value for the first block). After $n$ iterations, the final hash value is produced.
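The padding and the loop described above can be sketched in Python as follows. This is an illustrative, unoptimized sketch rather than a reference implementation: the names `first_primes`, `icbrt`, `pad`, `compress` and `sha256_sketch` are my own, and the constants are derived from square and cube roots of the first primes (which is how the SHA-256 specification defines them) instead of being typed in as a table.

```python
import hashlib
import struct
from math import isqrt

MASK = 0xFFFFFFFF  # all arithmetic is on 32-bit words

def first_primes(count):
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes

def icbrt(n):
    """Exact integer cube root (floor) by binary search."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

PRIMES = first_primes(64)
# First 32 fractional bits of the square roots of the first 8 primes
IV = [isqrt(p << 64) & MASK for p in PRIMES[:8]]
# First 32 fractional bits of the cube roots of the first 64 primes
K = [icbrt(p << 96) & MASK for p in PRIMES]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def pad(message):
    """Append 0x80, zero bytes up to 448 mod 512 bits, then the 64-bit length."""
    bit_len = len(message) * 8
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    return padded + struct.pack(">Q", bit_len)

def compress(state, block):
    """One compression-function call: (256-bit state, 512-bit block) -> new state."""
    # Expand the 16 block words into the 64-entry message schedule
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[t] + w[t]) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    # Feed-forward: add the working variables to the incoming state
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]

def sha256_sketch(message):
    state = IV                                   # fixed initial value
    padded = pad(message)
    for i in range(0, len(padded), 64):          # one iteration per 512-bit block
        state = compress(state, padded[i:i + 64])
    return struct.pack(">8I", *state)            # final state is the hash

msg = b"a" * 200  # spans several blocks
print(sha256_sketch(msg) == hashlib.sha256(msg).digest())
```

Note that the message schedule is expanded anew for every block inside `compress`, not only for the first one; the only thing carried from block to block is the 256-bit state.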

What happens if the message is longer than 512 bits?

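As soon as your message length exceeds 447 bits, a second block is needed, because the mandatory padding bit and the 64-bit length field no longer fit into the first block. So a longer message only increases the number of blocks, and thus the number of loop iterations required.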

indiscreteLog