I've recently been looking into the creation (and theory) of hash functions, but I can't figure out how to turn a message into something of a fixed length.

At the moment, my theory of a hash function is like so:

  1. Take in the message (e.g. "secret")

  2. Split it into different 'blocks' (e.g. "sec" + "ret")

  3. Mix them up, and interact different blocks depending on what block type they are (perhaps block type would be determined by mathematical calculations on the character values, and then certain blocks would interact with other blocks to merge in a weird way to create another block).

This creates a message which cannot be undone (since at the end, you don't know what the block types are - and even if you do, you don't know how they interacted to get to those blocks) - however it is not of a fixed length.

What would I do to create a fixed length digest?

Paŭlo Ebermann

3 Answers

Basically, typical hash functions produce a fixed-length output by the following method:

  1. They expand the input to some multiple of their block size by adding padding.

  2. They compress the first two blocks into a single block using a mixing function and repeat until only one block is left.

  3. They use a fixed finalizing function to turn that last block into their output.

As a simple example, consider a rather trivial and insecure hash function for decimal numbers that produces a one decimal digit output. It could work like this:

  1. Add a 9 onto the end of the input. Then add zeroes until the number of digits is even.

  2. If there are more than two digits, replace the first four digits with the two-digit sum of those four digits. Repeat until only two digits remain.

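  3. Sum the final two digits and keep only the last digit of that sum (that is, take it modulo 10, since a sum such as 2 + 9 = 11 has two digits) to get the output.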

This will produce a single decimal digit of output for any decimal input.

For example, take 123.

  1. Add a 9 on the end: 1239. The number of digits is even, so no zeroes are appended.

  2. We replace the first four digits with their sum, 15. Only two digits remain, so we're done with this step.

  3. We sum the 1 and 5 to get 6. Our output is 6.

While this function is simple and insecure (because its mixing function is so bad and its block size so small) it illustrates how hash functions produce a fixed-sized output from a variable-sized input.
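
The three steps of this toy function can be sketched in Python roughly as follows. The function name `toy_decimal_hash` is made up for illustration, and the sketch assumes that the final sum is reduced modulo 10 so the result is always a single digit (two digits such as 2 and 9 would otherwise sum to 11). It is as insecure as the scheme it illustrates.

```python
def toy_decimal_hash(number: str) -> int:
    """Toy, insecure hash: decimal string in, one decimal digit out."""
    if not all(ch in "0123456789" for ch in number):
        raise ValueError("input must consist of decimal digits only")

    # Step 1: pad with a 9, then with zeroes until the length is even.
    digits = number + "9"
    while len(digits) % 2 != 0:
        digits += "0"

    # Step 2: replace the first four digits with their two-digit sum
    # (at most 9 * 4 = 36) until only two digits remain.
    while len(digits) > 2:
        block_sum = sum(int(ch) for ch in digits[:4])
        digits = f"{block_sum:02d}" + digits[4:]

    # Step 3: finalize. Assumption: reduce modulo 10 so the output
    # is always exactly one digit.
    return (int(digits[0]) + int(digits[1])) % 10


print(toy_decimal_hash("123"))  # 1239 -> 15 -> 6
```

Whatever the length of the input, every pass of step 2 shortens the string by two digits, so the loop always ends with exactly two digits and the output size never changes.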

Paŭlo Ebermann
David Schwartz

Create your own for learning, something very simple?

a = b = c = a0h;
for (every byte "byte" from input)
    a = a XOR byte;
    b = b XOR a XOR 55h;
    c = b XOR 94h;
end for

print a, b, c;

The output size here will always be exactly 3 bytes (assuming a, b, c are bytes), independent of the input size. Each step of the loop consumes one more byte of input and merges it into the state held in the variables; at the end, those variables are output.

(This is not a good hash function; don't use it for anything in production.)
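
For anyone who wants to run the pseudocode above, here is a minimal Python transcription. The function name `toy_xor_hash` is made up for illustration; the constants are the ones from the pseudocode, and the result is just as unsuitable for real use.

```python
def toy_xor_hash(data: bytes) -> bytes:
    """Toy, insecure 3-byte hash following the pseudocode above."""
    a = b = c = 0xA0  # initial state
    for byte in data:
        a ^= byte        # fold the next input byte into a
        b ^= a ^ 0x55    # fold the updated a into b
        c = b ^ 0x94     # c is derived from b alone
    return bytes([a, b, c])


print(toy_xor_hash(b"secret").hex())
print(len(toy_xor_hash(b"x" * 10_000)))  # still 3 bytes
```

Note that after the first input byte, `c` is always `b XOR 94h`, so the three output bytes carry at most two bytes of information. That is one of several reasons this is only a learning toy.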

Study a real one?

Search for the algorithms used in MD5, SHA-1, RIPEMD-160, etc. You'll easily find some implementations.

Paŭlo Ebermann
woliveirajr

A hash function is a function with an arbitrary-length input size (e.g. $\{0,1\}^*$ or some subset), and a fixed length output ($\{0,1\}^n$, with $n$ fixed for the function). (For cryptographic uses, we also want some more properties, like collision-resistance, preimage resistance and second-preimage resistance - for details see the linked Wikipedia article or other questions in the tag.)

How can we get something like this?

The "ideal" hash function, a random oracle, simply spits out a random fixed-size string whenever a new input string arrives, and remembers all previously used inputs (returning the same output as before when one repeats). Of course, this is not usable in practice.
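
To make the random-oracle idea concrete, here is a small Python sketch of the "remember every answer" behaviour. The class name `RandomOracle` and the 16-byte default output size are assumptions made for illustration; the only library call is `secrets.token_bytes` from the standard library.

```python
import secrets


class RandomOracle:
    """Simulated random oracle: fresh random output per new input."""

    def __init__(self, output_bytes=16):
        self.output_bytes = output_bytes
        self._table = {}  # maps every input seen so far to its output

    def query(self, message: bytes) -> bytes:
        # A new input gets a fresh random string of fixed size;
        # a repeated input gets the answer it got the first time.
        if message not in self._table:
            self._table[message] = secrets.token_bytes(self.output_bytes)
        return self._table[message]


oracle = RandomOracle()
first = oracle.query(b"secret")
print(first == oracle.query(b"secret"))   # True: repeats are remembered
print(len(oracle.query(b"x" * 10_000)))   # 16, regardless of input size
```

The sketch also shows why this is impractical: the table grows with every distinct input, and two separate parties would each need the same table to agree on any hash value.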

Most real-life hash functions work iteratively:

  • Split the input into blocks of fixed size (which might be bigger or smaller than the final output size). (This usually includes padding the last block so that only full blocks remain.)
  • Take one block and some internal state, and mix them together to get a new internal state. This "mixing" step is often called the compression function, as it compresses state + block down to only state. (This is a one-way compression, not the kind used for zip and co.)
  • Repeat with the next block.
  • When all blocks are hashed, either directly output the internal state (or a part thereof), or do some more (irreversible) calculations to derive the output from it (the latter is better, as it avoids some extension attacks).
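
The iterative scheme above can be sketched in Python as follows. Everything named here (`BLOCK_SIZE`, `pad`, `compress`, `toy_iterated_hash`) is hypothetical, and the mixing step is a deliberately simple stand-in that borrows the multiply-and-XOR idea and constants of FNV-1a, a non-cryptographic hash. It shows the structure only and offers no security.

```python
BLOCK_SIZE = 8               # bytes per input block (arbitrary choice)
STATE_MASK = 0xFFFFFFFF      # keep the internal state at 32 bits
INITIAL_STATE = 0x811C9DC5   # fixed starting state (FNV-1a offset basis)


def pad(message: bytes) -> bytes:
    """Append 0x80, then zero bytes, so only full blocks remain."""
    padded = message + b"\x80"
    return padded + b"\x00" * (-len(padded) % BLOCK_SIZE)


def compress(state: int, block: bytes) -> int:
    """Toy compression function: (state, block) -> new state."""
    for byte in block:
        state = ((state ^ byte) * 0x01000193) & STATE_MASK
    return state


def toy_iterated_hash(message: bytes) -> bytes:
    padded = pad(message)
    state = INITIAL_STATE
    # Feed one block at a time into the running state.
    for start in range(0, len(padded), BLOCK_SIZE):
        state = compress(state, padded[start:start + BLOCK_SIZE])
    # Output only part of the state: the upper 16 of its 32 bits.
    return (state >> 16).to_bytes(2, "big")


print(toy_iterated_hash(b"secret").hex())
print(len(toy_iterated_hash(b"x" * 10_000)))  # always 2 bytes
```

The output length is fixed because the state has a fixed size: no matter how many blocks are processed, each call to `compress` returns a value of the same width, and the final step reads a fixed number of bytes from it.
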
Paŭlo Ebermann