89

Assume I want to design a protocol (or data format or similar) that includes some cryptographic hash, and I want it to be as future-proof as possible, i.e. I want to avoid a situation where breakthroughs in cryptography render my old data insecure.

If I use only one hash algorithm, I may have the bad luck of picking just the one that will be broken a few years later.

For example, MD4 (published in 1990) saw its first collision attacks in 1995, and as of 2007 finding collisions is even cheaper than computing the hash itself.

So, the idea would be to combine multiple such algorithms in a way that breaking just one (or some) of them does not compromise the security of the combined construct; breaking all of them would be necessary.

I do not care about the efficiency loss from calculating multiple hashes instead of one; I just want to be safe against better cryptanalysis.

So, suppose I have hash functions $H_1, H_2, \ldots, H_n$. How can I combine them to form a hash function which is as secure as the strongest of them?

Two basic ideas, which do not work as intended (a code sketch of both follows the list):

  • simply concatenating the outputs: $H_1(m) \| H_2(m) \| \ldots \| H_n(m)$.

    This should be quite secure against collision attacks, but is only as secure against preimage attacks as the weakest of them (and the other hashes can then be used to check whether the right preimage was found).

  • chaining the functions: $H_1(H_2(\ldots(H_n(m))\ldots))$

    For preimage attacks, you would now need to break all of them. But a collision in $H_n$ trivially leads to a collision in the result, too. (Collisions in the other functions are less easy to exploit, as you then also need to find preimages under the inner functions.)
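For concreteness, here is a minimal Python sketch of the two ideas, with SHA-256 and SHA-512 standing in for $H_1$ and $H_2$ (the choice of functions is purely illustrative):

import hashlib

# SHA-256 and SHA-512 are only stand-ins for H_1 and H_2 here.
H = [hashlib.sha256, hashlib.sha512]

def concat_hash(m: bytes) -> bytes:
    """Idea 1: H_1(m) || H_2(m) || ... || H_n(m)."""
    return b"".join(h(m).digest() for h in H)

def chain_hash(m: bytes) -> bytes:
    """Idea 2: H_1(H_2(...(H_n(m))...)) -- the innermost function H_n is applied first."""
    digest = m
    for h in reversed(H):
        digest = h(digest).digest()
    return digest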

Any better ways of combining them? Or is this a bad idea altogether?


To clarify: I do not really need more security than that of the strongest component function, but I want at least that much. (And using more space is not really a problem here.)

And for preimages, I'm now mainly concerned about someone finding the original preimage (which is around the size of the hash output), and it looks like this will be at most as difficult as for the weakest hash in the case of simple concatenation.
I'm not so much concerned about someone constructing a new preimage. That sounds quite difficult, but as Joux shows it is not, provided the message can grow as needed (one block per bit of hash output, or so) and there are collision attacks on the compression function.

Paŭlo Ebermann

8 Answers

61

Combining is what SSL/TLS does with MD5 and SHA-1, in the definition of its internal "PRF" (which is actually a Key Derivation Function). For a given hash function, TLS defines a KDF which relies on HMAC, which relies on the hash function. The KDF is then invoked twice, once with MD5 and once with SHA-1, and the results are XORed together. The idea was to resist cryptanalytic breaks in either MD5 or SHA-1.

Note that XORing the outputs of two hash functions relies on subtle assumptions. For instance, if I define $\mathrm{SHB\text{-}256}(m) = \mathrm{SHA\text{-}256}(m) \oplus C$, for a fixed constant $C$, then SHB-256 is as good a hash function as SHA-256; but the XOR of both always yields $C$, which is not good at all for hashing purposes. Hence, the construction in TLS is not really sanctioned by the authority of science (it just happens not to have been broken). TLS 1.2 does not use that combination anymore; it relies on the KDF with a single, configurable hash function, often SHA-256 (which is, in 2011, a smart choice).
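The SHB-256 counterexample is easy to demonstrate in code; a minimal sketch (the constant $C$ is arbitrary):

import hashlib

C = bytes(32 * [0xAB])  # an arbitrary fixed 256-bit constant

def sha256(m: bytes) -> bytes:
    return hashlib.sha256(m).digest()

def shb256(m: bytes) -> bytes:
    """SHB-256(m) = SHA-256(m) XOR C: exactly as strong as SHA-256 on its own."""
    return bytes(a ^ b for a, b in zip(sha256(m), C))

def xor_combiner(m: bytes) -> bytes:
    """XOR combiner of the two 'independent' hashes."""
    return bytes(a ^ b for a, b in zip(sha256(m), shb256(m)))

# The combiner is constant: every message hashes to C.
assert xor_combiner(b"hello") == xor_combiner(b"world") == C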

As @PulpSpy points out, concatenation is not a good generic way of building hash functions. This was published by Joux in 2004 and then generalized by Hoch and Shamir in 2006, for a large class of constructions involving iterations and concatenations. But mind the fine print: this is not really about surviving weaknesses in hash functions, but about getting your money's worth. Namely, if you take a hash function with a 128-bit output and another with a 160-bit output, and concatenate the results, then collision resistance will be no worse than the strongest of the two; what Joux showed is that it will not be much better either. With $128+160 = 288$ bits of output, you could aim at $2^{144}$ resistance, but Joux's result implies that you will not go beyond about $2^{87}$.
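To sketch where that $2^{87}$ comes from (this is just Joux's multicollision argument applied to the numbers above): a $2^{80}$-multicollision on the 128-bit iterated hash costs about $80$ birthday searches of $2^{64}$ work each, and checking those $2^{80}$ messages (each about $80$ blocks long) against the 160-bit hash costs about $80 \cdot 2^{80}$ compression-function calls:

$$80 \cdot 2^{64} + 80 \cdot 2^{80} \approx 2^{70.3} + 2^{86.3} \approx 2^{87}.$$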

So the question becomes: is there a way, if possible an efficient way, to combine two hash functions such that the result is as collision-resistant as the stronger of the two, but without incurring the output enlargement of concatenation? In 2006, Boneh and Boyen published a result which states that the answer is no, subject to the condition of evaluating each hash function only once.

Edit: Pietrzak lifted the latter condition in 2007 (i.e. invoking each hash function several times does not help).

Thomas Pornin
32

You were right with your ideas in the original question. If what you want to protect against is preimages, then chaining hash functions produces a function at least as strong as the stronger of its two components:

$$H_{\circ}(x) = H_0(H_1(x))$$

If what you want to protect against is collisions, then concatenation is at least as strong as the stronger of its two components:

$$H_{\|}(x) = H_0(x) \| H_1(x)$$

There are several other properties that you might want from your combined function, such as pseudorandomness. For pseudorandomness, you could combine two hash functions like this:

$$H_{\oplus}(x) = H_0(x) \oplus H_1(x)$$
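A minimal sketch of the XOR combiner, assuming two functions with equal output lengths (SHA-256 and SHA3-256 are just stand-ins for $H_0$ and $H_1$):

import hashlib

def h_xor(x: bytes) -> bytes:
    """H_xor(x) = H_0(x) XOR H_1(x); requires equal-length outputs."""
    d0 = hashlib.sha256(x).digest()
    d1 = hashlib.sha3_256(x).digest()
    return bytes(a ^ b for a, b in zip(d0, d1))

As the earlier answer's SHB-256 example shows, this combiner is only sensible if the two functions are genuinely independent.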

The tricky part (as you observed) is if you want to have more than one of these properties. The best research about this that I'm aware of so far is Anja Lehmann's dissertation. (You can find discussion of this and related topics on the "One Hundred Year Cryptography" wiki at the Tahoe-LAFS project.)

If I needed more than one property from a secure hash function, and didn't mind extra CPU cycles, and didn't mind double the output size, then I would probably use Lehmann's $Comb_{4P}$ construction and not worry too much about the rather remote possibility that the resulting combined function may not preserve pre-image resistance.

If you're sure that you only need one property (careful here: think very hard about this, and write down explicitly what property or properties you rely on and what an attacker could do if each possible property failed to hold), then you can safely use one of the combiners above.

By the way, that dissertation also includes very interesting results on two other topics that have been discussed in this thread: whether you can have a combined function $C(H_1, H_2)$ that is stronger at collision-resistance than the strength of $H_1$ plus the strength of $H_2$ (she answers in the affirmative) and whether the way that SSL and TLS combined SHA1 and MD5 was secure (answer: sort of...).

Zooko
10

I'm sure @Thomas will give a thorough answer. In the interim, I'll just point out that the collision resistance of your first construction, $H_1(m) \| H_2(m)$, is surprisingly not that much better than that of $H_1(m)$ alone. See section 4 of this paper:

Multicollisions in iterated hash functions. Application to cascaded constructions.

PulpSpy
6

Well, I see two clean ways of getting practical resistance to these vulnerabilities.

If you want to use two hash functions, make sure you feed the original data back into an HMAC with the second function:

hash = algo1(data)                # inner hash with the first function
hash = hmac(algo2, data, hash)    # HMAC over the original data with algo2, keyed by the inner hash

The benefit here is that any collision for algo1 will not automatically become a collision for algo2, thanks to the MAC. So for a collision attack to work, the attacker would have to find a collision for both functions with the same source data. In practice this should be significantly more difficult than attacking either function independently (it will be at least as difficult as attacking the stronger of the two functions).
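For concreteness, a runnable sketch of this construction, with SHA-256 and SHA-512 standing in for algo1 and algo2 (that choice is illustrative only):

import hashlib
import hmac

def hash_then_hmac(data: bytes) -> bytes:
    inner = hashlib.sha256(data).digest()          # algo1(data)
    # HMAC over the original data with algo2, keyed by the inner hash.
    return hmac.new(inner, data, hashlib.sha512).digest()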

The other method would be to simply iterate a single hash function (with feedback). This looks similar to the previous algorithm.

u = hmac(algo1, pack('N', 1), data)   # U_1: HMAC over a 4-byte block counter, keyed with data
hash = u
for (i = 1 ... n)
    u = hmac(algo1, u, data)          # U_{i+1}: previous block fed back in, keyed with data
    hash ^= u

Where n is greater than or equal to 0; the larger it is, the slower the computation. Note that this is basically just PBKDF2 with the data as the password, an empty salt, an iteration count of n + 1, and the length parameter set to the output size of the hash.

The benefit of "stretching" is that it protects against both preimage and collision attacks, since even if the attacker is able to find a preimage for the first round, there are still multiple rounds left to attack. And thanks to the feedback, the data necessary to attack a specific round is destroyed in the next one. So even if the attacker did manage to complete one round of a preimage attack, it would be very difficult (if not impossible) to attack the remaining rounds.
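Because of that equivalence, the standard-library PBKDF2 can stand in for the loop above (a sketch assuming SHA-256 as algo1; the iteration count is n + 1 because the loop runs n times after the initial block):

import hashlib

def stretch(data: bytes, n: int) -> bytes:
    # PBKDF2-HMAC with password=data, empty salt, n+1 iterations,
    # output length defaulting to the hash's own digest size.
    return hashlib.pbkdf2_hmac("sha256", data, b"", n + 1)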

ircmaxell
1

How about something like interleaving the bits of the base hash functions' outputs to generate a key (taking one bit from each hash in sequence, skipping hashes that have no unincorporated bits remaining), and then generating an HMAC with each base hash function, keyed with however many bits of that key it can use:

$$K(x) = interleaveBits(H_0(x), H_1(x), H_2(x))$$

$$H(x) = HMAC_{H_0}(K(x), x) \| HMAC_{H_1}(K(x), x) \| HMAC_{H_2}(K(x), x)$$

This way, each HMAC theoretically depends on all of the hash algorithms. Instead of giving every HMAC the first bits of $K(x)$, you might want to use the first bits of $K(x)$ as the key for $HMAC_{H_0}$, the bits following those as the key for $HMAC_{H_1}$, and so on, concatenating $K(x)$ with itself to get enough bits to key all the HMACs.

You could also iterate this procedure and use $H(H(x))$ (etc.) as your hash, perhaps using the base hash functions in a different order for each invocation of $H$ (and the corresponding invocation of $K$).
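A minimal sketch of the idea, with SHA-256, SHA-512 and SHA3-256 as stand-in base functions. Note that HMAC already reduces over-long keys internally, so this sketch simply passes the full interleaved key to each HMAC rather than truncating it:

import hashlib
import hmac

def interleave_bits(*digests: bytes) -> bytes:
    """Take one bit from each digest in turn, skipping exhausted digests."""
    streams = ["".join(f"{byte:08b}" for byte in d) for d in digests]
    picked = []
    for i in range(max(len(s) for s in streams)):
        for s in streams:
            if i < len(s):
                picked.append(s[i])
    bits = "".join(picked)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits) - 7, 8))

def combined(x: bytes) -> bytes:
    algos = (hashlib.sha256, hashlib.sha512, hashlib.sha3_256)  # stand-ins for H_0, H_1, H_2
    k = interleave_bits(*(a(x).digest() for a in algos))
    return b"".join(hmac.new(k, x, a).digest() for a in algos)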

Extrarius
0

There is a research paper explaining why chained hashes do not provide better security in the sense of indifferentiability from random oracles.

curious
0

Use a modified duplex construction. First run the message through $h_1$, then take the final 32 bits of the result and XOR them onto the first 32 bits of a (say) 512-bit IV. Then run the IV through a 512-bit hash function $h_2$, and append the first 32 bits of its output to the hash buffer. Repeat this process until enough output has accumulated in the buffer. I know, the cost is $n(h+i)$, where $h$ is the cost of $h_1$, $i$ is the cost of $h_2$, and $n$ is the number of rounds. But if $h_1$ and $h_2$ are really fast, then that's good news!
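The description leaves some details open, in particular what changes from round to round, but here is one plausible duplex-style reading as a sketch, with SHA-256 as $h_1$, SHA-512 as $h_2$, an all-zero IV, and the message re-absorbed with the evolving state each round (all of those choices are assumptions):

import hashlib

def duplex_combine(msg: bytes, out_len: int = 32) -> bytes:
    state = bytearray(64)  # the 512-bit "IV" / running state
    buffer = b""
    while len(buffer) < out_len:
        # Final 32 bits of h1 over the message plus the current state.
        absorb = hashlib.sha256(msg + bytes(state)).digest()[-4:]
        for j in range(4):           # XOR onto the first 32 bits of the state
            state[j] ^= absorb[j]
        state = bytearray(hashlib.sha512(bytes(state)).digest())
        buffer += bytes(state[:4])   # first 32 bits go into the output buffer
    return buffer[:out_len]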

rath
0

A combined hash function is just another hash function, and will have its own theoretical and practical vulnerabilities.

My assertion would be that it's possible to design two hash functions $H_1$ and $H_2$ such that $H_1$ is good, $H_2$ is good, but $H_1 + H_2$ is very weak.

So by combining two hash functions, you may achieve the opposite effect. Whether you do or not has to be established independently.

ddyer