7

Suppose I have an algorithm that relies on multiple iterations of a hash function like SHA-1 to slow down an attacker trying to brute-force a hash.

digest[0] = hash(password + salt)
for i in 1..r:
    digest[i] = hash(digest[i-1] + password)

The final digest[r] is then used as the hash. How does this compare to a more standard implementation like PBKDF2 or bcrypt?
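
For concreteness, here is a minimal Python sketch of the construction above (the function name and parameters are my own choices, not part of any standard):

import hashlib

def iterated_hash(password: bytes, salt: bytes, r: int) -> bytes:
    # digest[0] = hash(password + salt)
    digest = hashlib.sha1(password + salt).digest()
    # digest[i] = hash(digest[i-1] + password), for i = 1..r
    for _ in range(r):
        digest = hashlib.sha1(digest + password).digest()
    return digest  # digest[r]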

This website states:

simply iteratively hashing the hash of the password isn't enough as it can be parallelized in hardware and executed as fast as a normal hash.

Is there any truth to this statement? Since every iteration of the hash function is dependent on the previous, how could this be possible?

Paŭlo Ebermann
NXE

4 Answers

11

The bit about parallelization is not about hashing one password. The attacker, by definition, has a lot of passwords to try, and these can be computed in parallel, provided that the function maps well to whatever parallel architecture the attacker would like to use. The multi-SHA-1 method you describe maps very well onto GPUs. Note that the same can be said about PBKDF2, which is why I tend to recommend bcrypt, which is more resilient to GPUs.
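
To make this point concrete, here is a toy CPU-side sketch (my own code, not from the answer): the chain for any one candidate is strictly sequential, but the candidates themselves are embarrassingly parallel, and a real attacker would do the same across GPU lanes or dedicated circuits:

import hashlib
from multiprocessing import Pool

R = 5000  # illustrative iteration count

def iterated_hash(password: bytes, salt: bytes) -> bytes:
    # The chain for one password cannot be parallelized...
    digest = hashlib.sha1(password + salt).digest()
    for _ in range(R):
        digest = hashlib.sha1(digest + password).digest()
    return digest

def try_candidate(args):
    candidate, salt, target = args
    return candidate if iterated_hash(candidate, salt) == target else None

if __name__ == "__main__":
    salt = b"example-salt"
    target = iterated_hash(b"hunter2", salt)  # pretend this is the stolen hash
    candidates = [b"123456", b"password", b"hunter2"]
    # ...but independent candidates spread trivially across workers.
    with Pool() as pool:
        hits = [c for c in pool.map(try_candidate,
                                    [(c, salt, target) for c in candidates]) if c]
    print(hits)  # [b'hunter2']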

Although I do not immediately see any obvious problem with your SHA-1* function, it has the generic issue of being homemade and of not having been investigated thoroughly by dozens of cryptographers over several years. Therefore I would not trust it.

Thomas Pornin
6

If anyone's interested, I asked Bruce Schneier the same question, and here's his response:

Seems okay at first glance.

Maybe add a counter to the iterative hash: X[i] = H(X[i-1] + S + i)
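
In Python, that tweak might look like this (my own sketch; the 4-byte big-endian encoding of the counter is an assumption, not part of the suggestion):

import hashlib

def stretched(password: bytes, salt: bytes, r: int) -> bytes:
    x = hashlib.sha1(password + salt).digest()
    for i in range(1, r + 1):
        # Fold the round counter into each stage, per the suggestion above,
        # so every stage computes a distinct function.
        x = hashlib.sha1(x + salt + i.to_bytes(4, "big")).digest()
    return x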

I also asked the author of the article to back up his statement that "simply iteratively hashing the hash of the password isn't enough as it can be parallelized in hardware and executed as fast as a normal hash", and here's his response:

If the key stretching algorithm is something like

sha1(sha1(....sha1(sha1($password . $salt))..))

Say it's sha1 applied 5000 times.

Then, you can construct a piece of hardware that looks like this:

 input
   |
[SHA1-1]
   |
[SHA1-2]
   |
[SHA1-3]
   |
   .
   .
   .
   |
[SHA1-5000]
   |
  \ /
 output

...which is just 5000 SHA1 circuits in series. Whenever one of them finishes, it passes its output on to the next. So, like an assembly line, it will take 5000 units of time for a single input to go through the entire system, but the total throughput is one input per unit of time.

Of course, no matter what algorithm you use, the cracker could make 5000 chips that compute that algorithm and get the same speedup, but I believe it's in some way more expensive (either in chip size, chip complexity, or in time) to do that with an algorithm like PBKDF2, which feeds the password into every single hash computation (as the HMAC key).
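
For comparison, here is a simplified single-block sketch of PBKDF2's inner function F (per RFC 2898); the rendering is my own, shown to make the structural difference visible:

import hashlib, hmac

def pbkdf2_block(password: bytes, salt: bytes, rounds: int, block_index: int = 1) -> bytes:
    # U_1 = HMAC(password, salt || INT(block_index))
    u = hmac.new(password, salt + block_index.to_bytes(4, "big"), hashlib.sha1).digest()
    t = u
    for _ in range(rounds - 1):
        # The password re-enters every round as the HMAC key,
        # and every intermediate U feeds the running XOR.
        u = hmac.new(password, u, hashlib.sha1).digest()
        t = bytes(a ^ b for a, b in zip(t, u))
    return t

For a single 20-byte block this agrees with hashlib.pbkdf2_hmac("sha1", password, salt, rounds, 20).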

NXE
2

The quoted statement is correct except for the "executed as fast as a normal hash" fragment, provided that:

  • we compare the speed of an attack on the proposed construct with that on a non-iterated normal hash using the same hardware; the attack can use parallelism, even though the computation of the result for any one password cannot;
  • and digest[] is wide enough (say, significantly more than twice as many bits as in r, or as the entropy of the salt if that is smaller), which is the case in practice with all standard hashes. (If that were not the case, there could be shortcuts allowing faster computation, and/or amortization of computations to obtain the result for different salts and the same password.)

Still, the construction is not state-of-the-art: it increases the cost of a parallel implementation of password search only by a factor of (at best) r. Compare with scrypt (resp. bcrypt), where a parameterizable amount of RAM (resp. a fixed amount) is used continuously while hashing, forcing the adversary to invest not only in fast hardware for the hashes but also, for each search running in parallel, in that amount of RAM and the associated energy; this greatly raises the cost of the attack, while the necessary memory often comes at nearly no cost to legitimate users. In particular, that makes parallelization using standard GPUs very inefficient, whereas these architectures are a near-perfect match for the construction in the question, since all the data manipulated by one instance fits in registers.

Further, in the case of scrypt, if parallelization is available to legitimate users, it can be leveraged to increase the number of hash computations per password processed, again at a penalty for the attacker but seldom any for legitimate users.
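
As a concrete illustration of these knobs, Python's standard library exposes scrypt directly; the cost parameters below are illustrative assumptions, not a recommendation (p is the parallelization parameter just mentioned):

import hashlib, os

salt = os.urandom(16)
key = hashlib.scrypt(b"hunter2", salt=salt,
                     n=2**15,  # CPU/memory cost: 128 * r * n = 32 MiB of RAM
                     r=8,      # block size
                     p=1,      # parallelization: users with spare cores can raise this
                     maxmem=64 * 1024 * 1024,  # allow the 32 MiB working set
                     dklen=32)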

fgrieu
-1

PBKDF2 salts each hashing stage with a new value and then uses a mix of all stages. It also performs several thousand rounds of this hashing. This makes it impossible to build rainbow tables for it, while in your case custom rainbow tables could be built if the salt value is known.
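
For reference, the standard-library call (a usage sketch; the iteration count is illustrative); note that it is the unique per-user salt, in PBKDF2 as in the question's scheme, that defeats precomputed tables:

import hashlib, os

salt = os.urandom(16)  # unique per user
dk = hashlib.pbkdf2_hmac("sha1", b"hunter2", salt, 100_000)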

Vitaly Osipov