The logic of preferring PBKDF2 over iterative SHA2

Question

I have studied PBKDF2 for the application of hashing passwords, and frankly I don't understand the justification for the complexity of PBKDF2 over iterative hashing with a cryptographic hash function.

Here is an example: (obviously if you want a longer/shorter hash, choose a longer/shorter digest)

|| - concatenation operator
let password be a user supplied byte sequence
let salt be a random byte sequence supplied by the runtime
let iteration_count be a positive integer in the millions

Algorithm:
iteration = SHA256( salt || password )
for ( 2 to iteration_count )
     iteration = SHA256( iteration )
let hash = final value of iteration
// Store hash, salt and iteration_count
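
For readers who want to experiment, the pseudocode above can be sketched in Python roughly as follows. The function name `iterated_sha256`, the 16-byte salt and the small iteration count are illustrative assumptions, not part of the original question, which used Java and millions of iterations.

```python
import hashlib
import os


def iterated_sha256(password: bytes, salt: bytes, iteration_count: int) -> bytes:
    """Compute SHA256(salt || password), then re-hash the digest until
    iteration_count hash invocations have been performed in total."""
    if iteration_count < 1:
        raise ValueError("iteration_count must be a positive integer")
    digest = hashlib.sha256(salt + password).digest()  # iteration 1
    for _ in range(iteration_count - 1):               # iterations 2..iteration_count
        digest = hashlib.sha256(digest).digest()
    return digest


# Hypothetical usage: the salt size and iteration count are placeholders.
salt = os.urandom(16)
stored = iterated_sha256(b"correct horse", salt, 1000)
# Store `stored`, `salt` and the iteration count alongside the user record.
```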

As long as the iteration count is tuned to consume the target amount of CPU time set by the security policy, what justifies using a more complex routine than that?

On my computer with Java, it takes approximately 2 million iterations to fill about 1 second of CPU time to generate the hash.

The biggest advantage to me with this design is, next year when the computer is running faster, I can update my entire user table statically, stretching the iteration count a little further. In fact, if I had a huge operation and performance was going to be a problem, I could run a constant depth extension of the hashes, to wit:

UPDATE USERS SET ITERATION_COUNT = ITERATION_COUNT + 1, HASH = SHA256( HASH )
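
A minimal Python sketch of that offline upgrade, assuming the stored record is just a (hash, iteration count) pair; `extend_record` and `iterated_sha256` are hypothetical helper names. It works only because any stored value of the chain is enough to compute the following ones without knowing the password.

```python
import hashlib


def iterated_sha256(password: bytes, salt: bytes, iteration_count: int) -> bytes:
    """The iterated construction from the question."""
    digest = hashlib.sha256(salt + password).digest()
    for _ in range(iteration_count - 1):
        digest = hashlib.sha256(digest).digest()
    return digest


def extend_record(stored_hash: bytes, iteration_count: int, extra: int = 1):
    """Stretch an existing record by `extra` iterations without the password."""
    for _ in range(extra):
        stored_hash = hashlib.sha256(stored_hash).digest()
    return stored_hash, iteration_count + extra


# Extending a 10-iteration record by 5 gives the same result as hashing
# the password with 15 iterations from scratch.
old = iterated_sha256(b"pw", b"salt", 10)
assert extend_record(old, 10, 5) == (iterated_sha256(b"pw", b"salt", 15), 15)
```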

Besides simply telling me not to roll my own, can some experienced experts please convince me why this security design is poorer than the venerated PBKDF2? Thank you very much in advance. I am eager to learn something new here.

score 8 · Answer 1 · answered Jun 25 '18 at 23:56

I think you are overstating the complexity of PBKDF2, and also, not matching it feature-wise with your alternative. Let's dispatch the latter first: as gammatester's comment mentioned, PBKDF2 supports variable output size. If you built that into your proposal, it would become more complex.

Once we control for that, PBKDF2 is hardly more complex than your iterated SHA-2. The heart of PBKDF2 is the iterated application of a pseudorandom function (PRF), with the password as its secret key:

$$ F(\mathrm{Password}, \mathrm{Salt}, c, i) = U_1 \oplus U_2 \oplus \dots \oplus U_c $$

...where:

$$ \begin{align} U_1 &= PRF(\mathrm{Password}, \mathrm{Salt}\, \|\, \mathrm{INT\_32\_BE}(i)) \\ U_2 &= PRF(\mathrm{Password}, U_1) \\ & \vdots \\ U_c &= PRF(\mathrm{Password}, U_{c-1}) \end{align} $$

The way that the sequence of values $U_1, \dots, U_c$ is computed is very similar to the way you're proposing to iterate SHA-256, except with the addition that each iteration uses the password as its secret key. While this isn't necessarily a groundbreaking improvement, it does mean that an attacker who sees an intermediate $U_j$ is not able to pick up from there and compute the rest of the sequence unless they guess the password. With your construction, by contrast, any intermediate value of iteration is sufficient to compute all succeeding ones.

The second difference is that the final value of $F$ is not just $U_c$, but rather the XOR of all the values in the sequence. This Q&A goes into the motivation of that, and the answers offer several alternative takes.

The third difference is the choice of a PRF as the primitive instead of a hash function like you propose. This is just conceptually sensible given that we're talking about a scenario where there is a password that plays the role of a secret key; all that PBKDF2 says is that you should iterate a secret-keyed function, using the password as the secret key, and allows for any such choice of function. The common choice of PRF is HMAC-SHA1, which does make PBKDF2 look a bit complicated if you're missing the observation that it treats its PRF as a black box—it's not concerned at all with HMAC's internal structure.
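
To make the comparison concrete, here is a minimal Python sketch of $F$ as defined above. It assumes HMAC-SHA256 as the PRF (any PRF would do; this one is chosen so the result can be checked against the standard library's `hashlib.pbkdf2_hmac`), and the name `pbkdf2_block` is made up for illustration.

```python
import hashlib
import hmac


def pbkdf2_block(password: bytes, salt: bytes, c: int, i: int) -> bytes:
    """F(Password, Salt, c, i): XOR of U_1 .. U_c, with HMAC-SHA256 as the PRF."""

    def prf(data: bytes) -> bytes:
        # The password is the secret key in every single iteration.
        return hmac.new(password, data, hashlib.sha256).digest()

    u = prf(salt + i.to_bytes(4, "big"))       # U_1 = PRF(P, Salt || INT_32_BE(i))
    acc = int.from_bytes(u, "big")
    for _ in range(c - 1):
        u = prf(u)                             # U_k = PRF(P, U_{k-1})
        acc ^= int.from_bytes(u, "big")        # running XOR of all U values
    return acc.to_bytes(len(u), "big")


# With a 32-byte output, PBKDF2-HMAC-SHA256 is exactly block i = 1.
expected = hashlib.pbkdf2_hmac("sha256", b"password", b"salt", 1000, dklen=32)
assert pbkdf2_block(b"password", b"salt", 1000, 1) == expected
```

Longer outputs are obtained by concatenating the blocks for $i = 1, 2, \dots$ and truncating, which is where the variable output size comes from.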

warchantua · Answer 2 · 2018-07-03T13:52:00.250

Most implementations of PBKDF2 use HMAC-SHA1 as the PRF.

As Wikipedia points out about PBKDF2:

One weakness of PBKDF2 is that while its number of iterations can be adjusted to make it take an arbitrarily large amount of computing time, it can be implemented with a small circuit and very little RAM, which makes brute-force attacks using application-specific integrated circuits or graphics processing units relatively cheap.

The same is true for SHA2.

The current Bitcoin network hash rate for SHA-256 is 41,024,492,542 GH/s. For someone with that much processing capability, it would be relatively easy to run an exhaustive search on the passwords in your database. Even at your 2 million iterations per guess, such an attacker would be able to make about $2 \times 10^{13}$ attempts per second.
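
The arithmetic behind that estimate can be checked with a few lines of Python. This assumes the figure comes from dividing the quoted network hash rate by the question's 2 million iterations per guess; the 8-character password space at the end is a hypothetical example, not something from the thread.

```python
# Quoted network hash rate: 41,024,492,542 GH/s, converted to hashes per second.
network_hashes_per_second = 41_024_492_542 * 10**9

# Iteration count from the question (about 1 second of CPU time per hash).
iterations_per_guess = 2_000_000

guesses_per_second = network_hashes_per_second // iterations_per_guess
print(f"{guesses_per_second:.2e} guesses per second")

# Hypothetical password space: 8 characters drawn from a-z and 0-9.
password_space = 36**8
seconds_to_exhaust = password_space / guesses_per_second
print(f"{seconds_to_exhaust:.3f} s to try every 8-character candidate")
```

Under these assumptions the whole 8-character space falls in well under a second, which is the point being made: raising the iteration count alone does not keep up with dedicated SHA-256 hardware.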

If you want to build a scheme that stays secure for decades, you need to use something like bcrypt or scrypt. scrypt is built to consume a lot of RAM, and bcrypt depends on a few kilobytes of constantly modified state; both properties make them harder to implement efficiently on ASICs and GPUs, especially with strong parameters.
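
As a rough illustration of what a memory cost parameter looks like in practice, Python's standard library exposes scrypt as `hashlib.scrypt` (available when Python is built against OpenSSL 1.1 or later). The parameter values and the helper name `scrypt_hash` below are illustrative assumptions, not recommendations.

```python
import hashlib
import os

# Illustrative cost parameters (assumptions, tune for your own hardware).
N, R, P = 2**14, 8, 1


def scrypt_hash(password: bytes, salt: bytes) -> bytes:
    """Derive a 32-byte hash; scrypt needs roughly 128 * N * R bytes of RAM."""
    return hashlib.scrypt(
        password,
        salt=salt,
        n=N,
        r=R,
        p=P,
        maxmem=64 * 1024 * 1024,  # allow headroom above the working set
        dklen=32,
    )


approx_memory_bytes = 128 * N * R  # 16 MiB per guess with these parameters
salt = os.urandom(16)
stored = scrypt_hash(b"correct horse", salt)
```

Every password guess has to pay that memory cost as well as the CPU cost, which is what makes massively parallel hardware less attractive than it is for iterated SHA-2 or PBKDF2.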

UPD: also look at https://password-hashing.net/ . Argon2 was selected as the winner of the Password Hashing Competition. It has better properties than scrypt and bcrypt, and it is the recommended password hashing algorithm now.
