
The purpose of PBKDF2 is to create a derived key (DK) from a master password (PW) and a salt, often using a function like HMAC-SHA256. I have read that the salt should be as random as possible. But if that's the case, why does it play such a minor role?

I'll let the equations speak for themselves, starting with HMAC-SHA256:

$$\def\op{\operatorname} \op{HMAC-SHA256}( PW, m ) = \op{SHA256}( K_o ∥ \op{SHA256}( K_i ∥ m ) ) $$ where $$\begin{align*} K &= \begin{cases} PW &\textrm{if } \op{bytes}( PW ) = 64 \newline \op{SHA256}( PW ) ∥ [\texttt{0x00}]_{32} &\textrm{if } \op{bytes}( PW ) > 64 \quad \textrm{(32 zero bytes)} \newline PW ∥ [\texttt{0x00}]_{64-\op{bytes}(PW)} & \textrm{if } \op{bytes}( PW ) < 64 \end{cases} \newline K_i &= K ⊕ [\texttt{0x36}]_{64} \newline K_o &= K ⊕ [\texttt{0x5c}]_{64} \end{align*}$$
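
The case analysis for $K$ and the two pads can be checked directly in code. This is a minimal Python sketch, assuming byte-string inputs; the helper names `derive_k` and `hmac_sha256` are illustrative only, and the result is cross-checked against the standard library's `hmac` module.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 block size in bytes

def derive_k(pw: bytes) -> bytes:
    """Build the 64-byte key K from PW following the three cases above."""
    if len(pw) > BLOCK_SIZE:
        pw = hashlib.sha256(pw).digest()  # 32 bytes, zero-padded below
    return pw.ljust(BLOCK_SIZE, b"\x00")  # no-op when len(pw) == 64

def hmac_sha256(pw: bytes, m: bytes) -> bytes:
    k = derive_k(pw)
    k_i = bytes(b ^ 0x36 for b in k)  # inner key K_i
    k_o = bytes(b ^ 0x5C for b in k)  # outer key K_o
    inner = hashlib.sha256(k_i + m).digest()
    return hashlib.sha256(k_o + inner).digest()

# Cross-check short, exactly-64-byte and long keys against the stdlib.
for pw in (b"short", b"k" * 64, b"long" * 40):
    assert hmac_sha256(pw, b"message") == hmac.new(pw, b"message", hashlib.sha256).digest()
```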

Now PBKDF2:

$$ \begin{align*} DK &= T_1 ∥ T_2 ∥ \dots ∥ T_{\lceil \op{len}(DK)/\op{len}(T_i) \rceil} \quad \text{(truncated to } \op{len}(DK) \text{ bytes)} \newline T_i &= U_1^i ⊕ U_2^i ⊕ \dots ⊕ U^i_{\text{iterations}} \newline U^i_1 &= \op{HMAC-SHA256}( PW, \op{salt} ∥ \op{INT32}( i ) ) \newline U^i_n &= \op{HMAC-SHA256}( PW, U^i_{n-1} ) \end{align*} $$
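
A direct transcription of these equations into Python, as a sketch that uses the standard `hmac` module as the PRF. `INT32(i)` is taken to be a 4-byte big-endian integer, as in RFC 8018. When the derived-key length is not a multiple of the 32-byte block size, RFC 8018 rounds the block count up and truncates the final block; the sketch does the same. The function name `pbkdf2_hmac_sha256` is illustrative, and the result is compared with `hashlib.pbkdf2_hmac`.

```python
import hashlib
import hmac
import math
import struct

def pbkdf2_hmac_sha256(pw: bytes, salt: bytes, iterations: int, dk_len: int) -> bytes:
    h_len = 32  # SHA-256 output size in bytes
    blocks = []
    for i in range(1, math.ceil(dk_len / h_len) + 1):
        # U_1 = HMAC(PW, salt || INT32(i)), INT32 is big-endian
        u = hmac.new(pw, salt + struct.pack(">I", i), hashlib.sha256).digest()
        t = int.from_bytes(u, "big")
        for _ in range(iterations - 1):
            # U_n = HMAC(PW, U_{n-1}); T_i accumulates the XOR of all U_n
            u = hmac.new(pw, u, hashlib.sha256).digest()
            t ^= int.from_bytes(u, "big")
        blocks.append(t.to_bytes(h_len, "big"))
    # DK = T_1 || T_2 || ..., truncated to dk_len bytes
    return b"".join(blocks)[:dk_len]

assert pbkdf2_hmac_sha256(b"password", b"salt", 1000, 48) == \
    hashlib.pbkdf2_hmac("sha256", b"password", b"salt", 1000, 48)
```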

But watch what happens if the two are combined:

$$\begin{align*} U^i_1 &= \op{SHA256}( K_o ∥ \op{SHA256}( K_i ∥ \op{salt} ∥ \op{INT32}( i ) ) ) \newline U^i_2 &= \op{SHA256}( K_o ∥ \op{SHA256}( K_i ∥ \op{SHA256}( K_o ∥ \op{SHA256}( K_i ∥ \op{salt} ∥ \op{INT32}( i ) ) ) ) ) \newline & \dots \end{align*}$$

Or in pseudocode:

$$\begin{align*} &T[i] = 0 \newline &U = \op{salt} ∥ \op{INT32}(i) \newline &\text{for } n = 1 \text{ to }c: \newline &\qquad U = \op{SHA256}( K_i ∥ U ) \newline &\qquad U = \op{SHA256}( K_o ∥ U ) \newline &\qquad T[i] = T[i] ⊕ U \newline &\text{next }n \newline \end{align*}$$
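
The pseudocode can be run as written using only bare SHA-256 calls. In this sketch (the function name `pbkdf2_block_unrolled` is hypothetical), the salt enters once, before the loop, while $K_i$ and $K_o$ are prepended in every iteration. Its output for $i = 1$ and $i = 2$ should match the first and second 32-byte blocks of `hashlib.pbkdf2_hmac`.

```python
import hashlib
import struct

def pbkdf2_block_unrolled(pw: bytes, salt: bytes, c: int, i: int) -> bytes:
    """Compute T[i] with bare SHA-256 and the pads K_i, K_o, as in the pseudocode."""
    k = hashlib.sha256(pw).digest() if len(pw) > 64 else pw
    k = k.ljust(64, b"\x00")
    k_i = bytes(b ^ 0x36 for b in k)
    k_o = bytes(b ^ 0x5C for b in k)

    t = bytes(32)                      # T[i] = 0
    u = salt + struct.pack(">I", i)    # U = salt || INT32(i): the only use of the salt
    for _ in range(c):                 # for n = 1 to c
        u = hashlib.sha256(k_i + u).digest()
        u = hashlib.sha256(k_o + u).digest()
        t = bytes(a ^ b for a, b in zip(t, u))  # T[i] = T[i] XOR U
    return t

assert pbkdf2_block_unrolled(b"password", b"salt", 1000, 1) == \
    hashlib.pbkdf2_hmac("sha256", b"password", b"salt", 1000, 32)
```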

So the salt is used only once per output block, while the master password is used twice per iteration (in the forms $K_o$ and $K_i$). I understand that the avalanche effect means a slight change in the initial hash input will have a significant impact on the hash output, but wouldn't it make sense to introduce as much entropy into the final result as possible?

ericball

2 Answers


Summary: I don't know of any good reason why it has to be this way. In practice, I don't think it is necessary to inject the password into every iteration. As far as I can tell, the construction would still be secure (in practice) if you used the salt and password only in the input to the first iteration, and then simply hashed the result repeatedly, many times, without feeding in the password again.
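
To make the comparison concrete, here is a sketch of the variant described above. It is hypothetical (`iterated_hash_variant` is an illustrative name, not a standard function), it is not PBKDF2, and it is shown only to contrast with the structure in the question, not for use. It assumes a fixed-length salt, so that the concatenation `salt || pw` is unambiguous.

```python
import hashlib

def iterated_hash_variant(pw: bytes, salt: bytes, c: int) -> bytes:
    # Hypothetical construction for comparison only: NOT PBKDF2, do not deploy.
    # Assumes a fixed-length salt so that salt || pw has an unambiguous boundary.
    # The password and salt enter once, in the first hash...
    h = hashlib.sha256(salt + pw).digest()
    # ...and every later iteration hashes only the previous output.
    for _ in range(c - 1):
        h = hashlib.sha256(h).digest()
    return h
```

Unlike PBKDF2, nothing keyed by the password appears after the first call, which is exactly the difference the rest of this answer discusses.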

So why does PBKDF2 do it, if it is not strictly necessary for security? I don't know. I can only speculate -- and you should take my speculation with a large grain of salt. That said, my speculation is that, by using the key in every iteration, you can get some slightly better provable-security results.

Technical speculation: it has to do with the security proof. In particular, with PBKDF2, it is possible to prove a statement along the lines of: if HMAC-SHA256 is a secure PRF, and if the password is a full-entropy, full-strength crypto key, then PBKDF2(password, salt) is a secure cryptographic key. No such statement would be available if every iteration after the first used a bare hash instead of an HMAC PRF; you'd need to make stronger assumptions, such as working in the random oracle model (i.e., assuming that the hash is perfect in every regard).

That said, the difference in provable security feels rather negligible to me. The security claim that you can prove about PBKDF2 (without the random oracle model) is not very exciting. PBKDF2 is primarily used to derive keys from a memorable password that is not full-entropy, and that situation is not covered by the security proof above. So, as far as I can see, there is no compelling reason why PBKDF2 was designed to inject the password into every iteration of the algorithm.

Standard disclaimer: don't roll your own. Please don't take this post as a justification for modifying PBKDF2 and using your modified version of PBKDF2. That would be a pretty risky thing to do. Even if we can't think of any strong reason why PBKDF2 had to be designed the way it did, it's still safer to use the standard primitive. PBKDF2 has been well-vetted, and who knows, maybe there's some problem with the variant that hasn't occurred to me.

D.W.

It's not really PBKDF2 that uses the key twice. PBKDF2 uses HMAC, and HMAC happens to work by using the key twice.

More specifically, PBKDF2 is actually a general construction that uses a cryptographic pseudo-random function (PRF) to do some of the internal work. The PRF used here is HMAC (HMAC-SHA256 in the question). The HMAC construction itself uses two keys derived separately from the original key. In the equations in the question, $K_o$ and $K_i$ are the outer and inner HMAC keys, respectively.
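
One way to see that "using the key twice" is purely an HMAC property: because $K_i$ and $K_o$ are each exactly one 64-byte SHA-256 block, an implementation can absorb them once and reuse the two resulting hash states for every message, a common precomputation optimization. The sketch below (the name `make_hmac_sha256` is hypothetical) does this with `hashlib`'s `copy()`, so after setup the password enters each PRF call only through those two fixed states.

```python
import hashlib
import hmac

def make_hmac_sha256(key: bytes):
    """Return m -> HMAC-SHA256(key, m), with the K_i and K_o states precomputed once."""
    if len(key) > 64:
        key = hashlib.sha256(key).digest()
    key = key.ljust(64, b"\x00")
    inner = hashlib.sha256(bytes(b ^ 0x36 for b in key))  # state after absorbing K_i
    outer = hashlib.sha256(bytes(b ^ 0x5C for b in key))  # state after absorbing K_o

    def mac(m: bytes) -> bytes:
        h_in = inner.copy()          # reuse the keyed inner state
        h_in.update(m)
        h_out = outer.copy()         # reuse the keyed outer state
        h_out.update(h_in.digest())
        return h_out.digest()

    return mac

prf = make_hmac_sha256(b"password")
assert prf(b"message") == hmac.new(b"password", b"message", hashlib.sha256).digest()
```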

This would then shift the question to: "Why does HMAC use two keys?" This is addressed in other questions:

Another closely related question with a relevant answer: Hash function in PBKDF2.

B-Con