Is deriving the IV from the password secure?

Question

I came across an encryption scheme to encrypt files with AES-256. You can see the initialization of the decryption routine below:

require 'openssl'

# The first 16 bytes of the file hold the salt.
salt = scrambled_file.read(16)

# Derive 48 bytes: 32 for the AES-256 key, 16 for the CBC IV.
key_and_iv = OpenSSL::PKCS5.pbkdf2_hmac(password, salt, 50000, 48, OpenSSL::Digest::SHA512.new)
key = key_and_iv.byteslice(0, 32)
iv = key_and_iv.byteslice(32, 16)

cipher = OpenSSL::Cipher::AES256.new(:CBC)
cipher.decrypt
cipher.key = key
cipher.iv = iv

decrypted_data = cipher.update(scrambled_file.read(...))

It basically takes a password and a 16-byte random salt and pushes them through PBKDF2 (HMAC-SHA512). Afterwards, the key is taken from the first 32 bytes of the output and the IV from the 16 bytes following it.

Is it secure to derive the IV from the same hash as the key?

score 15 · Accepted Answer · edited Sep 02 '19 at 17:12

Yes, it is. PBKDF2 derives a DK, a "derived key", which is indistinguishable from random. This is mainly because the function used within PBKDF2 is HMAC, and HMAC is a PRF. Let's see the definition from Wikipedia:

In cryptography, a pseudorandom function family, abbreviated PRF, is a collection of efficiently-computable functions which emulate a random oracle in the following way: no efficient algorithm can distinguish (with significant advantage) between a function chosen randomly from the PRF family and a random oracle (a function whose outputs are fixed completely at random).

This also means that the bytes of the derived key are independent of each other. So as long as separate (non-overlapping) bytes of the output are used, we can split the DK value into a key and an IV.

Now, fortunately, the scheme provided uses SHA-512 as the underlying hash function. Its 64-byte output covers the 48 bytes needed here in a single PBKDF2 block, so no additional calculations need to be performed for 256 bits of key material and 128 bits of IV material.

Had SHA-256 been used, the PBKDF2 function would require an additional run, since 48 bytes of output need two 32-byte blocks and each block costs the full iteration count. Unfortunately, an attacker would not have to perform such an additional run. This is because the attacker only has to verify the key; the IV can be calculated once the key has been found. So using a smaller hash function requires roughly twice as many operations for the legitimate user, while not giving any security advantage. This is a bad property for a PBKDF, and unfortunately one that PBKDF2 has.
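
The block arithmetic above can be checked with a small Ruby sketch. The helper names `pbkdf2_blocks` and `derive` are made up for illustration, and the iteration count is deliberately low so the example runs quickly; it is not a recommendation. The last comparison shows that, with SHA-256, the 32-byte key is simply the first PBKDF2 block, which is why a guess can be tested without computing the block that holds the IV.

```ruby
require 'openssl'

# Number of PBKDF2 blocks needed for output_len bytes; each block costs
# the full iteration count. (Hypothetical helper, not part of OpenSSL.)
def pbkdf2_blocks(output_len, digest_name)
  digest_len = OpenSSL::Digest.new(digest_name).digest_length
  (output_len + digest_len - 1) / digest_len
end

# Derive `length` bytes with PBKDF2-HMAC using the named digest.
# The default iteration count is illustrative only.
def derive(password, salt, length, digest_name, iterations = 1_000)
  OpenSSL::PKCS5.pbkdf2_hmac(password, salt, iterations, length,
                             OpenSSL::Digest.new(digest_name))
end

salt = OpenSSL::Random.random_bytes(16)
puts pbkdf2_blocks(48, 'SHA512')   # key + IV fit in one 64-byte block
puts pbkdf2_blocks(48, 'SHA256')   # key + IV need two 32-byte blocks

# With SHA-256 the 32-byte key is exactly the first block, so a password
# guess can be checked without computing the second block (the IV).
full     = derive('example password', salt, 48, 'SHA256')
key_only = derive('example password', salt, 32, 'SHA256')
puts full.byteslice(0, 32) == key_only
```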

So as long as your key and IV together fit within the output size of the hash, the scheme above is secure.

If you ever require more output than a single hash output provides, it is possible to perform additional calculations using a key-based key derivation function (KBKDF) such as HKDF. In that case, a good scheme would be:

derivedKey = PBKDF2(password, salt)
key = HKDF(derivedKey, "Key")
iv = HKDF(derivedKey, "IV")
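
In Ruby, the pseudocode above might look roughly like the sketch below. It assumes a version of the `openssl` gem that provides `OpenSSL::KDF.hkdf` (2.1 or later); the helper names `expand` and `derive_key_and_iv`, the empty HKDF salt, and the choice of SHA-256 are illustrative assumptions, not part of the scheme in the question.

```ruby
require 'openssl'

# Expand the PBKDF2 output into a labelled sub-key with HKDF.
# Assumes OpenSSL::KDF.hkdf is available (openssl gem >= 2.1).
def expand(derived_key, label, length)
  OpenSSL::KDF.hkdf(derived_key, salt: '', info: label,
                    length: length, hash: 'SHA256')
end

# One PBKDF2 run (a single 32-byte block), then two cheap HKDF calls.
def derive_key_and_iv(password, salt, iterations = 50_000)
  derived_key = OpenSSL::PKCS5.pbkdf2_hmac(password, salt, iterations, 32,
                                           OpenSSL::Digest.new('SHA256'))
  [expand(derived_key, 'Key', 32), expand(derived_key, 'IV', 16)]
end

key, iv = derive_key_and_iv('example password', OpenSSL::Random.random_bytes(16))
puts key.bytesize
puts iv.bytesize
```

The expensive password stretching happens once, and the attacker gets no shortcut from the extra output, because both values depend on the full PBKDF2 result.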

That the scheme is secure doesn't mean that it is optimal. You could think of using a memory-hard PBKDF such as scrypt or one of the newer Argon2 variants instead of PBKDF2. You might also have a look at authenticated modes such as GCM to add integrity and authenticity to your ciphertext.
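
As a rough illustration of the authenticated-mode suggestion, here is a minimal AES-256-GCM sketch in Ruby. The function names are hypothetical, and the 44-byte split (32-byte key plus 12-byte nonce) is an assumption that carries the question's key-and-IV derivation over to GCM; it still fits within one SHA-512 block.

```ruby
require 'openssl'

# Derive a 32-byte key and a 12-byte GCM nonce from one PBKDF2-SHA512 block.
def derive_gcm_key_and_nonce(password, salt, iterations = 50_000)
  material = OpenSSL::PKCS5.pbkdf2_hmac(password, salt, iterations, 44,
                                        OpenSSL::Digest.new('SHA512'))
  [material.byteslice(0, 32), material.byteslice(32, 12)]
end

# Encrypt with AES-256-GCM; returns the ciphertext and the auth tag.
def gcm_encrypt(key, nonce, plaintext)
  cipher = OpenSSL::Cipher.new('aes-256-gcm')
  cipher.encrypt
  cipher.key = key
  cipher.iv = nonce
  cipher.auth_data = ''   # no associated data in this sketch
  ciphertext = cipher.update(plaintext) + cipher.final
  [ciphertext, cipher.auth_tag]
end

# Decrypt and verify; raises OpenSSL::Cipher::CipherError on tampering.
def gcm_decrypt(key, nonce, ciphertext, tag)
  cipher = OpenSSL::Cipher.new('aes-256-gcm')
  cipher.decrypt
  cipher.key = key
  cipher.iv = nonce
  cipher.auth_tag = tag
  cipher.auth_data = ''
  cipher.update(ciphertext) + cipher.final
end

salt = OpenSSL::Random.random_bytes(16)
key, nonce = derive_gcm_key_and_nonce('example password', salt)
ciphertext, tag = gcm_encrypt(key, nonce, 'secret file contents')
puts gcm_decrypt(key, nonce, ciphertext, tag)
```

The salt and the tag would have to be stored alongside the ciphertext; how they are laid out in the file is left open here.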

Of course, the scheme does depend on the salt being a secure random value. Please make sure this is the case; otherwise you may end up with a repeating key and IV for identical passwords, destroying security.
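
The question only shows the decryption side, so the following is a guess at what a matching encryption routine could look like, with the salt drawn from OpenSSL's CSPRNG and written as the first 16 bytes. The function names are made up for this sketch.

```ruby
require 'openssl'

# Same derivation as in the question: 32-byte key, then 16-byte IV.
def derive_key_iv(password, salt, iterations = 50_000)
  material = OpenSSL::PKCS5.pbkdf2_hmac(password, salt, iterations, 48,
                                        OpenSSL::Digest.new('SHA512'))
  [material.byteslice(0, 32), material.byteslice(32, 16)]
end

# Encrypt with a fresh random salt, stored as the first 16 bytes of the output.
def encrypt_with_salt(password, plaintext, iterations = 50_000)
  salt = OpenSSL::Random.random_bytes(16)   # new salt for every encryption
  key, iv = derive_key_iv(password, salt, iterations)
  cipher = OpenSSL::Cipher.new('aes-256-cbc')
  cipher.encrypt
  cipher.key = key
  cipher.iv = iv
  salt + cipher.update(plaintext) + cipher.final
end

# Read the salt back from the first 16 bytes and decrypt the rest.
def decrypt_with_salt(password, blob, iterations = 50_000)
  salt = blob.byteslice(0, 16)
  key, iv = derive_key_iv(password, salt, iterations)
  cipher = OpenSSL::Cipher.new('aes-256-cbc')
  cipher.decrypt
  cipher.key = key
  cipher.iv = iv
  cipher.update(blob.byteslice(16, blob.bytesize - 16)) + cipher.final
end

first  = encrypt_with_salt('example password', 'same plaintext')
second = encrypt_with_salt('example password', 'same plaintext')
puts first != second   # different salts give different keys, IVs and ciphertexts
puts decrypt_with_salt('example password', first)
```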

A high number of iterations (the "work factor") for PBKDF2 makes it harder to attack relatively weak passwords (and most passwords are relatively weak).

Marco Aurélio da Silva · Answer 2 · 2019-09-02T16:46:23.907

The initialization vector (IV) acts like a block preceding the first ciphertext block in CBC mode: it is XORed with the first plaintext block before encryption. It must be unpredictable so that this randomness propagates to the rest of the ciphertext blocks. Otherwise, attackers can guess at the first block, because the header of the data often comes first (think of network routing headers). One approach is to generate the IV from a cryptographically secure PRNG; another is your case of deriving the IV from a KDF over the encryption password (the job of a KDF is to turn an input with low entropy, such as a human password, into an output that is indistinguishable from random, while also hardening against brute-force attacks on such low-entropy inputs).

For performance and security reasons, I would recommend generating the IV with a cryptographic PRNG such as Fortuna, while leaving the KDF with the job of computing encryption keys from passwords. The IV should be treated as a nonce: it must be unique for each encryption operation (and, for CBC, unpredictable); otherwise we lose the randomness guarantees on the ciphertext, which defeats the purpose of CBC.

score 0 · Answer 3 · answered Sep 02 '19 at 21:27

Is it secure? It depends.

The IV has to be unique (for any mode of operation) for a specific key. In addition, CBC requires the IV to be unpredictable. Does this hold in this case?

Maybe yes, maybe not. We see that the same password generates the same key and IV whenever the salt repeats. If the password is used just for a single file and we ensure the file is never modified (or that the attacker gets just a single copy, which might be tricky with SSDs with wear leveling), it seems to be a sufficient solution. Note that this depends heavily on user behavior, which might not be a good idea. This property is not expected by end users, I guess. It might look OK to send two documents encrypted with the same password to the same person, but it is not.

If possible, I would prefer adding the IV to the beginning of the file.
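
A minimal sketch of that layout in Ruby, assuming the 32-byte key has already been derived from the password: the IV is generated randomly per encryption and stored in front of the ciphertext. The function names are hypothetical.

```ruby
require 'openssl'

# Encrypt with a fresh random IV and store it in front of the ciphertext.
def encrypt_with_random_iv(key, plaintext)
  cipher = OpenSSL::Cipher.new('aes-256-cbc')
  cipher.encrypt
  cipher.key = key
  iv = cipher.random_iv   # generates a random IV and sets it on the cipher
  iv + cipher.update(plaintext) + cipher.final
end

# Read the IV back from the first 16 bytes, then decrypt the remainder.
def decrypt_with_stored_iv(key, blob)
  cipher = OpenSSL::Cipher.new('aes-256-cbc')
  cipher.decrypt
  cipher.key = key
  cipher.iv = blob.byteslice(0, 16)
  cipher.update(blob.byteslice(16, blob.bytesize - 16)) + cipher.final
end

key = OpenSSL::Random.random_bytes(32)   # stand-in for a password-derived key
blob = encrypt_with_random_iv(key, 'file contents')
puts decrypt_with_stored_iv(key, blob)
```

The IV does not need to be secret, so storing it in the clear is fine; it only has to be fresh and unpredictable for each encryption.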

Also, you don't mention authentication using a MAC or similar. This is important if the attacker can modify the data.
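
One common way to add this is encrypt-then-MAC: compute an HMAC over everything that is stored (salt, IV, ciphertext) and check it before decrypting. The sketch below assumes a separate `mac_key` that is independent of the encryption key; where that key comes from, and all function names, are assumptions for illustration.

```ruby
require 'openssl'

TAG_LEN = 32   # HMAC-SHA256 output size in bytes

def mac_tag(mac_key, data)
  OpenSSL::HMAC.digest('SHA256', mac_key, data)
end

# Compare two byte strings without exiting early on the first mismatch.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize
  a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

# Append the tag to the stored blob (encrypt-then-MAC).
def append_tag(mac_key, blob)
  blob + mac_tag(mac_key, blob)
end

# Verify the tag first; only hand back the blob if it is authentic.
def verify_and_strip(mac_key, tagged)
  raise ArgumentError, 'input too short' if tagged.bytesize < TAG_LEN
  blob = tagged.byteslice(0, tagged.bytesize - TAG_LEN)
  tag  = tagged.byteslice(tagged.bytesize - TAG_LEN, TAG_LEN)
  unless secure_equal?(mac_tag(mac_key, blob), tag)
    raise ArgumentError, 'authentication failed'
  end
  blob
end

mac_key = OpenSSL::Random.random_bytes(32)   # must differ from the cipher key
tagged = append_tag(mac_key, 'salt + iv + ciphertext bytes')
puts verify_and_strip(mac_key, tagged)
```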
