
Suppose I have a local file that is encrypted with a symmetric key derived from a strong passphrase, a unique salt, and $x$ iterations of a PBKDF (e.g. Scrypt). I am interested in keeping this file secret for a very long time.

Suppose the user specifies the number of iterations themselves when they first encrypt the file, e.g. a random number between the lowest safe number (say 1000) and the highest feasible number (say 100,000). The number of iterations would not be hard-coded into the program or stored alongside the salt and encrypted data. It would be remembered only by the user and entered as a separate parameter along with the passphrase when it comes time to decrypt the file.

If an attacker were to get hold of the storage device and perform an offline attack on the file's encryption, my thinking is that this makes their job much harder:

  • They still need to try all the possible permutations of each passphrase
  • Each brute force attempt takes ~200ms due to the slow PBKDF
  • They also do not know the exact number of iterations, so they have to try each passphrase permutation and repeat the process for every iteration count.

The use case is that, for day-to-day use, the user stores the number of iterations alongside the salt and encrypted data for ease of use and fast access. But when they are travelling and need to cross country borders where their phone/laptop/device might be confiscated, they can choose not to store the number of iterations as an additional precaution.

The difficulty for the user is remembering the number of iterations as well as the passphrase. However, this is just a small additional number to remember, similar to a 4-5 digit PIN. For example, it could easily be written down on a random piece of paper, or added as a phone number to an entry in your address book. In other words, it is very easy to hide in plain sight and not likely to be confiscated along with the digital devices.

Another advantage arises if the encryption uses a cascade of PBKDFs to produce the final derived key, e.g. 7328 iterations of PBKDF2-SHA2, with the result fed into Scrypt with 4755 iterations. Now the user remembers two numbers (or stores them in the address book), but this has a multiplier effect on the cracking difficulty for the attacker.

  • Do you concur that this is a reasonable feature to have in an encryption program?
  • Can you quantify how much additional security is gained from keeping the number of iterations secret from an attacker as well e.g. $n$ bits?
itercryp

1 Answer


I don't think it is a good idea, for two main reasons.

Firstly, you are basing your security on the obscurity of a parameter that was not designed to be secret, which is a risky practice. It is similar to hiding the salt.

Secondly, following your example, you might think that a random number of iterations between 1,000 and 100,000 gives you (roughly) 17 bits of additional entropy. That is, in addition to your secret password (with $p$ bits of entropy), you have to remember the number of iterations, so your final secret is $p+17$ bits long. However, you do not actually gain that additional level of security: key derivation is iterative, so for each password attempt the intermediate result after each iteration can be reused for the next iteration count.

The cost of a brute-force attack against a standard key derivation method will roughly be $2^p\cdot C \cdot N$, where $p$ is the bit length of the initial password, $C$ is the cost of each iteration (typically a hash-based function), and $N$ is the default number of iterations (e.g., 1000 iterations).

The cost of a brute-force attack against your approach will roughly be $2^p\cdot C \cdot 2^{17} = 2^{p+17}\cdot C$, since for each password the attacker performs about $2^{17}$ iterations in the worst case. That is, your approach is, in the worst case for the attacker, equivalent to the standard one with the "highest feasible number of iterations" (as you put it in the question).

To see this, consider the following brute-force attack against your approach. The attacker tries different possible passwords (first "aaaaa", next "aaaab", etc.) and does not know the number of iterations. Let's assume she is trying the $i$-th password, for instance "aabhjk". Clearly, she cannot reuse the work done for previous passwords. She starts brute-forcing "aabhjk" with the minimum number of iterations (1000), with no luck. Next, she has to try "aabhjk" with 1001 iterations, but she doesn't start from scratch: she reuses the work already done for the 1000 iterations and performs one additional iteration. This strategy is repeated until she reaches the highest feasible number of iterations. That is why, in the end, your approach is similar (in terms of a brute-force attack) to simply using a very high number of iterations.
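The reuse argument can be made concrete with a minimal sketch. It assumes PBKDF2-HMAC-SHA256 with a 32-byte output (a single PBKDF2 block); the function name `pbkdf2_all_counts` and the sample password and salt are hypothetical, and the sketch says nothing about how Scrypt's cost parameter behaves. In PBKDF2, the output for $c$ iterations is the XOR of the first $c$ chained HMAC values, so the output for $c+1$ iterations costs only one more HMAC once the output for $c$ is known.

```python
import hashlib
import hmac


def pbkdf2_all_counts(password: bytes, salt: bytes, max_iters: int) -> list[bytes]:
    """Return the PBKDF2-HMAC-SHA256 key (32 bytes) for every count 1..max_iters.

    keys[c - 1] is the derived key for iteration count c. The whole list is
    produced in a single pass of max_iters HMAC calls.
    """
    # U_1 = HMAC(password, salt || INT(1)), block index as 32-bit big-endian
    u = hmac.new(password, salt + (1).to_bytes(4, "big"), hashlib.sha256).digest()
    running_xor = int.from_bytes(u, "big")
    keys = [u]
    for _ in range(2, max_iters + 1):
        # U_j = HMAC(password, U_{j-1}); the key for count j is U_1 ^ ... ^ U_j
        u = hmac.new(password, u, hashlib.sha256).digest()
        running_xor ^= int.from_bytes(u, "big")
        keys.append(running_xor.to_bytes(32, "big"))
    return keys


if __name__ == "__main__":
    # Hypothetical candidate password and salt, small range for a quick check
    keys = pbkdf2_all_counts(b"aabhjk", b"example-salt", 2000)
    # Each entry matches an independent from-scratch derivation
    for count in (1000, 1001, 2000):
        reference = hashlib.pbkdf2_hmac("sha256", b"aabhjk", b"example-salt", count, 32)
        print(count, keys[count - 1] == reference)
```

Under these assumptions, testing one candidate password against every iteration count up to the maximum costs about as much as a single derivation at the maximum count, plus one trial decryption per count.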

If instead of following your approach, you append those extra bits (i.e., the bits required for storing the number of iterations) to your password, and simply use the standard key derivation method, then the cost of a brute-force attack is $2^{p+17}\cdot C \cdot N$.

That is, your approach is, in fact, weaker than simply using the number of iterations as a suffix to the password. From a quantitative perspective, it could be approximately $N$ times weaker, where $N$ is the default number of iterations.
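As a rough numerical check of the three cost formulas, the following sketch expresses each cost in bits, in units of the per-iteration cost $C$. The password entropy `p = 40` is a purely hypothetical value chosen for illustration; the iteration range (1000 to 100,000) and the default $N = 1000$ are taken from the question and the text above.

```python
import math

p = 40                 # hypothetical password entropy in bits (assumption)
N = 1_000              # default iteration count of the standard method
min_iters = 1_000      # lowest iteration count from the question
max_iters = 100_000    # highest feasible iteration count from the question

# Entropy of a uniformly random secret iteration count in [min_iters, max_iters]
count_bits = math.log2(max_iters - min_iters + 1)

# log2 of the attack cost, measured in units of C (one iteration)
standard_bits = p + math.log2(N)                   # 2^p * C * N
secret_count_bits = p + math.log2(max_iters)       # 2^p * C * max_iters (work reused)
suffix_bits = p + count_bits + math.log2(N)        # 2^(p + count_bits) * C * N

print(f"entropy of the secret count: {count_bits:.1f} bits")
print(f"standard KDF:                2^{standard_bits:.1f} * C")
print(f"secret iteration count:      2^{secret_count_bits:.1f} * C")
print(f"count appended to password:  2^{suffix_bits:.1f} * C")
print(f"suffix advantage:            {suffix_bits - secret_count_bits:.1f} bits")
```

With these numbers the advantage of the suffix approach comes out close to $\log_2 N$ bits, which matches the factor of roughly $N$ stated above.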

cygnusv