
I want to upgrade the security of some existing databases of users' authentication tokens strictly for the purpose of making sure that if the database is stolen, attackers will not be able to guess any but the weakest passwords in a reasonable period of time. I know there's not much I can do for people who choose "password" for their password, but I think we can protect most people who use words from a bigger dictionary, e.g. "regenerating".

Some of the databases are from the 90's and the existing authentication tokens are stored as

 MD5(salt + password)

and some of the salts are pretty weak. Other databases use somewhat better hashes and salts, but still not strong enough to resist attacks from today's computers. However, we don't want to wait until people log in to update their database entries. So my plan is to simply take the output of whatever hashing algorithm the password system is currently using, feed it into bcrypt, and store that result as the new authentication token, e.g.

 bcrypt(MD5(salt + password), new_secure_salt)

Does anyone see a weakness with this procedure? It seems to me that this will not be any worse than using bcrypt(password, new_secure_salt) but I know better than to rely on just myself for the evaluation of a cryptographic protocol.

I wouldn't recommend this procedure for new applications, but this method is the best I can come up with for immediately securing an entire database of insecurely stored passwords in a way that is transparent to users.

Major Major

3 Answers


The overall idea is a sound migration strategy. The nice thing is that security for all users is upgraded in a one-shot operation, rather than at the hypothetical next login of each user. Of course, both the original salt and new_secure_salt must be stored, or perhaps be partly derivable from material keyed in by the user, e.g. the lowercased user name.

Using bcrypt(MD5(salt + password), new_secure_salt) is a huge improvement over MD5(salt + password): the cost of cracking a password of a given entropy is greatly increased by the many iterations (and some use of memory) in bcrypt. Using MD5(salt + password) rather than password as the bcrypt input discards negligibly little password entropy for typical passwords (who chooses 20 random characters among 68 as a password?).
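To put a rough number on that parenthetical, assuming each character is drawn uniformly and independently from a 68-symbol alphabet:

$$20 \cdot \log_2 68 \approx 20 \times 6.087 \approx 121.7 \text{ bits} < 128 \text{ bits}$$

So even such an unusually strong password carries less entropy than MD5's 128-bit output can hold. Under the same assumption, the digest would only begin to cap the entropy at around 22 characters (about 133.9 bits).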

However, I would recommend scrypt over bcrypt. scrypt makes use of the memory available during legitimate use of the password in order to further increase security; have a look at this table from the paper defining scrypt:

[Table: Estimated cost of hardware to crack a password in one year]
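As a rough illustration, the one-shot migration and the later login check could look like the sketch below, with scrypt as the outer function. The record layout, the function names and the cost parameters are assumptions made for this example, not anything taken from the question's systems, and the parameters would need tuning on real hardware.

```python
import hashlib
import hmac
import os

# Hypothetical scrypt cost parameters, for illustration only.
SCRYPT_PARAMS = dict(n=2**14, r=8, p=1, dklen=32)

def legacy_token(salt: bytes, password: bytes) -> bytes:
    """The old scheme from the question: MD5(salt + password)."""
    return hashlib.md5(salt + password).digest()

def wrap(legacy: bytes, new_salt: bytes) -> bytes:
    """Outer slow hash applied to an already stored legacy token."""
    return hashlib.scrypt(legacy, salt=new_salt, **SCRYPT_PARAMS)

def migrate(record: dict) -> dict:
    """One-shot upgrade: needs only the stored token, not the password."""
    new_salt = os.urandom(16)
    return {
        "salt": record["salt"],      # the original salt must be kept
        "new_salt": new_salt,        # and the new one stored next to it
        "token": wrap(record["token"], new_salt),
    }

def verify(record: dict, password: bytes) -> bool:
    """Login check: recompute the legacy token, then the outer hash."""
    candidate = wrap(legacy_token(record["salt"], password), record["new_salt"])
    return hmac.compare_digest(candidate, record["token"])
```

For example, `migrate({"salt": b"ab", "token": legacy_token(b"ab", b"regenerating")})` yields a record that `verify` accepts for `b"regenerating"`, without the user having logged in at any point during the upgrade.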


The single weakness I can think of relates to MD5 collisions on salt. If it were possible for an adversary to choose salt and obtain the resulting authentication token with the same new_secure_salt a few times, it would become possible to recover the first few characters of the password, following the line of thought in Gaëtan Leurent's Practical key-recovery attack against APOP, an MD5 based challenge-response authentication (in IJACT, 2008). That's likely mostly theoretical in this context, where salt turns out to be non-malleable. To rule out any possibility of that attack, it would be enough to include salt in new_secure_salt.

The above theoretical attack exploits the fact that it is feasible to find two 127-byte salts salt0 and salt1 such that MD5(salt0 + 'e') = MD5(salt1 + 'e'), with the collision occurring in the first two 64-byte MD5 blocks. Now assume we could get the authentication tokens for these salt0 and salt1; if the password starts with e, then these tokens match! And the converse is true with near certainty. This allows constructing a test for the first character of the password and recovering it in a moderate number of queries, then proceeding with the second, and perhaps third character; after that, finding the collisions seems to become quite difficult.
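The structure of that test can be sketched with a deliberately tiny iterated hash, so that a plain birthday search finds the collisions. Everything here is a toy assumption: the 8-byte block, the 4-byte chaining state, the padding and all function names are invented for the illustration. Real MD5 collisions need dedicated techniques, not brute force. The outer bcrypt layer is left out, since with the same new_secure_salt it maps equal inputs to equal tokens.

```python
import hashlib
from itertools import count

BLOCK = 8             # toy block size in bytes (MD5 uses 64)
STATE = 4             # toy chaining-state size in bytes (MD5 uses 16)
IV = b"\x00" * STATE

def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: truncated MD5 of state || block."""
    return hashlib.md5(state + block).digest()[:STATE]

def toy_hash(msg: bytes) -> bytes:
    """Iterated hash: pad to whole blocks, then chain compress() over them."""
    padded = msg + b"\x80"
    padded += b"\x00" * (-(len(padded) + 1) % BLOCK)
    padded += bytes([len(msg) % 256])
    state = IV
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state

def colliding_salts(guess: bytes):
    """Find two distinct (BLOCK - 1)-byte salts such that the block
    salt + guess reaches the same chaining state for both."""
    seen = {}
    for n in count():
        salt = n.to_bytes(BLOCK - 1, "big")
        state = compress(IV, salt + guess)
        if state in seen:
            return seen[state], salt
        seen[state] = salt

def first_char_is(guess: bytes, token_for_salt) -> bool:
    """Query tokens for two colliding salts; they match if the password
    starts with `guess`, and otherwise match only by rare accident."""
    salt0, salt1 = colliding_salts(guess)
    return token_for_salt(salt0) == token_for_salt(salt1)
```

With a hypothetical oracle such as `lambda salt: toy_hash(salt + b"example")`, calling `first_char_is` on each candidate byte in turn singles out `b"e"`. The tokens match for the right guess because, once the chaining states collide after the first block, the remaining blocks are identical for both salts.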

fgrieu

One potential issue with this strategy revolves around compliance. You might be subject to various regulations (such as Government/DoD regulations) that prohibit unsafe hash functions. You might have a very unpleasant experience trying to explain to the audit team how using MD5 as part of user authentication does not mean "the terrorists win!"... In the best case you'll get a knowledgeable audit team that can separate checkbox security from actual security, but it will be very stressful for you (they don't really care about false positives as much as false negatives). In the worst case, they'll take away your authorization to operate - "just to be safe", of course.

Anonymous

One potential consideration is that combinations of security principles may have unintended vulnerabilities. This is not to say that you cannot combine such approaches, but most combinations in use have been well studied and then recommended by trusted third parties (academic, governmental, or institutional).

While not in the same domain as hash functions, the meet-in-the-middle attack is one example of unintended consequences when combining security primitives. For example, how much weaker is the hash function if the salt + password is known for any combination?

So what does one do with this information? You need to weigh the approach you're proposing, which has a relatively unknown exposure profile, against the one you do know, which you rightly identified as having known issues. Are you willing to trade operational simplicity for the uncertainty of a novel approach? As an implementer, you'll be able to assess what works best for your situation, or at least express the tradeoffs both approaches bring.