29

Say a server wants to hash a password $p$. It would use a secure hash function $H$ and a unique salt $s$ to hash the password as $H(p,s)$. If an attacker has access to the salt, each password candidate requires one run of the hash function to be ruled out, which is the same amount of time it takes the server to verify a password candidate.

Suppose instead that the password is hashed as $H'(p,s+r)$, where $r$ is a random integer in the range $[0,n)$ that is not stored anywhere. Verifying a correct password would then require on average $n/2$ runs of $H'$, whereas ruling out an incorrect password candidate would require $n$ runs of $H'$, i.e. the check is twice as slow for an attacker as for the server.
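
Here is a minimal sketch of what I mean (just an illustration; `toy_slow_hash` is a made-up placeholder for a real slow password hash $H'$ such as Argon2 or bcrypt, and `rand()` would of course be a proper CSPRNG in reality):

```c
/* Minimal sketch of the idea. toy_slow_hash() is only a placeholder for a
 * real slow password hash H'; nothing here is meant for production use. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define N_PEPPER 8  /* n: number of possible values of r */

/* Placeholder for H'(p, s + r): iterated FNV-1a, made artificially slow. */
static uint64_t toy_slow_hash(const char *p, uint64_t s, int r)
{
    uint64_t h = 1469598103934665603ULL;
    for (int round = 0; round < 100000; round++) {
        for (const char *c = p; *c; c++) { h ^= (uint8_t)*c; h *= 1099511628211ULL; }
        h ^= s + (uint64_t)r; h *= 1099511628211ULL;
    }
    return h;
}

/* Enrollment: pick r at random, hash with it, then forget r. */
static uint64_t enroll(const char *p, uint64_t s)
{
    int r = rand() % N_PEPPER;   /* r is NOT stored anywhere */
    return toy_slow_hash(p, s, r);
}

/* Login: try every possible r; on average n/2 hashes for a correct
 * password, always n hashes to reject an incorrect one. */
static int verify(const char *p, uint64_t s, uint64_t stored)
{
    for (int r = 0; r < N_PEPPER; r++)
        if (toy_slow_hash(p, s, r) == stored)
            return 1;
    return 0;
}

int main(void)
{
    uint64_t salt = 0x1234abcdULL;   /* per-user salt, stored as usual */
    uint64_t stored = enroll("correct horse", salt);
    printf("right password: %d\n", verify("correct horse", salt, stored));
    printf("wrong password: %d\n", verify("battery staple", salt, stored));
    return 0;
}
```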

I'm thinking this means that if $H'$ is $n$ times faster than $H$, the server validates a correct password up to twice as fast, while testing a password candidate remains just as slow for an attacker as before. One could then use this either to make sign-in quicker for users or to make the hash function slower for an attacker.

Are there any obvious issues with this approach or ways to improve it?

Edit:
The idea here is for there to be many different options for what could have produced the value stored in the database, and for an attacker to have to try more of these options than the server does. This could also be implemented in ways that don't use $r$ as part of the salt, e.g. password hash $=H_r(p,s)$, where the subscript $r$ denotes the number of iterations of the hash, i.e. $r=3\rightarrow H_r(p,s)=H'(H'(H'(p,s)))$.

An attacker would need to calculate $n$ hashes for each password, but the server only needs to calculate $r$ hashes to validate a correct password (and $n$ hashes to reject an incorrect one).
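
Here is a small sketch of this iterated variant too (again just an illustration; `step_hash` is a toy stand-in for one application of $H'(\cdot,s)$, and the password is assumed to already be reduced to a fixed-size digest for brevity):

```c
/* Sketch of the iterated variant: the stored value is H' applied r times,
 * with the secret iteration count r in [1, n] never stored. step_hash() is
 * a toy stand-in for one application of a real slow hash H'(., s). */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define N_MAX 8  /* n: upper bound on the secret iteration count r */

static uint64_t step_hash(uint64_t x, uint64_t s)   /* one (toy) H' call */
{
    for (int i = 0; i < 100000; i++) { x ^= s; x *= 1099511628211ULL; x ^= x >> 29; }
    return x;
}

static uint64_t enroll(uint64_t pw_digest, uint64_t s)
{
    int r = 1 + rand() % N_MAX;          /* secret r, never stored (a real system would use a CSPRNG) */
    uint64_t h = pw_digest;
    for (int i = 0; i < r; i++) h = step_hash(h, s);
    return h;
}

static int verify(uint64_t pw_digest, uint64_t s, uint64_t stored)
{
    uint64_t h = pw_digest;
    for (int i = 0; i < N_MAX; i++) {    /* compare after every iteration */
        h = step_hash(h, s);
        if (h == stored) return 1;       /* correct password: r hashes */
    }
    return 0;                            /* wrong password: always n hashes */
}

int main(void)
{
    uint64_t salt = 42, good = 0xC0FFEEULL, bad = 0xBADF00DULL;
    uint64_t stored = enroll(good, salt);
    printf("right: %d, wrong: %d\n", verify(good, salt, stored), verify(bad, salt, stored));
    return 0;
}
```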

n-l-i

4 Answers

24

That is an interesting idea (it was new to me), and it turns out to be known as (random) peppering, as pointed out in the comments.

Indeed, the average number of evaluations of $H'$ by the server when testing the correct password (which is usually the most common case, except under a Denial of Service attack) is reduced by $\frac{n-1}{2n}$ compared to the work of an attacker who proceeds by eliminating passwords one by one. That gain converges quickly to $\frac12$ as $n$ grows. And while attackers are not bound to that test strategy†, it is typically the least costly one for them, because moving from one tested password to the next typically has a large cost.
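
As a concrete sanity check of that figure: with $n=7$, the server trying the possible peppers in random order needs on average $\frac{n+1}{2}=4$ evaluations of $H'$ for a correct password, versus the $n=7$ evaluations an attacker spends to rule a candidate out; the saving is $\frac{7-4}{7}=\frac{n-1}{2n}\approx43\%$, consistent with the >42% quoted for $n=7$ further down.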

A gain of at most 50% is not game-changing. But implementers of password hashes like Argon2 optimize their code carefully (because that makes it possible to use more iterations at a given cost, and thus ultimately makes using their code safer), so even a 20% gain should be something they consider notable.

If this is in use in some modern login system(s), I'd like link(s) so I can see how the potential issue I note next is addressed. And if it is not used, I wonder why, because once you see it, this relatively easy and sizable gain looks like it should be a standard technique.


The one major caveat I see is that we should fear that an adversary finds (or approximates) $r$ by a side channel attack on the genuine server while a genuine user logs in (the simplest such attack being timing the login; since $H'$ is slow, that's relatively easy). With the exact $r$, the attacker's effort is divided by a factor of $n$, which is bad.

For this reason, it's probably best not to make $n$ too big, so that even a leak of $r$ is not a major disaster, and to have the genuine server, when verifying a password, try the $n$ possible values of $r$ in some random order. With a small prime $n$ (e.g. $n=7$, saving on average >42% of the work for a correct password), the server can proceed as follows (a C sketch of this loop is given after the list):

  • draw ephemeral secret $r$ random in $[0,n)$
  • set $\delta=1$, or better‡ draw ephemeral secret $\delta$ in $[1,n)$
  • $r_0\gets r$
  • repeat
    • test if $H'(p,s+r)$ verifies, if so accept password $p$.

    • $r\gets r-\delta$

    • if $r<0$ then $r\gets r+n$, in constant time

      In modern C, if `r` is of type `int32_t`, we can use `r += (int32_t)((0-((uint32_t)r>>31))&(uint32_t)n)`

  • until $r=r_0$
  • reject password $p$.
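
A C sketch of the loop above, under the following assumptions: $n=7$, `hprime()` is only a toy stand-in for the slow hash $H'(p,s+r)$ (a real server would call Argon2/scrypt/bcrypt with the pepper folded into the salt), and `arc4random_uniform()` (available on BSD/macOS and glibc ≥ 2.36) stands in for any cryptographically secure random source:

```c
/* Sketch of the randomized-order verification loop described above. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define N 7  /* small prime n */

static uint64_t hprime(const char *p, uint64_t s, int32_t r)  /* toy H'(p, s + r) */
{
    uint64_t h = 1469598103934665603ULL;
    for (int i = 0; i < 100000; i++) {                        /* the "slow" part */
        for (const char *c = p; *c; c++) { h ^= (uint8_t)*c; h *= 1099511628211ULL; }
        h ^= s + (uint64_t)r; h *= 1099511628211ULL;
    }
    return h;
}

/* Try the n values of r, starting at a secret random point and stepping by a
 * secret stride delta coprime with n (any delta in [1, n) works since n is prime). */
static bool check_password(const char *p, uint64_t s, uint64_t stored)
{
    int32_t r     = (int32_t)arc4random_uniform(N);          /* ephemeral secret r in [0, n) */
    int32_t delta = (int32_t)arc4random_uniform(N - 1) + 1;  /* ephemeral secret delta in [1, n) */
    int32_t r0 = r;
    do {
        if (hprime(p, s, r) == stored)  /* test if H'(p, s + r) verifies (a real server
                                           would use a constant-time comparison) */
            return true;                /* accept password p */
        r -= delta;
        /* if r < 0 then r += n, in constant time -- the trick from the list above */
        r += (int32_t)((0 - ((uint32_t)r >> 31)) & (uint32_t)N);
    } while (r != r0);                  /* stop once all n values have been tried */
    return false;                       /* reject password p */
}

int main(void)
{
    uint64_t salt = 0x1234abcdULL;
    /* enrollment: hash with a random r in [0, n), then forget r */
    uint64_t stored = hprime("hunter2", salt, (int32_t)arc4random_uniform(N));
    printf("right: %d, wrong: %d\n",
           check_password("hunter2", salt, stored),
           check_password("password", salt, stored));
    return 0;
}
```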

† For example, an adversary having captured so many login/salt/hashes that several of them are likely to be for password 123456, satisfied by finding one, and able to move from one login/salt/hash to the next at negligible cost, would not be impaired by the proposed change if they tested values of $r$ incrementally for all the login/salt/hashes they have, and the fixed password 123456.

‡ From the perspective of timing attacks, using fixed $\delta=1$ is fine. Random secret $\delta$ is only intended as an extra protection against other side-channel attacks. We make $n$ prime so that any $\delta\in[1,n)$ is coprime with $n$.

fgrieu
5

First of all, I'll assume that you'd use a password hash (or a Password Based Key Derivation Function, PBKDF) to hash the password. Password hashes require a work factor or number of iterations to calculate the result, slowing down both the server (once, if the password is correct) and the adversary (once for each try). Let's assume that the number of iterations is $n$ in the following sections.

The approach you describe does seem to work. There are however a few drawbacks:

  • the worst case for the number of tries is still $n$;
  • a user who types the wrong password will also experience the worst-case scenario.

It seems that the approach simply relies on a full search, which takes $n \over 2$ operations on average. That means that the speed of the server compared to the adversary is only doubled. This also means that the number of possible values of $r$ might as well simply be set to $2$; otherwise the difference between the minimum and maximum verification time may be too large, without providing any additional benefit to the user.

So yes, such a scheme is possible, and you've described one that is faster on the server. But its advantage is limited and it does come with some drawbacks. For such a scheme to be successful, I think it needs to provide an advantage of more than 2 times. In cryptographic terms this only adds a single bit to the security strength, whereas a PBKDF should normally provide at least ~20 bits of security (i.e. over a million iterations).

Maarten Bodewes
5

It's worth considering the attacker's goals and the threat model of users and the server.

For a given password, guessing only $k$ out of $n$ values of $r$ can create false negatives: a case where you have the right password but you don't realize it because you didn't pick the right $r$.

From a usability point of view, you want the false negative rate to be $0$: a user with the correct password should never be prevented from logging in.

From an attacker's point of view, the false negative rate could be almost anything less than $1$: if I breach the password server and get the database of hashed passwords, I probably only care about compromising some number of users, not any specific users. Thus, if I only try $k$ different values of $r$, then I miss a fraction $\frac{n-k}{n}$ of the passwords I would otherwise find, but if I still get $\frac{k}{n}$ of them, I probably don't care (whatever malicious behaviour I wanted to carry out will still go through).

(there are also different threat models where an attacker would also want a false negative rate of $0$, and your idea could be useful for those).

But I disagree with the premise that this generally forces an attacker to spend an extra factor of $n$ on each password guess.

Sam Jaques
0

A password hash function is usually expensive because you are doing many hashing rounds. You could do a million rounds before letting someone in, but somehow be able to find out after 100 rounds that the password is definitely wrong, and then wait the appropriate time before reporting the failure, at essentially zero cost. That could help against DoS attacks.
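
As a rough sketch of the "wait the appropriate time" part only (assuming POSIX `clock_gettime`/`nanosleep`; how the server would cheaply detect a wrong password after only a few rounds is left open here, as in the answer, and `TARGET_MS` is an arbitrary fixed response time chosen for illustration):

```c
/* Pad the login response to a fixed duration, so an early internal
 * rejection is not observable through response timing. */
#include <stdbool.h>
#include <time.h>

#define TARGET_MS 500  /* respond after this long, whether we rejected early or not */

bool check_password_padded(bool (*expensive_check)(void))
{
    struct timespec start, now;
    clock_gettime(CLOCK_MONOTONIC, &start);

    bool ok = expensive_check();              /* may return early on failure */

    clock_gettime(CLOCK_MONOTONIC, &now);
    long elapsed_ms = (now.tv_sec - start.tv_sec) * 1000
                    + (now.tv_nsec - start.tv_nsec) / 1000000;
    if (elapsed_ms < TARGET_MS) {             /* sleep away the remaining time */
        struct timespec pad = { .tv_sec  = (TARGET_MS - elapsed_ms) / 1000,
                                .tv_nsec = ((TARGET_MS - elapsed_ms) % 1000) * 1000000L };
        nanosleep(&pad, NULL);
    }
    return ok;
}
```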

gnasher729