What is the algorithm behind PasswordDeriveBytes?

Question

Microsoft has created an implementation of PBKDF1 in the PasswordDeriveBytes class. It can, however, generate more bytes than PBKDF1, whose output is limited to the number of bytes generated by the underlying hash function. How is this proprietary Microsoft extension defined?

score 10 · Accepted Answer · edited Oct 07 '21 at 07:59

The password P in the Microsoft implementation is first encoded using UTF-8. The iteration count c defaults to the value 100, and the hash algorithm Hash to SHA-1.

The output is identical to PBKDF1 up to the maximum output size of PBKDF1, which is the output size of the hash. Beyond that, the second-to-last hash (the output of iteration c - 1) is prefixed with a counter (a decimal number encoded as an ASCII string) and hashed again to produce each further block of output. If GetBytes is called multiple times, it simply returns the next output bytes from the stream.

Note that this description does not cover the broken implementation that Microsoft shipped with earlier Windows operating systems. That implementation contained a bug that could cause repetition in the output of the GetBytes method if it was called two or more times.

The following is a more formal description of the algorithm. It extends the definition of PBKDF1 in RFC 2898, section 5.1.

PBKDF1_MS (P, S, c, dkLen)
Options:        Hash       underlying hash function
Input:          P          password, an octet string
                S          salt, an eight-octet string
                c          iteration count, a positive integer
                dkLen      intended length in octets of derived key,
                           a positive integer, bounded to 100 times
                           the output size of Hash
Output:         DK         derived key, a dkLen-octet string
Steps:
  1. If dkLen > 100 * Hash output size, output
     "derived key too long" and stop.

  2. Apply the underlying hash function Hash for c iterations to the
     concatenation of the password P and the salt S, then apply the
     Microsoft extension by hashing the concatenation of an encoded
     counter and the output of the hash produced for iteration c - 1,
     and finally extract the first dkLen octets to produce a derived
     key DK:

               T_1 = Hash (P || S) ,
               T_2 = Hash (T_1) ,
               ...
               T_c = Hash (T_{c-1}) ,
               R_1 = T_c ,
               R_2 = Hash (Ctr(1) || T_{c-1}) ,
               ...
               R_n = Hash (Ctr(n-1) || T_{c-1}) ,
               R = R_1 || R_2 || ... || R_n
               DK = R<0..dkLen-1>

  3. Output the derived key DK.

The Ctr function converts the given number to a decimal
representation in ASCII.
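
The steps above can be sketched in Python as follows. This is a minimal illustration of the description in this answer, not a port of the Mono or Microsoft code, and it has not been checked against actual PasswordDeriveBytes output. The function name pbkdf1_ms and its parameters are made up for this example. The description leaves T_{c-1} undefined for c = 1, so the sketch assumes that extended output requires c to be at least 2.

```python
import hashlib


def pbkdf1_ms(password: bytes, salt: bytes, iterations: int = 100,
              dk_len: int = 20, hash_name: str = "sha1") -> bytes:
    """Sketch of PBKDF1_MS as described above (hypothetical helper)."""
    h_len = hashlib.new(hash_name).digest_size
    if iterations < 1:
        raise ValueError("iteration count must be positive")
    # Step 1: bound the output to 100 times the hash output size.
    if dk_len > 100 * h_len:
        raise ValueError("derived key too long")

    # Step 2: T_1 = Hash(P || S), T_i = Hash(T_{i-1}); remember T_{c-1}.
    t_prev = None
    t = hashlib.new(hash_name, password + salt).digest()
    for _ in range(iterations - 1):
        t_prev = t
        t = hashlib.new(hash_name, t).digest()

    blocks = [t]  # R_1 = T_c, identical to plain PBKDF1
    counter = 1
    while len(blocks) * h_len < dk_len:
        if t_prev is None:
            # Assumption of this sketch: no extension when c = 1.
            raise ValueError("extended output needs an iteration count >= 2")
        ctr = str(counter).encode("ascii")  # Ctr(n): decimal digits in ASCII
        blocks.append(hashlib.new(hash_name, ctr + t_prev).digest())
        counter += 1

    # Step 3: DK = R<0..dkLen-1>
    return b"".join(blocks)[:dk_len]
```

The caller is expected to encode the password as UTF-8 first, for example `pbkdf1_ms("secret".encode("utf-8"), salt, dk_len=32)`. Because the output is a stream, the first bytes of a longer request equal a shorter request made with the same inputs.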

Original Mono implementation in C# here. Java port here.

You are strongly encouraged to use Rfc2898DeriveBytes instead, which implements PBKDF2.

If you need more output from PBKDF2 than the output size of the underlying hash, it is recommended to expand the result with a key-based key derivation function (KBKDF) such as HKDF. If you don't have an implementation of one, simply hash the PBKDF2 output together with a statically sized identifier to create each derived key, e.g. SHA-256(master | identifier), and take as many bytes from the left as needed.
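
A minimal Python sketch of that fallback, assuming SHA-256 for both PBKDF2 and the final hash. The helper name derive_subkey, the 4-octet identifiers, and the password, salt and iteration count are placeholders chosen for illustration only.

```python
import hashlib


def derive_subkey(master: bytes, identifier: bytes, length: int) -> bytes:
    """Return the leftmost `length` octets of SHA-256(master || identifier)."""
    if not 1 <= length <= hashlib.sha256().digest_size:
        raise ValueError("length must be between 1 and 32 octets")
    return hashlib.sha256(master + identifier).digest()[:length]


# Master secret from PBKDF2, limited to one hash output (32 octets).
# Password, salt and iteration count are placeholder values.
master = hashlib.pbkdf2_hmac("sha256", b"password", b"saltsalt", 100_000)

# Identifiers all have the same size (4 octets here), so two different
# (master, identifier) pairs cannot concatenate to the same hash input.
enc_key = derive_subkey(master, b"ENC\x00", 32)
mac_key = derive_subkey(master, b"MAC\x00", 32)
```

Each identifier yields an independent-looking key from the same master secret, and the expensive PBKDF2 step runs only once.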
