Is HMAC-MD5 still secure for commitment or other common uses?

Question

MD5 collisions have been out for some time. In spite of this, HMAC-MD5 is still secure for authenticating data¹. This illustrates a strength of the HMAC construction: it does not require that the hash function be (weakly) collision resistant.

Recently, Dan Kaminsky posted a simple method for finding HMAC-MD5 collisions. The steps of the attack are:

Given a hash function with a collision and a key either known or controlled by an attacker, it’s trivially possible to generate a HMAC collision. The slightly less quick and dirty steps are:

Start with a file at least 64 bytes long

Generate a collision that can append to that file.

XOR the first 64 bytes of your file with 0x36’s. Make that your HMAC key.

Concatenate the rest of your file with your colliding blocks. Make those your HMAC messages.

HMAC(key, msg1+anything) will equal HMAC(key, msg2+anything)
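
To see why these steps work, here is a minimal Python sketch of the key trick only. It does not generate an MD5 collision; `file_bytes` is a hypothetical stand-in file, and the helper names are mine. It checks that with the key chosen this way, the inner hash of HMAC-MD5 is just plain MD5 of the whole file, so any MD5 collision appended to that file carries over to the HMAC.

```python
import hashlib
import hmac

BLOCK = 64  # MD5 block size in bytes

def kaminsky_key(file_prefix: bytes) -> bytes:
    """XOR the first 64 bytes of the file with 0x36 to obtain the HMAC key."""
    assert len(file_prefix) == BLOCK
    return bytes(b ^ 0x36 for b in file_prefix)

def inner_hash(key: bytes, msg: bytes) -> bytes:
    """Inner hash of HMAC-MD5 for a key of exactly one block (no key hashing)."""
    ipad_key = bytes(b ^ 0x36 for b in key)
    return hashlib.md5(ipad_key + msg).digest()

# Hypothetical stand-in for "a file at least 64 bytes long".
file_bytes = bytes(range(64)) + b"rest of the file"
key = kaminsky_key(file_bytes[:BLOCK])
msg = file_bytes[BLOCK:]

# key XOR ipad gives back the file prefix, so the inner hash is MD5(file).
assert inner_hash(key, msg) == hashlib.md5(file_bytes).digest()

# The outer hash sees the message only through the inner hash.
outer_key = bytes(b ^ 0x5C for b in key)
tag = hashlib.md5(outer_key + inner_hash(key, msg)).digest()
assert tag == hmac.new(key, msg, hashlib.md5).digest()
```

Two messages whose inner hashes collide therefore give the same tag, whatever the outer hash does.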

HMAC has been suggested² as a way to do cryptographic commitments.

Is HMAC-MD5 (or more generally, HMAC with a non-collision-resistant hash function) still a secure way to do commitments?

This is important because, for example, Python's HMAC library uses MD5 by default.

Bonus Question: Are there any other common uses of HMAC and are they still secure for HMAC-MD5?

^{1. Though not recommended for new applications.}
^{2. If anyone knows of a more authoritative reference for using HMAC for commitment, please edit it in.}

score 11 · Accepted Answer · edited Oct 07 '21 at 06:47

No, message commitment by disclosing its HMAC-MD5 with a key later revealed is no longer secure, because of the ease with which MD5 collisions can now be found. There is however no compelling evidence that it is insecure for messages constrained to belong to a small arbitrary set that no adversary can choose or influence. Still, whatever the constraints on the messages, the narrow 128-bit output width of HMAC-MD5 allows an attack with a relatively modest (and clearly feasible) $2^{65}$ evaluations of HMAC-MD5.

HMAC builds a PRF from a hash function $H$ with Merkle-Damgård structure, message block width $w$ and output width $h$, with $w\ge h$, as $$\operatorname{HMAC}_H(K,m)=\begin{cases} H\Big((K\oplus\text{opad})\mathbin\|H\big((K\oplus\text{ipad})\mathbin\|m\big)\Big) &\text{if $|K|\le w$}\\ \operatorname{HMAC}_H\big(H(K),m\big) &\text{if $|K|>w$} \end{cases}$$ where $\text{opad}$ (resp. $\text{ipad}$) is the 0x5c5c5c… (resp. 0x363636…) pattern with width $w$, and $\oplus$ is bitwise exclusive-OR with the shorter operand right-padded using zero bits.
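
As an illustration, the definition above can be transcribed almost literally into Python and compared against the standard `hmac` module. This is a sketch for checking one's understanding, not a replacement for the library; the function name `hmac_from_definition` is mine, and widths are handled in bytes rather than bits.

```python
import hashlib
import hmac

def hmac_from_definition(key: bytes, msg: bytes, hash_name: str = "md5") -> bytes:
    """HMAC computed directly from the two-case definition."""
    w = hashlib.new(hash_name).block_size        # block width w, in bytes
    if len(key) > w:                             # case |K| > w: replace K by H(K)
        key = hashlib.new(hash_name, key).digest()
    key = key.ljust(w, b"\x00")                  # right-pad K with zero bits
    ipad_key = bytes(b ^ 0x36 for b in key)      # K xor ipad
    opad_key = bytes(b ^ 0x5C for b in key)      # K xor opad
    inner = hashlib.new(hash_name, ipad_key + msg).digest()
    return hashlib.new(hash_name, opad_key + inner).digest()

# Compare with the standard library for a short, a block-sized and a long key.
for k in (b"short key", b"K" * 64, b"L" * 100):
    assert hmac_from_definition(k, b"message") == hmac.new(k, b"message", "md5").digest()
```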

If the compression function used to build $H$ has suitable properties, then for random unknown $K$ with $|K|\ge h$, the function $m\mapsto \operatorname{HMAC}_H(K,m)$ is indistinguishable from random with effort less than about $2^h$ evaluations of $H$. We have no compelling evidence that this does not hold for $H=\operatorname{MD5}$ (for which $h=128$, $w=512$). For more details see Mihir Bellare, New Proofs for NMAC and HMAC: Security without Collision Resistance, in Journal of Cryptology, 2015 (originally in proceedings of Crypto 2006).

When $H$ is MD5, or any $H$ that is not collision-resistant, the attack in the question renders insecure a commitment protocol where Alice

secretly chooses $m$ and $K$
computes and publishes $\operatorname{HMAC}_H(K,m)$ as a commitment of $m$
performs some action dependent on $m$ (like: offer a bet about the first bit of $m$)
later reveals $m$
reveals $K$, allowing a verifier to compute $\operatorname{HMAC}_H(K,m)$ on the $m$ Alice alleges, and compare against Alice's commitment.
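
The steps above can be sketched in Python as follows. The function names `commit` and `verify` and the 32-byte key length are my own choices for illustration. MD5 is the default only to mirror the discussion; as explained, this is not secure when Alice is free to choose $m$ and $K$.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

def commit(message: bytes, digestmod: str = "md5") -> Tuple[bytes, bytes]:
    """Alice: choose a secret key K and return (K, commitment).
    She publishes only the commitment and keeps K and the message secret."""
    key = secrets.token_bytes(32)   # assumed key length, at least h bits
    return key, hmac.new(key, message, digestmod).digest()

def verify(commitment: bytes, key: bytes, message: bytes,
           digestmod: str = "md5") -> bool:
    """Verifier: recompute HMAC on the revealed (K, m) and compare."""
    expected = hmac.new(key, message, digestmod).digest()
    return hmac.compare_digest(expected, commitment)

# Alice commits, later reveals (key, message); the verifier checks.
key, commitment = commit(b"stone")
assert verify(commitment, key, b"stone")
assert not verify(commitment, key, b"paper")
```

The attack from the question targets exactly this flow: Alice prepares one key and two messages that give the same commitment, and reveals whichever message suits her.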

Notice however that even with an ideal $H$, there's an attack with effort about $2^{h/2}$ evaluations of $H$, where Alice finds $m$ and $m'$ with $H\big((K\oplus\text{ipad})\mathbin\|m\big)=H\big((K\oplus\text{ipad})\mathbin\|m'\big)$; thus the mere output size of HMAC-MD5 limits its security level to a modest $2^{64}$ evaluations of MD5, in this protocol where $m$ is unconstrained.
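
The generic birthday attack can be demonstrated at toy scale. The sketch below assumes a deliberately weakened hash, MD5 truncated to 24 bits (`toy_hash`, my own construction for the demo), so that the expected $2^{h/2}=2^{12}$ trials run instantly. It uses no weakness of MD5; only the short output matters.

```python
import hashlib
from itertools import count

TRUNC = 3  # toy output width in bytes (h = 24 bits), an assumption for the demo

def toy_hash(data: bytes) -> bytes:
    """MD5 truncated to TRUNC bytes, standing in for a hash with small h."""
    return hashlib.md5(data).digest()[:TRUNC]

def toy_hmac(key: bytes, msg: bytes) -> bytes:
    """HMAC structure over toy_hash, for keys of at most 64 bytes."""
    key = key.ljust(64, b"\x00")
    inner = toy_hash(bytes(b ^ 0x36 for b in key) + msg)
    return toy_hash(bytes(b ^ 0x5C for b in key) + inner)

def find_inner_collision(key: bytes):
    """Birthday search for m != m' with equal inner hash under a known key."""
    ipad_key = bytes(b ^ 0x36 for b in key.ljust(64, b"\x00"))
    seen = {}
    for i in count():
        msg = str(i).encode()
        digest = toy_hash(ipad_key + msg)
        if digest in seen:
            return seen[digest], msg, i + 1   # two messages and the trial count
        seen[digest] = msg

key = b"known to Alice"
m1, m2, trials = find_inner_collision(key)
# An inner collision is automatically a collision of the whole HMAC.
assert m1 != m2 and toy_hmac(key, m1) == toy_hmac(key, m2)
```

With the full 128-bit MD5 output, the same search needs on the order of $2^{64}$ trials, which is the bound stated above.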

On the other hand, I see no attack (much better than brute force on $K$) on the use of HMAC-MD5 (or HMAC with a non-collision-resistant $H$) in the variant of this protocol where Alice is constrained to choose $m$ in a small arbitrary set that no adversary can choose or influence, like $\{\text{“stone”},\text{“paper”},\text{“scissors”}\}$, as considered in the suggestion referenced in the question. Alice has so little choice over the messages that she must be clever in her choice of $K$, rather than $m$, in order to cause a collision. My intuition is that because $K$ enters twice in the computation of $\operatorname{HMAC}_H(K,m)$, with at least one execution of the compression function in-between (for heavily constrained $m$), finding a theoretical shortcut would be extremely hard, well beyond what the current state of MD5 cryptanalysis allows.

score 1 · Answer 2 · answered May 26 '15 at 03:55

There is absolutely no reason to use HMAC-MD5 in a new product. Don't.

HMAC-MD5 can still work in many roles, but should not be regarded as "secure" unless you can do extensive analysis of EXACTLY how it's used. In particular, it is "secure" for document signing only if you are sure that the signer is NOT motivated to break security.

What Kaminsky showed was that an attacker with knowledge of the key can extend the ability to generate collisions in MD5 to generate collisions in HMAC-MD5.

But the "attacker with knowledge of the key" model is a difficult security proposition anyway, and most protocols make every effort to prevent it.

Unfortunately it is exactly the security model involved in code signing in an open-source environment. If the attacker is one of the people signing code, they can take the innocuous code that someone's going to review and some attack code that's going to give them root access whenever the typeahead buffer is full, and twiddle the comments in both files to generate a collision. Then they can check in the innocuous code, and later substitute the attack code without changing the checksum.
