Does it matter if I publish only good or only bad MD5 hashes after recovering from a hack?

Question

(This question is a follow-up from another question I asked about MD5, which is why I'm insisting on using MD5 in this question as well)

My imaginary Linux distribution, Sushi Linux, has had a big incident. Hackers gained access to the website and replaced the download link with a link to an infected ISO file. The website was taken down and a backup was restored, after making sure that this cannot happen again, of course.

Still, my logs show that the infected ISO file was downloaded by a lot of people. I want to give them easy instructions to check whether or not they are infected.

Both the infected and the uninfected ISO file have the same size, but they have different MD5 sums. My religion does not recognize any other hashing algorithm than MD5, so that's what I'll use in 2019. I know about the weaknesses but since no other hashing algorithm exists I'll use it anyway.

My users are afraid that they installed Sushi Linux using an infected ISO and they ask me how they can check. The ISO file they can download from my website today is the uninfected one, and I've fixed the problem the hackers used to get in last time, they're not getting in a second time. However, my users want a way of verifying their local file without downloading a new one (most Sushi Linux users are on 56K modems so they don't want to download it again).

I've considered publishing the MD5-sum of the infected ISO, so that users can identify their local ISO as being the infected one, and I've considered publishing the MD5-sum of the correct ISO, so that users can identify their local ISO as being the uninfected one. My webmaster is hourly, so I can only publish one hash. Does it matter which hash I publish for users to be able to check if they have the infected file or the real file?

Squeamish Ossifrage · Accepted Answer · 2019-04-27T13:18:25.983

Does it matter? Yes: there's a qualitative difference in the types of attacks that would break your system. The elephant in the room, of course, is MD5, but let's examine the qualitative difference between the attacks first.

It is almost useless to publish the bad hashes, because as an adversary I could just distribute different versions to everyone. A million downloads? A million different ISO images, uniquely tailored with a special brand of malice.
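A minimal sketch of why a list of known-bad hashes catches so little, using placeholder byte strings in place of real ISO images (the contents and function names here are made up for illustration):

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hex string."""
    return hashlib.md5(data).hexdigest()


# Placeholder contents standing in for real ISO images (assumption).
good_iso = b"sushi-linux genuine image"
bad_iso = b"sushi-linux infected image"
# The adversary serves a slightly different infected image per download.
bad_variant = bad_iso + b"\x00"

known_bad = {md5_hex(bad_iso)}
known_good = {md5_hex(good_iso)}


def passes_blocklist(data: bytes) -> bool:
    """Accept anything whose hash is not on the published bad list."""
    return md5_hex(data) not in known_bad


def passes_allowlist(data: bytes) -> bool:
    """Accept only files whose hash is on the published good list."""
    return md5_hex(data) in known_good


print(passes_blocklist(bad_variant))  # the variant slips past the bad list
print(passes_allowlist(bad_variant))  # the good list rejects it
```

The one-byte variant has a different digest from the published bad hash, so the bad list says nothing about it, while the good list rejects everything that is not the genuine image.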

What about publishing the good hashes? This is what you should do anyway, but you're restricted to MD5. How can I break this?

I'm an evil developer, and I play the long game.

1. I make three versions of a software package:
   - the good one does what it is advertised to do which is something useful
   - the bad one does something harmful noisily, like uploading credit card data to a bad place
   - the sneaky one does something harmful quietly, like slowly making your screen look blurrier and blurrier over the course of a month, or silently disabling disk encryption

   There's a sneaky catch: the good one and the sneaky one collide under MD5, but the bad one does not.
2. I publish the good package under my name, and the curators closely scrutinize it to audit the code and confirm it does what it claims. (Yeah, right!)
3. I hire someone, who lives in a closet under the stairs in the Ecuadorian embassy with nothing but his cat and his colossal narcissism to entertain himself, to break into the package servers and upload the bad package on one distribution server, and the sneaky package on all the other ones.
4. Someone notices the network traffic from credit card uploads and raises an alarm. The curators publish the good package's hash for everyone to verify their systems.
5. Everyone freaks out and stampedes to upgrade the software simultaneously.

Now everyone has the sneaky software, and if they check the MD5 hash they will rest assured that it's the good package's hash.

You could say this is convoluted. True, it is convoluted: I wouldn't try to pull this plan off, but that's in part because I'm not really evil. Evil people who are dedicated will use a convoluted plan if it works. A single NUL byte buffer overflow can, through a convoluted series of steps, be turned into remote code execution.

So what do you do? In your rush to rectify the poisoned ISO image, you could commission a careful multinational study of the technical capabilities of everyone on the planet who might be your adversary, and determine their risk aversion, technical sophistication, logistical planning, and chutzpah to see whether they would be capable of pulling off any attack in this class.

But that might cost a pretty penny and it might take a bit of time.

Fortunately, there's a much cheaper way to get a high degree of confidence that you thwart any plan of this sort, without having to study how convoluted the adversary's actions might have to be and whether that level of convolution is feasible.

Don't do something stupid like using MD5. Use a modern collision-resistant hash that isn't broken instead, like SHA-256, or SHAKE128, or BLAKE2b.
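For users who want to check a local ISO without downloading it again, a minimal sketch of verifying a file against a published SHA-256 digest might look like this. The helper names, the chunk size, and the file path in the usage note are illustrative assumptions, not part of any official tooling:

```python
import hashlib
import hmac


def file_digest_hex(path: str, algorithm: str = "sha256",
                    chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large ISO never has to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path: str, published_hex: str,
                      algorithm: str = "sha256") -> bool:
    """Compare the local file's digest with the published hex digest."""
    local = file_digest_hex(path, algorithm)
    return hmac.compare_digest(local, published_hex.strip().lower())
```

A user would call something like `matches_published("sushi-linux.iso", published_digest)`, where the digest was obtained over a channel they trust. Passing `"blake2b"` as the algorithm works the same way.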

If you absolutely must use MD5 and MD5 alone, you could publish $(r, \operatorname{MD5}(r \mathbin\| n \mathbin\| m))$ where $r$ is chosen uniformly at random independently for each file (maybe as the HMAC-MD5 of $m$ under a secret key), $n$ is the name of the file including the version (encoded prefix-free, maybe with a length delimiter), and $m$ is the content of the file. (Of course, if the adversary knows or can predict $r$ before they choose $m$, then it's back to the same issue as standard MD5.) But chances are, it'll be much easier for you and your users if you just use a non-broken hash. SHA-256 has been available for a decade and a half.
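A rough sketch of that construction, assuming $r$ is derived as HMAC-MD5 of the content under a secret key and the name is made prefix-free with a 4-byte length prefix. The function names and the encoding are illustrative choices, not a standard format:

```python
import hashlib
import hmac


def encode_name(name: str) -> bytes:
    """Length-prefix the file name so the encoding is prefix-free."""
    raw = name.encode("utf-8")
    return len(raw).to_bytes(4, "big") + raw


def publish(secret_key: bytes, name: str, content: bytes) -> tuple[str, str]:
    """Return (r, MD5(r || n || m)) as hex strings.

    r is HMAC-MD5(secret_key, content), so whoever chooses the content
    cannot predict r without knowing the key.
    """
    r = hmac.new(secret_key, content, hashlib.md5).digest()
    tag = hashlib.md5(r + encode_name(name) + content).hexdigest()
    return r.hex(), tag


def verify(r_hex: str, tag_hex: str, name: str, content: bytes) -> bool:
    """Recompute MD5(r || n || m) from the published r and compare."""
    r = bytes.fromhex(r_hex)
    expected = hashlib.md5(r + encode_name(name) + content).hexdigest()
    return hmac.compare_digest(expected, tag_hex.strip().lower())
```

Verification needs only the published pair, the file name, and the file; the secret key stays with the publisher. Because $r$ has a fixed length of 16 bytes and the name is length-prefixed, the concatenation parses unambiguously.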

score 0 · Answer 2 · answered Apr 26 '19 at 13:34

There are issues with MD5 that make this slightly less clear. But, if you ignore those:

Knowing that a given ISO isn't the bad ISO doesn't, as suggested by Squeamish Ossifrage, help too much. You rely on far too many assumptions that may or may not hold, for example that the file doesn't modify itself and that there is only one version of the bad thing. If hashing were perfect (no collisions) and the ISO is known good because its hash matches, you don't have that problem. The advantage of the 'known-bad' system (again assuming a perfect hashing system) is that it's a very clear smoking gun. There's no way that a file with the 'bad' hash should be there. It's safe to delete, and you know you got it. You can even apply this to all ISOs on the system (or over the wire, etc.), much like an anti-virus system, which often uses a 'known-bad' signature system.

Which is best depends on the policy you are using to stop the spread of the bad one. If you are checking a file because you want to be sure it is the 'good one', then just checking that it's 'not the bad one' is a bad idea. I feel it's worth noting at this point: no one is making you pick; there's no reason you can't do both.

If you drop the 'perfection' from the hashing algorithm, the situation's a tad more complex. That is, whoever did it could do something sneaky, like make the hash of the 'bad' thing match something you shouldn't delete, or make it look like the original because the hashes match. However, doing this requires your hashing algorithm to be really broken. To my knowledge it's not currently tractable to make an ISO that matches another fixed target under MD5 without being that target (a second-preimage attack). You would need that in order to do those attacks (or anything else I can think of, but take that with a pinch of salt).

I'll finish by defending your religious observance, in that I'd trust:

That matching a known-good MD5 meant it was good.

Way more than I'd trust:

My hashing implementation isn't lying to me, given I have just installed a malicious package.
