
As a preface, I have to say that I am a noob in this area. Having said that, I will ask the question.

I have a situation where I need to validate, and protect against tampering, a handful of large binary files (~1 GB each, ~10 GB total) distributed on an embedded system (think Android tablet). These files need to ship with a hash which can ensure that they have not been tampered with.

Now, the challenge is that these files can be updated with a patch periodically downloaded from a webserver. These patch files will also be distributed with a hash to ensure that the patch is authentic. (The files will be downloaded over a secure connection.)

Finally, the patch needs to be applied to the original files on the embedded system, which means the hash for the newly patched files must be regenerated on the device itself so that they can be verified in the future.

There are several challenges:

  • Since these are large files on an embedded system, the 'recommended' hashing algorithms like SHA-256 may prove too expensive to compute at runtime on mid-range hardware. A back-of-the-napkin calculation (at, say, a sustained 50–100 MB/s of hashing throughput, 10 GB takes roughly 100–200 seconds) says it may take upwards of several minutes. Since this verification must occur every time an application is launched to read this data, it must be very fast for 10 GB of data: less than 5 seconds.
  • Since the patch is being applied on the embedded system, the actual hash for the newly patched file must be computed on the embedded system itself.

My idea:

I was thinking that we could simply compute a faster, "weak" hash (like MD5) on the large files first. Then I would make a server request (with mutual auth) to encrypt the MD5 hash with a private key and return an encrypted hash to the embedded system. Then whenever I want to verify the file integrity, I would use the public key to decrypt the encrypted hash and then verify the hash against the actual files.
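To make the flow concrete, here is roughly what I have in mind, sketched in Python with hashlib and the pyca/cryptography package. (The in-process key pair, padding choices, and paths are all placeholders; in my scheme the private key would live only on the server, and the signed digest would come back over the mutually authenticated connection.)

```python
import hashlib

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa


def file_digest(path, algo="md5"):
    """Hash a large file in fixed-size blocks to keep memory use flat."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.digest()


# Placeholder key pair; in reality only the server would hold the private key.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()


def server_sign(digest):
    """Server side: 'encrypt' (really: sign) the digest with the private key."""
    return private_key.sign(
        digest,
        padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                    salt_length=padding.PSS.MAX_LENGTH),
        hashes.SHA256(),
    )


def device_verify(path, signature):
    """Device side: check the stored signature against the file's current hash."""
    try:
        public_key.verify(
            signature,
            file_digest(path),
            padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                        salt_length=padding.PSS.MAX_LENGTH),
            hashes.SHA256(),
        )
        return True
    except InvalidSignature:
        return False
```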

Any thoughts on this approach? Does my idea work?

Thanks for your thoughts.

McMurrich

2 Answers


You can handle incremental changes that only touch small pieces of the file much faster, at a cost in complexity.

Multiple hashes can help with that. Rather than a single hash value, you can amortise the cost of computation by storing multiple hashes. If you hash the large file in pieces, then you can create a single hash that covers the entire file by hashing all the resulting hashes, which should be quick.

Then you only have to calculate the hashes over the chunks that the patch touches, plus a final pass to combine them all.
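A rough sketch of the idea, assuming SHA-256 and a fixed chunk size (both of which you'd tune for your hardware and patch granularity):

```python
import hashlib

CHUNK = 64 * 1024 * 1024  # 64 MiB; an arbitrary choice, match it to your patches


def chunk_hashes(path):
    """Hash the file in fixed-size pieces."""
    digests = []
    with open(path, "rb") as f:
        while True:
            data = f.read(CHUNK)
            if not data:
                break
            digests.append(hashlib.sha256(data).digest())
    return digests


def root_hash(chunk_digests):
    """Combine the per-chunk hashes into one value covering the whole file."""
    top = hashlib.sha256()
    for d in chunk_digests:
        top.update(d)
    return top.digest()
```

At 64 MiB per chunk, a 10 GB file yields about 160 chunk hashes, so the combining pass runs over roughly 5 KB of data rather than gigabytes.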

This effectively splits the big file into many smaller ones. The main consequence is that if the patch alters the size of chunks, the chunk boundaries have to move too. That means you have to maintain the chunk boundaries as well, so what you store is a list of offsets and hashes. You'll also have to be careful about applying a patch across a boundary. That makes cumulative patches tricky to get right.

As long as the hash reported with the patch (you'll want to authenticate this somehow) has the same understanding of the chunk boundaries, then you are ok. You could include the boundaries with the patch itself.
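A sketch of what the stored state might look like, assuming the patch metadata tells you which chunks it touched and their new boundaries (the manifest format here is made up):

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ChunkEntry:
    offset: int    # where the chunk starts in the file
    length: int    # may change when a patch resizes the chunk
    digest: bytes  # SHA-256 of the chunk contents


def rehash_touched(path, manifest, touched_indices):
    """Recompute digests only for the chunks a patch modified."""
    with open(path, "rb") as f:
        for i in touched_indices:
            entry = manifest[i]
            f.seek(entry.offset)
            entry.digest = hashlib.sha256(f.read(entry.length)).digest()
    # Afterwards, re-derive the combined hash from the manifest digests and
    # check it against the (authenticated) hash shipped with the patch.
```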

Don't use MD5 if you care about preventing tampering; its collision resistance is miserable. SHA-2 is plenty fast if you don't have to run it over your entire data set all the time.

Martin Thomson

Your idea does not seem to provide any security benefit. If an attacker were able to modify the data while preserving the MD5 hash, your encrypted MD5 hash would also stay the same.

One practical approach would be to simply use MD5 hashing. To tamper with the data without changing the hash, an attacker has to perform what's called a second-preimage attack, which is much more difficult than a collision attack and to the present day still computationally infeasible for MD5 (see "Is MD5 second-preimage resistant when used only on FIXED length messages?").

Alternatively, you could use a modern, fast cryptographic hash function such as the SHA-3 finalists Skein or BLAKE.
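BLAKE itself isn't in the Python standard library, but its successor BLAKE2 is, and on typical 64-bit hardware it is as fast as or faster than MD5. A minimal sketch (the block size is an arbitrary choice):

```python
import hashlib


def blake2_file_digest(path):
    """Hash a large file incrementally with BLAKE2b."""
    h = hashlib.blake2b(digest_size=32)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.digest()
```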

wonce