Proper way of encrypting large files with AES-256-GCM

Question

I would like to encrypt and decrypt files with AES-256-GCM. I need a solution that works with Java, PHP and JavaScript. In practice, users can upload and download files via a web browser and/or a desktop app, and some files are generated by the server itself. (I'm currently using OpenSSL, but I'm thinking of switching to Sodium.) I already have a solution that works fine for relatively small files (<100 MB). However, I am concerned about memory usage for large files such as videos and disk images.

I found a couple of implementations that process the file in chunks. (None of the resources I found were security-focused sites.)

The concept is:

1. Read 1 MB from the original file.
2. Generate an IV.
3. Encrypt the chunk.
4. Save the IV, ciphertext, and tag to the output file.
5. Repeat until the end of the file.

The IV is always 12 bytes and the tag is always 16 bytes, so it is not very difficult to implement.
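For reference, here is a minimal Java sketch of the chunked layout described in the question. It assumes the standard JCE `AES/GCM/NoPadding` transformation and uses in-memory byte arrays instead of file streams for brevity; the class `NaiveChunkedGcm` and its helpers are hypothetical names. Java's GCM `doFinal` already appends the 16-byte tag to the ciphertext, so each stored chunk is IV || ciphertext || tag. This reproduces the question's scheme; it is not a recommended design.

```java
import java.io.ByteArrayOutputStream;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch of the question's scheme: each chunk stored as IV (12) || ciphertext || tag (16). */
final class NaiveChunkedGcm {
    static final int IV_LEN = 12;
    static final int TAG_LEN = 16;
    private static final SecureRandom RNG = new SecureRandom();

    static SecretKey randomKey() {
        byte[] k = new byte[32]; // 256-bit AES key
        RNG.nextBytes(k);
        return new SecretKeySpec(k, "AES");
    }

    static byte[] encrypt(SecretKey key, byte[] plaintext, int chunkSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int off = 0; off < plaintext.length; off += chunkSize) {
            int len = Math.min(chunkSize, plaintext.length - off);
            byte[] iv = new byte[IV_LEN];
            RNG.nextBytes(iv); // fresh random IV for every chunk
            try {
                Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
                c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_LEN * 8, iv));
                byte[] ctAndTag = c.doFinal(plaintext, off, len); // JCE appends the tag
                out.write(iv, 0, IV_LEN);
                out.write(ctAndTag, 0, ctAndTag.length);
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        }
        return out.toByteArray();
    }

    static byte[] decrypt(SecretKey key, byte[] file, int chunkSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int stored = IV_LEN + chunkSize + TAG_LEN; // size of one full stored chunk
        for (int off = 0; off < file.length; off += stored) {
            int len = Math.min(stored, file.length - off);
            try {
                Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
                // The IV is read from the first 12 bytes of the stored chunk.
                c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_LEN * 8, file, off, IV_LEN));
                byte[] pt = c.doFinal(file, off + IV_LEN, len - IV_LEN);
                out.write(pt, 0, pt.length);
            } catch (AEADBadTagException e) {
                throw new SecurityException("chunk failed authentication", e);
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        }
        return out.toByteArray();
    }
}
```

With a 1000-byte chunk size, a 2500-byte input becomes three stored chunks, each carrying 28 bytes of overhead (12-byte IV plus 16-byte tag).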

The solution works, but the real question is: is it sound from a security standpoint, or should something else be used?

score 4 · Answer 1 · answered Jun 25 '20 at 16:34

No, the way that it is described, the solution is insecure.

Each chunk by itself is secure, of course, provided that you verify its authentication tag and that the IV is indeed unique, i.e. a nonce, as GCM requires.

However, an adversary can simply reorder the chunks in the file, each together with its IV and authentication tag, and every chunk will still verify. Similarly, whole chunks can be removed without detection. If you use the same key for multiple files, an adversary can replace one file with another, or even move chunks between files.
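To make the reordering problem concrete, here is a small Java sketch (the class `ChunkSwapDemo` is hypothetical; the JCE `AES/GCM/NoPadding` transformation is assumed) in which each chunk is sealed independently, as in the question. Swapping or dropping stored chunks still decrypts without any authentication error, because nothing in a chunk's tag depends on its position or on the other chunks.

```java
import java.io.ByteArrayOutputStream;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.List;
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical demo: independently sealed chunks can be reordered or dropped undetected. */
final class ChunkSwapDemo {
    static final int IV_LEN = 12;
    static final int TAG_BITS = 128;
    private static final SecureRandom RNG = new SecureRandom();

    static SecretKey randomKey() {
        byte[] k = new byte[32];
        RNG.nextBytes(k);
        return new SecretKeySpec(k, "AES");
    }

    /** Seals one chunk in the question's format: IV || ciphertext || tag. */
    static byte[] sealChunk(SecretKey key, byte[] chunk) {
        byte[] iv = new byte[IV_LEN];
        RNG.nextBytes(iv);
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ctTag = c.doFinal(chunk);
            byte[] out = Arrays.copyOf(iv, IV_LEN + ctTag.length);
            System.arraycopy(ctTag, 0, out, IV_LEN, ctTag.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Verifies and decrypts one chunk on its own; nothing ties it to a position in the file. */
    static byte[] openChunk(SecretKey key, byte[] stored) {
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, stored, 0, IV_LEN));
            return c.doFinal(stored, IV_LEN, stored.length - IV_LEN);
        } catch (AEADBadTagException e) {
            throw new SecurityException("tag mismatch", e);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Decrypts the chunks in whatever order they are stored and concatenates the plaintexts. */
    static byte[] openAll(SecretKey key, List<byte[]> stored) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] s : stored) {
            byte[] pt = openChunk(key, s);
            out.write(pt, 0, pt.length);
        }
        return out.toByteArray();
    }
}
```

If `a` seals "AAAA" and `b` seals "BBBB", then `openAll(key, List.of(b, a))` returns "BBBBAAAA" and `openAll(key, List.of(a))` returns just "AAAA"; neither call raises an error.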

The way around this is to verify that all the authentication tags appear in the correct order, e.g. by computing an HMAC over them, or even by passing them as Additional Authenticated Data (AD or AAD) to a separate GCM call. Then store that final authentication tag at the start or end of the chunked ciphertext.
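A sketch of the HMAC variant in Java, assuming the per-chunk GCM tags are the last 16 bytes of each stored chunk and that a separate MAC key is available (the `macKey` parameter and the class `TagChain` are hypothetical; how that key is obtained is left open). Because every tag has the same fixed length, concatenating them in file order is unambiguous, so reordering, removing or inserting chunks changes the final value.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.List;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper: one HMAC-SHA256 over all chunk tags, in file order. */
final class TagChain {
    static final int TAG_LEN = 16;

    static byte[] finalTag(byte[] macKey, List<byte[]> chunkTags) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(macKey, "HmacSHA256"));
            for (byte[] tag : chunkTags) {
                // Fixed-length tags make plain concatenation unambiguous.
                if (tag.length != TAG_LEN) throw new IllegalArgumentException("bad tag length");
                mac.update(tag);
            }
            return mac.doFinal();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Constant-time comparison of the stored final tag with the one recomputed from the chunks read. */
    static boolean verify(byte[] macKey, List<byte[]> chunkTags, byte[] storedFinalTag) {
        return MessageDigest.isEqual(finalTag(macKey, chunkTags), storedFinalTag);
    }
}
```

On decryption, collect the tags while reading the chunks and call `verify` before trusting the output as a whole.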

Since GCM limits how much data can safely be encrypted under a single key (with random 12-byte IVs, for instance, NIST SP 800-38D caps a key at 2^32 encryptions), I'd use a separate data key for each file. You can derive one with a key-based key derivation function (KBKDF, such as HKDF) from the master key and a 256-bit random salt (the salt needs to be stored with the ciphertext, of course). Then you can use a fixed-size, 12-byte counter starting at zero as the nonce.
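A sketch of the per-file key and counter nonce in Java. HKDF-SHA256 (RFC 5869) is written out by hand on top of `javax.crypto.Mac` so that it runs on JDKs without a built-in HKDF API; the class name `PerFileKey` and any `info` label you pass in are assumptions for illustration. The nonce is the chunk index encoded as a 12-byte big-endian counter, so chunk 0 uses all-zero bytes.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper: per-file key via HKDF-SHA256 plus counter nonces. */
final class PerFileKey {
    static final int SALT_LEN = 32;  // 256-bit random salt, stored with the ciphertext
    static final int NONCE_LEN = 12; // GCM's 96-bit nonce

    static byte[] newSalt() {
        byte[] salt = new byte[SALT_LEN];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    /** HKDF-SHA256: Extract with the salt, then Expand to `length` bytes. */
    static byte[] hkdf(byte[] ikm, byte[] salt, byte[] info, int length) {
        if (length <= 0 || length > 255 * 32) throw new IllegalArgumentException("bad length");
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(salt, "HmacSHA256"));
            byte[] prk = mac.doFinal(ikm);                 // PRK = HMAC(salt, IKM)
            mac.init(new SecretKeySpec(prk, "HmacSHA256"));
            byte[] okm = new byte[length];
            byte[] t = new byte[0];                        // T(0) is empty
            for (int i = 1, off = 0; off < length; i++) {
                mac.update(t);
                mac.update(info);
                mac.update((byte) i);                      // T(i) = HMAC(PRK, T(i-1) || info || i)
                t = mac.doFinal();
                int n = Math.min(t.length, length - off);
                System.arraycopy(t, 0, okm, off, n);
                off += n;
            }
            return okm;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Nonce for chunk `index`: 4 zero bytes followed by the 8-byte big-endian index. */
    static byte[] counterNonce(long index) {
        if (index < 0) throw new IllegalArgumentException("negative index");
        return ByteBuffer.allocate(NONCE_LEN).putLong(NONCE_LEN - 8, index).array();
    }
}
```

Each file gets a fresh salt from `newSalt()`, and the derived key is used for exactly one encryption pass over that file. If a changed file is re-encrypted, it needs a new salt; otherwise the counter nonces would repeat under the same key.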

That still leaves the issue of files being swapped. This could be avoided by including the file path in the key derivation and storing it in the file, but beware that this will break any file that is renamed. This is a tricky issue that requires a domain-specific solution. Changes to files are similarly tricky to handle. Some kind of separate register or third party may be required.
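One way to sketch the path binding in Java, assuming the file path is used as HKDF's `info` input (a hypothetical choice; any domain-specific identifier could take its place, and the class `PathBoundKey` is illustrative). For a 32-byte output, HKDF-SHA256 needs only a single expand block, T(1) = HMAC(PRK, info || 0x01), which keeps the code short. Decrypting with a key derived from a different path fails the GCM tag check, which is exactly why renaming a file breaks it.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper: file key bound to the file path through HKDF's info input. */
final class PathBoundKey {
    static byte[] fileKey(byte[] masterKey, byte[] salt, String path) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(salt, "HmacSHA256"));
            byte[] prk = mac.doFinal(masterKey);               // HKDF-Extract
            mac.init(new SecretKeySpec(prk, "HmacSHA256"));
            mac.update(path.getBytes(StandardCharsets.UTF_8)); // info = file path
            mac.update((byte) 0x01);                           // block counter for T(1)
            return mac.doFinal();                              // HKDF-Expand with L = 32
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] seal(byte[] fileKey, byte[] nonce, byte[] plaintext) {
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(fileKey, "AES"), new GCMParameterSpec(128, nonce));
            return c.doFinal(plaintext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the plaintext, or null if the tag does not verify under this key. */
    static byte[] open(byte[] fileKey, byte[] nonce, byte[] ciphertext) {
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(fileKey, "AES"), new GCMParameterSpec(128, nonce));
            return c.doFinal(ciphertext);
        } catch (AEADBadTagException e) {
            return null;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```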

Note that GCM is built on CTR mode, so in principle it can be used as an online cipher: one that converts plaintext to ciphertext (and vice versa) directly as the data streams through. That way you can have, say, 1 GiB chunks while using a buffer of 1 MiB or less (or even no application-specific buffer at all).
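On the encryption side, the JCE `Cipher.update` calls allow a direct sketch of this in Java: the hypothetical class `StreamingGcm` below pushes one GCM message through a fixed-size buffer, so the application itself never holds more than `bufSize` bytes of plaintext at a time. How much the provider buffers internally is implementation-specific. The one-shot `decryptAll` helper exists only to check the result and is not online.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical sketch: one GCM message encrypted through a fixed-size buffer. */
final class StreamingGcm {
    static void encrypt(SecretKey key, byte[] nonce, InputStream in, OutputStream out, int bufSize) {
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, nonce));
            byte[] buf = new byte[bufSize];
            int n;
            while ((n = in.read(buf)) != -1) {
                byte[] ct = c.update(buf, 0, n); // may be null or shorter than n (partial blocks are held)
                if (ct != null) out.write(ct);
            }
            out.write(c.doFinal()); // remaining ciphertext, then the 16-byte tag
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** One-shot decryption, used only to check the streamed output; it buffers everything. */
    static byte[] decryptAll(SecretKey key, byte[] nonce, byte[] ctAndTag) {
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, nonce));
            return c.doFinal(ctAndTag);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In a real application `in` and `out` would be file streams, and the nonce would come from a per-file key scheme so that it is never reused under the same key.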

The problem is that many GCM implementations give up this property. Some allow incremental "update" calls, but even then decryption in particular may not be fully online. Java is an example: it offers update methods, but because the authentication tag is treated as part of the ciphertext, decryption loses the online property. It has to buffer at least as many bytes as the authentication tag, because it cannot distinguish ciphertext from tag until all the bytes have been passed in.
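The buffering described above can be sketched independently of any cipher. When the tag is simply appended to the ciphertext and the total length is not known up front, a streaming decryptor can only release bytes that are certainly not among the final 16, so it must hold back the most recent 16 bytes until the end of the stream. The class `TagHoldback` below is a hypothetical illustration of that bookkeeping, not a JCE API. In a real decryptor the released bytes would feed the CTR and GHASH computations, and any plaintext produced from them stays unauthenticated until the tag has been checked.

```java
import java.util.Arrays;

/** Hypothetical helper: splits a stream of "ciphertext || tag" without knowing its total length. */
final class TagHoldback {
    private final int tagLen;
    private byte[] pending = new byte[0]; // up to tagLen most recent bytes, which might be the tag

    TagHoldback(int tagLen) {
        this.tagLen = tagLen;
    }

    /** Adds input and returns the bytes now known to be ciphertext (all but the last tagLen seen). */
    byte[] update(byte[] in) {
        byte[] all = new byte[pending.length + in.length];
        System.arraycopy(pending, 0, all, 0, pending.length);
        System.arraycopy(in, 0, all, pending.length, in.length);
        int release = Math.max(0, all.length - tagLen);
        pending = Arrays.copyOfRange(all, release, all.length);
        return Arrays.copyOf(all, release);
    }

    /** At end of stream the held-back bytes are the tag; fewer than tagLen means the input was truncated. */
    byte[] finish() {
        if (pending.length != tagLen) throw new IllegalStateException("stream shorter than the tag");
        return pending;
    }
}
```

Feeding 10 bytes and then 30 more to `new TagHoldback(16)` releases nothing on the first call, 24 bytes on the second, and `finish()` returns the last 16 bytes as the tag.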
