I need to encrypt big files with AES-GCM, potentially 10 GB or more. For memory (RAM) reasons, I need to process them in blocks (say 16 MB) rather than calling encrypt(plaintext) in one pass.
Reading the answers to this security.SE question, Should I chunk or not for AES-GCM file encryption, I have the feeling that I've read one thing and its opposite.
Which one is the correct approach?
Method A: Since "a counter mode converts a block cipher into a stream cipher" (quote from the linked post above), we can do this:
```python
from Crypto import Random
from Crypto.Cipher import AES

# key: 16/24/32-byte AES key; f: input file, out: output file (both opened in binary mode)
nonce = Random.new().read(16)
out.write(nonce)
cipher = AES.new(key, AES.MODE_GCM, nonce=nonce)
while True:
    block = f.read(16*1024*1024)
    if not block:  # EOF
        break
    out.write(cipher.encrypt(block))  # we encrypt multiple blocks with the same
                                      # "cipher" object, especially the same nonce
out.write(cipher.digest())  # we compute the auth. tag only once at the end
```

Here we encrypt multiple 16 MB blocks with the same "cipher" object and the same nonce.
I read some criticism of this approach in the article "AEADs: getting better at symmetric cryptography", section "AEADs with large plaintexts". On the other hand, I noticed that:
```python
print(cipher.encrypt(b'hello').hex())  # 4cadd813be
print(cipher.encrypt(b'hello').hex())  # d3585e3471, different, fortunately!
```

so it seems OK (it behaves like a stream cipher).
Is it true that GCM (counter mode) converts a block cipher into a stream cipher?
Method B: we use a new nonce (and therefore get a new tag) for each 16 MB block:

```python
from Crypto import Random
from Crypto.Cipher import AES

# key: AES key; f: input file, out: output file (both opened in binary mode)
while True:
    block = f.read(16*1024*1024)
    if not block:  # EOF
        break
    nonce = Random.new().read(16)
    cipher = AES.new(key, AES.MODE_GCM, nonce=nonce)
    out.write(nonce)
    out.write(cipher.encrypt(block))  # new "cipher" object, new nonce for each 16 MB block
    out.write(cipher.digest())  # a separate auth. tag for each block
```

Drawback of this method: we have to save `nonce` and `tag` metadata to disk for each block. This looks like the method detailed in Proper way of encrypting large files with AES-256-GCM. Obviously, a malicious user could swap the order of blocks (including their `nonce` and `tag`) and the file would still look OK, even though it isn't. So this solution does not seem OK either, as suggested by this answer.
TL;DR: Isn't it a problem that we use only one nonce in Method A above?
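For scale: a single nonce only becomes a length problem when GCM's 32-bit block counter would wrap. NIST SP 800-38D limits the plaintext of one GCM invocation to $2^{39} - 256$ bits, which works out to:

$$
\frac{2^{39} - 256}{8}\ \text{bytes} = 2^{36} - 32\ \text{bytes} = (2^{32} - 2) \times 16\ \text{bytes} = 68\,719\,476\,704\ \text{bytes} \approx 64\ \text{GiB}
$$

So a 10 GB file stays within that bound under one (key, nonce) pair, provided that pair is never reused for another file.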
Note: I also read about this method, which chains the blocks (and tags).
Is there a general consensus or standard for a good way to process big files block by block with AES-GCM?
(For Python implementations, e.g. with pycryptodome, I'll ask later on SO; first, I wanted to read up on the background.)