Encrypt a big file in blocks with AES-GCM: how many nonces do we need?

Question

I need to encrypt big files using AES-GCM, potentially 10 GB or more. For memory (RAM) reasons, I need to process them in blocks (let's say 16 MB), rather than doing encrypt(plaintext) in one pass.

Reading the answers to this security.SE question, Should I chunk or not for AES-GCM file encryption, I have the feeling that I have read one thing and its opposite.

Which one is the correct approach?

Method A: Since "a counter mode converts a block cipher into a stream cipher" (quote from the linked post above), we can do this:

```
from Crypto import Random
from Crypto.Cipher import AES

# key, f (input file) and out (output file) are assumed to be defined already
nonce = Random.new().read(16)
out.write(nonce)
cipher = AES.new(key, AES.MODE_GCM, nonce=nonce)
while True:
    block = f.read(16*1024*1024)
    if not block:  # EOF
        break
    out.write(cipher.encrypt(block))  # we encrypt multiple blocks with the same
                                      # "cipher" object, and so the same nonce
out.write(cipher.digest())  # we compute the auth. tag only once at the end
```

Here we encrypt multiple 16MB blocks with the same "cipher" object, same nonce.
I read some criticism of this approach in the article AEADs: getting better at symmetric cryptography, paragraph "AEADs with large plaintexts".

But on the other hand, I noticed that:

```
print(cipher.encrypt(b'hello'))  # 4cadd813be in hexadecimal
print(cipher.encrypt(b'hello'))  # d3585e3471, different, fortunately!
```

so it seems ok (like a stream cipher).

Is it true that GCM (counter mode) converts a block cipher into a stream cipher?

Method B: we use a new nonce and compute a new tag for each 16 MB block:
```
while True:
    block = f.read(16*1024*1024)
    if not block:  # EOF
        break
    nonce = Random.new().read(16)
    cipher = AES.new(key, AES.MODE_GCM, nonce=nonce)
    out.write(nonce)
    out.write(cipher.encrypt(block))  # new "cipher" object, new nonce for each 16 MB block
    out.write(cipher.digest())  # one auth. tag per 16 MB block
```
Drawback of this method: we have to store a nonce and a tag on disk for every block.

This looks like the method detailed in Proper way of encrypting large files with AES-256-GCM. Obviously a malicious user could swap the order of blocks (including their nonce and tag), and every block would still pass authentication even though the file has been altered. So this solution does not seem OK, as suggested by this answer.

TL;DR: Isn't it a problem that we use only one nonce in Method A above?

Note: I also read this method which chains and blocks (and tags).

Is there a general consensus or standard for a good way to process big files in blocks with AES-GCM?

(For implementations with Python, such as pycryptodome, I'll ask later on SO, but first, I wanted to read about the background).

score 6 · Accepted Answer · answered Nov 29 '20 at 18:12

AES-GCM can encrypt up to $2^{39}-256$ bits with a single key+nonce pair. That's just under 64GiB. A 10GiB file is fine.
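
To see where the "just under 64 GiB" figure comes from, the arithmetic can be checked in a few lines. This is only a sketch of the bit-to-byte conversion; the helper name `fits_single_gcm_message` is made up for illustration.

```python
# Plaintext limit per (key, nonce) pair, in bits, as quoted above.
MAX_BITS = 2**39 - 256
MAX_BYTES = MAX_BITS // 8          # 2**36 - 32 bytes
GIB = 2**30

# Same number seen from the 32-bit block counter: (2**32 - 2) blocks of 16 bytes.
assert MAX_BYTES == (2**32 - 2) * 16


def fits_single_gcm_message(size_bytes: int) -> bool:
    """Hypothetical helper: True if one key+nonce pair can cover the file."""
    return size_bytes <= MAX_BYTES


print(MAX_BYTES)                           # 68719476704
print(64 * GIB - MAX_BYTES)                # 32 bytes short of 64 GiB
print(fits_single_gcm_message(10 * GIB))   # True
```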

If you go beyond 64GiB, you lose security. In that case, either use XChaCha20-Poly1305 (max of 256GiB plaintext per message) or divide the file into chunks < 64GiB.

16MB is far smaller than needed, and will hurt performance.

Most implementations of AES-GCM (or XChaCha20-Poly1305) will provide some sort of streaming interface, with init, update, and finalize functions, where init starts the computation, update takes in some data and can be called repeatedly, and finalize finishes it. Libsodium's crypto_secretstream_* (documentation here) is a good example, but any library offering a streaming implementation should have something similar.

Basj · Answer 2 · 2020-11-30T07:59:29.830

Additional detail to @SAIPeregrinus's answer:

In the end, "method A" above (in blocks) gives exactly the same result as encrypting the whole plaintext in one pass:

```
import Crypto.Random, Crypto.Cipher.AES  # using package "pycryptodome"
key = bytes.fromhex('7d29ccf69c671775e17d4b9dd6485fd8')
nonce = bytes.fromhex('04972c7927042af0ee10c7e6ac56ddd3')

# usual method (whole plaintext in one pass)
cipher = Crypto.Cipher.AES.new(key, Crypto.Cipher.AES.MODE_GCM, nonce=nonce)
print(cipher.encrypt(b'hellohelloblablabla').hex())      # e8eed0bf4e10dd882d2a7d4daf377fa05419a5

# method A, by blocks
cipher2 = Crypto.Cipher.AES.new(key, Crypto.Cipher.AES.MODE_GCM, nonce=nonce)
print(cipher2.encrypt(b'hello').hex())                     # e8eed0bf4e
print(cipher2.encrypt(b'hello').hex())                     # 10dd882d2a
print(cipher2.encrypt(b'blablabla').hex())                 # 7d4daf377fa05419a5
# gives exactly the same result
```

So writing in chunks (to avoid an "Out of memory" error if we read 10 GB in one pass), as in method A above, has no impact on the resulting encrypted file.
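
The reason is that in counter mode each ciphertext byte depends only on the key, the nonce and the byte's position in the stream, not on how the calls are split. Here is a toy illustration using only the standard library, with SHA-256 standing in for AES as the keystream generator. The class `ToyCtrStream` is hypothetical and not secure; it only mimics the way a cipher object carries the counter from one `encrypt` call to the next.

```python
import hashlib


class ToyCtrStream:
    """Toy counter-mode stream cipher (SHA-256 instead of AES). Not secure."""

    def __init__(self, key, nonce):
        self._key = key
        self._nonce = nonce
        self._counter = 0
        self._buf = b""  # unused keystream bytes kept for the next call

    def _keystream(self, n):
        # Generate keystream blocks from (key, nonce, counter) until n bytes exist.
        while len(self._buf) < n:
            counter_bytes = self._counter.to_bytes(8, "big")
            self._buf += hashlib.sha256(self._key + self._nonce + counter_bytes).digest()
            self._counter += 1
        out, self._buf = self._buf[:n], self._buf[n:]
        return out

    def encrypt(self, data):
        # XOR the data with the next len(data) keystream bytes.
        return bytes(a ^ b for a, b in zip(data, self._keystream(len(data))))


key, nonce = b"k" * 16, b"n" * 12

one_pass = ToyCtrStream(key, nonce).encrypt(b"hellohelloblablabla")

stream = ToyCtrStream(key, nonce)
first = stream.encrypt(b"hello")
second = stream.encrypt(b"hello")
third = stream.encrypt(b"blablabla")

print(first + second + third == one_pass)  # True
print(first != second)                     # True: same plaintext, different position
```

This is also why the two `encrypt(b'hello')` calls in the question return different bytes: the second call continues the keystream where the first one stopped.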
