If I encrypt a file of a known format that has a lengthy header (e.g. an XML Excel file), does that render the encrypted file susceptible to a "known plain text" attack? In other words, if the first X bytes of the decrypted text are known, does that weaken the encryption for the remaining data?
3 Answers
If a popular encryption scheme is being used: No.
A typical design is a stream cipher, or a block cipher run in a streaming mode such as CTR, which generates a stream of pseudo-random bits (the keystream) that is XOR'd with the plaintext to produce the ciphertext.
The keystream is derived from the secret key, usually together with a nonce or IV. Known plaintext does reveal the keystream bits that covered it, but as long as past or future keystream bits (or the key itself) cannot be inferred from those known bits, the cipher is protected from classical known-plaintext attacks.
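To make that concrete, here is a minimal sketch of the keystream idea. It assumes a toy generator built from SHA-256 in counter mode, which is only a stand-in for a real stream cipher and should not be used for actual encryption; the header, body, and function names are made up for illustration.

```python
import hashlib
import secrets


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || nonce || counter) blocks, concatenated.

    Illustration only; this is not a vetted cipher.
    """
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


key = secrets.token_bytes(32)
nonce = secrets.token_bytes(16)

header = b'<?xml version="1.0" encoding="UTF-8"?>'  # known to the attacker
body = b"<salary>123456</salary>"                   # hypothetical secret part
plaintext = header + body

ciphertext = xor_bytes(plaintext, keystream(key, nonce, len(plaintext)))

# The attacker XORs the known header with the ciphertext and learns exactly
# the keystream bytes that covered the header, and nothing more.
recovered = xor_bytes(ciphertext[:len(header)], header)
print(recovered == keystream(key, nonce, len(plaintext))[:len(header)])

# Predicting the keystream over the body would require the key, because each
# keystream block is a hash over an input that includes the key.
```

The security of the whole scheme rests on the last comment: if the generator were predictable from its own output, the known header would give away the rest.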
No proper modern encryption algorithm allows known-plaintext attacks. Even if the adversary knows all of the plaintext except one bit, and knows the ciphertext, that doesn't help him determine the value of that one unknown bit. If you use a cryptographic library then you're unlikely to use an algorithm that is broken in this sense; the main exception is ECB mode, which reveals when two plaintext blocks are identical. Just make sure that you're using the algorithm properly: if a mode requires a random IV, use a random IV. AES, DES or 3DES with CBC, CTR, GCM, EAX, or any number of other popular modes are safe in this regard (DES's 56-bit key is too short to resist brute force today, but that is a separate weakness).
There is one caveat. The adversary generally knows the length of the plaintext, because it is identical or close to the length of the ciphertext. When the “real” plaintext is compressed (such as an Excel file or any other zipped format) and the compressed text is then encrypted, the adversary knows the length of the compressed text. In isolation, this is rarely enough to extract useful information. However, in some scenarios the adversary is able to submit part of the input for encryption, so that the text that is encrypted contains some parts provided by the adversary and some parts that must remain secret. This can allow the adversary to map the length of the compressed plaintext against his submitted inputs, and thus to determine which inputs most resemble the secret part (because they result in a better compression ratio) and to reconstruct the secret part given enough chosen inputs. This was the basis of the CRIME attack against SSL.
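The length leak is easy to reproduce. The following is a rough sketch, assuming zlib as the compressor and a length-preserving cipher applied afterwards, so the compressed length is what the adversary observes; the secret value and the `observed_length` helper are invented for the example.

```python
import zlib

# Hypothetical secret that ends up in the same message as attacker input.
SECRET = b"Cookie: sessionid=7f3a9c2e1b"


def observed_length(attacker_input: bytes) -> int:
    """Length the adversary would see after compress-then-encrypt.

    Encryption is omitted because it is assumed not to change the length.
    """
    message = attacker_input + b"\n" + SECRET
    return len(zlib.compress(message, 9))


right_guess = observed_length(b"Cookie: sessionid=7f3a9c2e1b")
wrong_guess = observed_length(b"Cookie: sessionid=XQZJKWPVMY")

# A guess that matches the secret compresses better, so the ciphertext is shorter.
print(right_guess, wrong_guess, right_guess < wrong_guess)
```

A real attack refines the guess one character at a time and needs many chosen inputs, but the signal it relies on is this difference in length.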
An emphatic yes.
TL;DR: Don't try to do encryption on your own. Use a consumer-friendly solution like GPG for data at rest, or TLS for data in motion.
This is easy to test for yourself. For example, look what happens with AES in ECB mode when the same key is used to encrypt two strings that start with the same data:
$ echo testtesttesttesta | openssl enc -aes-128-ecb -K "a1a1a1a1" -a
y0Vu2U+d4uThMygPLppuXbrIamvJTfsHNppU8Zje1tc=
$ echo testtesttesttestb | openssl enc -aes-128-ecb -K "a1a1a1a1" -a
y0Vu2U+d4uThMygPLppuXTh3YNl0Mu3ID8W3g67Qv9A=
You can see that the first part of the ciphertext is the same for both plaintexts, because ECB encrypts identical plaintext blocks to identical ciphertext blocks. There are many attacks that exploit this type of weakness. For example, an attacker who can inject their own plaintext somewhere into your documents can potentially discover the plaintext for everything that comes after their input.
Thankfully, the solution is simple. Use an encryption mode that requires an initialization vector. Additionally, it is crucial (for other, but equally important reasons) that you use a mode which includes an authentication tag. Modes that satisfy this requirement include GCM, EAX, and CCM. For example:
$ echo testtesttesttesta | openssl enc -aes-128-gcm -K "a1a1a1a1" -iv "11111111" -a
XFxU81mnoiEbGMAM+1jBAfJ8
$ echo testtesttesttestb | openssl enc -aes-128-gcm -K "a1a1a1a1" -iv "bcbcbcbc" -a
L+vvGULFQXt7DtmzFs95qf+v
For these modes, you must use a unique initialization vector for every message encrypted under the same key (and for some modes, a cryptographically random one). The IV is what makes identical plaintexts encrypt to different ciphertexts.
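If you want to see the difference between the two approaches without OpenSSL, here is a self-contained sketch. It assumes a toy 16-byte block cipher (a 4-round Feistel network built on HMAC-SHA256) as a stand-in for AES, and it uses CBC with a random IV as the IV-based mode, not GCM, so it produces no authentication tag. All function names are made up, and none of this is suitable for real use.

```python
import hashlib
import hmac
import secrets

BLOCK = 16  # bytes, the same block size as AES


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, i: int, half: bytes) -> bytes:
    """Feistel round function: HMAC-SHA256 truncated to half a block."""
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:BLOCK // 2]


def encrypt_block(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel permutation on one 16-byte block (NOT AES)."""
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def pad(data: bytes) -> bytes:
    """PKCS#7-style padding up to a multiple of the block size."""
    n = BLOCK - len(data) % BLOCK
    return data + bytes([n]) * n


def ecb_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """ECB: every block is encrypted independently, with no IV."""
    data = pad(plaintext)
    return b"".join(encrypt_block(key, data[i:i + BLOCK])
                    for i in range(0, len(data), BLOCK))


def cbc_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """CBC: each block is XOR'd with the previous ciphertext block (or the IV)."""
    iv = secrets.token_bytes(BLOCK)  # fresh random IV per message
    data = pad(plaintext)
    prev, out = iv, [iv]
    for i in range(0, len(data), BLOCK):
        prev = encrypt_block(key, _xor(prev, data[i:i + BLOCK]))
        out.append(prev)
    return b"".join(out)  # the IV is sent along with the ciphertext


key = secrets.token_bytes(32)
msg_a = b"testtesttesttesta\n"
msg_b = b"testtesttesttestb\n"

ecb_a, ecb_b = ecb_encrypt(key, msg_a), ecb_encrypt(key, msg_b)
cbc_a, cbc_b = cbc_encrypt(key, msg_a), cbc_encrypt(key, msg_b)

# True: ECB maps the shared first block to the same ciphertext block.
print(ecb_a[:BLOCK] == ecb_b[:BLOCK])
# False unless both random IVs happen to coincide.
print(cbc_a[BLOCK:2 * BLOCK] == cbc_b[BLOCK:2 * BLOCK])
```

The only thing that changed between the two runs is the random IV mixed into the first block, which is why reusing an IV under the same key brings the ECB-style leak back.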
Crypto can be extremely hard to get right yourself, even if you pick a secure, modern cipher. Do yourself a favor and don't try to do it on your own; if you find yourself at the level of choosing algorithms and generating nonces, you're working at too low a level. Use GPG for data at rest, or TLS for data in motion.