3

I know that the ECB block cipher mode is the weakest method of encryption, because block-aligned repetitions in the input data lead to repetitions in the output data.

However, it seems to me that eliminating this repetitiveness should give a stronger result, comparable to other modes (CBC, etc.). In addition, I know that compression algorithms are designed precisely to find and remove certain kinds of correlation.

Let's say the input data are compressed using the DEFLATE algorithm via zlib 1.2.11 at the highest possible compression level (9). How much does such an approach increase the strength of ECB?

Update: My goal is to increase encryption speed through parallelization. I receive UTF-8 texts that are already deflated as described above, encrypt them with a user-supplied key, and put them in cache storage in a user folder.
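For reference, here is a toy Python sketch of the repetition problem the question starts from. The per-block function is HMAC-SHA256 truncated to 16 bytes, standing in for a real block cipher; this is an illustration of ECB's deterministic structure only, not a cipher:

```python
import hmac
import hashlib

BLOCK = 16

def toy_ecb_encrypt(key: bytes, pt: bytes) -> bytes:
    """Toy stand-in for a block cipher in ECB mode: each 16-byte block is
    mapped independently and deterministically (HMAC-SHA256 truncated --
    NOT a real cipher, just to show the structure)."""
    assert len(pt) % BLOCK == 0
    out = b""
    for i in range(0, len(pt), BLOCK):
        block = pt[i:i + BLOCK]
        out += hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]
    return out

key = b"0" * 16
pt = b"A" * 16 + b"B" * 16 + b"A" * 16   # first and third blocks repeat
ct = toy_ecb_encrypt(key, pt)

# The repetition in the plaintext is visible in the ciphertext:
print(ct[0:16] == ct[32:48])   # True
print(ct[0:16] == ct[16:32])   # False
```

Any eavesdropper can see which plaintext blocks were equal without knowing the key.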

5 Answers

3

Not only does compressing the plaintext not make ECB secure, but it actually makes a secure cipher like AES-GCM insecure by leaking the content of the plaintext through the message lengths, as in the CRIME and BREACH exploits.
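The length leak is easy to reproduce with nothing but Python's `zlib`. This is a sketch of the idea behind CRIME/BREACH, with illustrative variable names: if attacker-controlled data is compressed together with a secret before encryption, a correct guess produces a DEFLATE back-reference, the output shrinks, and the ciphertext length reveals the match even under a perfectly strong cipher.

```python
import zlib

# The secret travels in the same compression context as attacker input.
secret = b"password=correcthorsebatterystaple;"

def compressed_len(attacker_guess: bytes) -> int:
    # In the real attack, this length is observable as the ciphertext
    # length; encryption does not hide it.
    return len(zlib.compress(secret + attacker_guess, 9))

right = compressed_len(b"password=correcthorsebatterystaple")
wrong = compressed_len(b"password=zqjxkwvbnmrtplhgfdsaycuieo")
print(right < wrong)  # True: the matching guess compresses better
```

By iterating guesses byte by byte, an attacker can recover the secret from lengths alone.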

If you are tempted to ‘increase the strength of ECB’, consider stepping back from minutiae like ECB for a moment to review your broader goal. There's a whole literature out there of ways to build authenticated ciphers—which are the black boxes that take keys and messages and turn them into ciphertexts that keep your conversations secret and detect forgery—with various performance characteristics. If you want to keep up with it, you might follow the IACR ePrint archive.

Forget, for a moment, that the concept of ‘block ciphers’ exists. Focus on authenticated ciphers: if you insist on block ciphers, you'll rule out some of the most popular and highest-performing authenticated ciphers like ChaCha/Poly1305, which, as it happens, can exhibit an essentially arbitrary degree of parallelism.

Next, is parallelism your goal, per se, or is performance your goal—in terms of latency and throughput? Let's take a look at the standard cryptography benchmarks which use a consistent framework for performing fair measurements across a variety of machines. It's a high-dimensional space that's difficult to navigate, which is why there's a large literature out there.

For example, anything based on AES will have a wide gulf between (a) hardware implementations that are fast and secure, (b) software implementations that are slow and insecure, and (c) software implementations that are painfully slow and secure. (The scale of slowness may not be relevant to your application; what is more significant is that secure software AES implementations like BearSSL's are few and far between.)

This is also why there are only a few choices that have been widely implemented in software—notably AES-GCM and NaCl crypto_secretbox_xsalsa20poly1305 or variants like ChaCha/Poly1305. The CAESAR competition didn't really turn up anything much better for most users, and there's an ongoing Lightweight Cryptography competition if you want to follow an academic bloodbath of destroying security of novel ideas.

If you are writing software, you should just take one of the handful of secure authenticated ciphers that are ready on the shelf, like AES-GCM or NaCl crypto_secretbox_xsalsa20poly1305, according to engineering constraints, and pay attention to the security contracts. Considerations that might figure into this choice:

  • Is one of these readily available in your software environment, and will that make the difference of whether you use cryptography or expose users to harm? If so, do that!
  • Are you subject to auditors who insist that you follow US federal government standards and will look for AES-GCM? If not, consider safer options like crypto_secretbox_xsalsa20poly1305.
  • Can you guarantee that you use AES-NI and CLMUL hardware support, or not? If not, consider safer options like crypto_secretbox_xsalsa20poly1305.
  • Can you choose nonces sequentially, as in a sequential conversation, or not? If not, consider safer options like crypto_secretbox_xsalsa20poly1305 with random nonces—or maybe a deterministic authenticated cipher, which can't conceal the fact of message repetitions but otherwise survives nonce reuse.
Squeamish Ossifrage
  • 49,816
  • 3
  • 122
  • 230
2

One problem with this approach is that both most compression algorithms and most block ciphers in ECB mode are deterministic. In other words, if you encrypt the same text twice, even if it has been compressed before, this fact will be obvious from the ciphertext.

I'd recommend having a look at counter modes, in particular AES-GCM (Galois/Counter Mode). They allow you to perform encryption in parallel without suffering from the problem outlined above (provided you use a different IV for each plaintext, as you should).
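To show why counter mode parallelizes, here is a toy sketch of its structure. The keystream-block function uses HMAC-SHA256 as a stand-in for one AES block computation; in real code you would use a vetted AES-GCM implementation, not this. The point is that each keystream block depends only on (key, nonce, counter), so all blocks can be computed independently:

```python
import hmac
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK = 16

def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    # Toy PRF standing in for AES applied to nonce||counter.
    msg = nonce + counter.to_bytes(8, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()[:BLOCK]

def ctr_encrypt(key: bytes, nonce: bytes, pt: bytes) -> bytes:
    nblocks = (len(pt) + BLOCK - 1) // BLOCK
    # No block depends on any other, so the keystream is computed
    # in parallel here; the XOR at the end is trivially parallel too.
    with ThreadPoolExecutor() as pool:
        ks = b"".join(pool.map(lambda i: keystream_block(key, nonce, i),
                               range(nblocks)))
    return bytes(p ^ k for p, k in zip(pt, ks))

key, pt = b"k" * 16, b"the same message, twice"
c1 = ctr_encrypt(key, b"nonce-01", pt)
c2 = ctr_encrypt(key, b"nonce-02", pt)
print(c1 != c2)                                 # fresh nonce hides repeats
print(ctr_encrypt(key, b"nonce-01", c1) == pt)  # XOR: decrypting = encrypting
```

Note that reusing a nonce with the same key reuses the keystream, which is catastrophic; that is exactly the "different IV for each plaintext" requirement above.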

TheWolf
  • 149
  • 1
1

No, this is insecure.

Most compression algorithms work in blocks. Generally, each compression block is entirely deterministic and stateless. This is why you usually don't need to load all the data you wish to compress into memory and can instead stream it into the algorithm. For DEFLATE, this block size is commonly 32 KiB. Any duplicate 32 KiB chunks of uncompressed data will compress to identical compressed chunks. If you then encrypt that data with a block cipher in ECB mode, you will be able to determine which 32 KiB blocks are identical.
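The determinism is trivial to confirm with the standard library. Since `zlib.compress` with fixed settings always produces the same bytes for the same input, equal plaintexts (or equal chunks) yield equal compressed forms, which a deterministic cipher such as ECB then maps to equal ciphertexts:

```python
import zlib

# DEFLATE is deterministic: same data + same settings -> same bytes.
msg = b"attack at dawn" * 100
assert zlib.compress(msg, 9) == zlib.compress(msg, 9)

# So two identical messages still leak their equality through ECB:
first = zlib.compress(b"transfer $100 to alice", 9)
second = zlib.compress(b"transfer $100 to alice", 9)
print(first == second)  # True: compression alone hides nothing here
```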

forest
  • 15,626
  • 2
  • 49
  • 103
0

Yes, it dramatically increases security.

No, it is by no means sufficient to be considered secure.

And most importantly, you have better options.

Compressing before encryption is generally good for security; almost all attacks are harder when the plaintext is compressed. This is also true for weak ciphers or weak modes like ECB. After encryption you will still be able to identify duplicate messages, or even duplicate prefixes, but compressed blocks are unlikely to repeat within a message, unlike the original message, which under raw ECB trivially reveals repeating blocks.
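Both halves of that claim can be sketched in Python. The toy per-block map (HMAC-SHA256 truncated, zero-padded; an illustration only, not a cipher) shows that compression does break up intra-message block repeats, yet identical messages still produce identical ciphertexts:

```python
import zlib
import hmac
import hashlib

BLOCK = 16

def toy_ecb(key: bytes, data: bytes) -> bytes:
    # Toy deterministic per-block map standing in for ECB encryption.
    data += b"\x00" * (-len(data) % BLOCK)   # zero-pad to a full block
    return b"".join(
        hmac.new(key, data[i:i + BLOCK], hashlib.sha256).digest()[:BLOCK]
        for i in range(0, len(data), BLOCK))

key = b"k" * 16
msg = b"HEADERBLOCK00000" * 4               # four identical 16-byte blocks

raw_ct = toy_ecb(key, msg)
print(raw_ct[0:16] == raw_ct[16:32])        # True: raw ECB shows the repeats

comp_ct = toy_ecb(key, zlib.compress(msg, 9))
print(comp_ct[0:16] == comp_ct[16:32])      # False: compression broke them up

# But whole-message determinism remains:
print(toy_ecb(key, zlib.compress(msg, 9)) == comp_ct)  # True
```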

You have better options, if the goal is parallelism you can use CTR mode, or GCM if you want authentication as well.

Meir Maor
  • 12,053
  • 1
  • 24
  • 55
-1

Stop trying to use AES or any other crypto primitive. What the community of security and crypto implementation engineers has learned since the 90s is that if the developer has to type A-E-S into their own code, there is too high a chance it will be insecure.

You are allowed to use Wireguard, TLS, SSH, Libsodium and Google Tink. That's it. Build your solution as something that uses one or more of those and absolutely no cryptography outside one of those.

(The above is the policy at AppAmaGooSoftBook, which hires smart engineers and then watches them make terrible mistakes, so their security teams have invented misuse-resistant APIs to lower the chance of someone deploying something insecure. Even just supplying a nonce to a secure AEAD is fraught with danger.)

Patriot
  • 3,162
  • 3
  • 20
  • 66
Z.T.
  • 824
  • 8
  • 22