Theoretical Approaches to crack large files encrypted with AES

Question

I have a large file (> 200 GB) that I encrypted a while ago with AES-256-CBC. The file itself is a tar archive which I ran through openssl. I've forgotten the exact password, but have a general idea of what it is.

Brute force is the easiest way to crack this from what I've seen (given that I have a general theory of what the password might be), but the hitch I've run into is the time it's taking me to try each combination. I have a script running on a server, which takes about 15 minutes per candidate before reporting that it's wrong.

I can't help but think there has to be a better way to solve this.

Richard Thiessen · Answer 1 · 2023-06-03T13:10:09.977

It sounds like you're decrypting the entire file. That's going to be slow.

The easiest option is to truncate the encrypted file, try the decryption, and check whether the result is a valid tar file with file -b decrypted_output. OpenSSL will complain that the file is truncated, but it still writes part of the output, which is enough for file to recognize. Truncating the encrypted file to 2k gives a decrypt of 1.4K, which file seems happy with.

NOTE: CHECK THAT THIS WORKS WITH YOUR VERSION OF OPENSSL ON AN ENCRYPTED TAR TEST FILE FIRST!!!

I'm getting roughly 50 ms to decrypt a few KB; a few MB doesn't take much longer. Invoking file takes a few more ms, but that's tiny.
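The truncate-and-test loop described above could be scripted roughly as follows. This is only a sketch under assumptions: the helper names (openssl_cmd, try_password, search) are made up for illustration, the 2048-byte head size is arbitrary, and the -md and -pbkdf2 settings must match whatever was used at encryption time, which is not known here. It relies on openssl enc reading from standard input when no -in is given and on file accepting "-" for standard input.

```python
import subprocess

def openssl_cmd(password, md="sha256", pbkdf2=False):
    """Build the openssl command line for one candidate password.

    md and pbkdf2 are guesses; they must match the original encryption.
    """
    cmd = ["openssl", "enc", "-d", "-aes-256-cbc", "-md", md,
           "-pass", "pass:" + password]
    if pbkdf2:
        cmd.append("-pbkdf2")
    return cmd

def read_head(path, size=2048):
    """Read only the first few KB of the encrypted file."""
    with open(path, "rb") as f:
        return f.read(size)

def try_password(head, password, **opts):
    """Decrypt the truncated head and ask `file` whether it looks like tar."""
    # openssl exits non-zero on truncated input, so the return code is ignored;
    # whatever plaintext it managed to write is what gets inspected.
    dec = subprocess.run(openssl_cmd(password, **opts), input=head,
                         stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    probe = subprocess.run(["file", "-b", "-"], input=dec.stdout,
                           stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    return b"tar archive" in probe.stdout

def search(path, candidates, **opts):
    """Return the first candidate whose decrypted head looks like a tar file."""
    head = read_head(path)  # read once, reuse for every candidate
    for password in candidates:
        if try_password(head, password, **opts):
            return password
    return None
```

Something like search("backup.tar.enc", ["guess1", "guess2"], md="md5") would then walk a candidate list; the file name and guesses are placeholders.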

For other types of files, this approach also works. Entropy statistics could work for files of unknown formats.
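As a rough illustration of the entropy idea: output decrypted with a wrong key looks like random bytes (close to 8 bits per byte), while most uncompressed formats score noticeably lower. The sketch below is an assumption-laden example, not a tuned detector; the function names and the 7.0 threshold are invented here, and the test is useless if the plaintext is itself compressed or encrypted.

```python
import math
from collections import Counter

def shannon_entropy(data):
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

def looks_structured(data, threshold=7.0):
    """Heuristic: low entropy suggests a correct decryption.

    The threshold is a hypothetical value that would need calibrating
    against real samples of the expected file type.
    """
    return shannon_entropy(data) < threshold
```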

Improving efficiency

Doing the key derivation on a GPU would speed things up. OpenSSL doesn't seem to do memory-hard key derivation, which means GPUs will be more efficient. The downside is needing to re-implement all the crypto, versus just truncating and using openssl, which requires close to zero effort.

There's also a bit of efficiency to be gained by looking for a magic constant value (e.g. "ustar\0", see https://en.wikipedia.org/wiki/Tar_(computing)), but decrypting a few KB isn't a big deal compared to key derivation.
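A magic-value check can be done directly on the first 512 decrypted bytes without invoking file. The sketch below assumes the ustar header layout (magic at offset 257, checksum field at offsets 148 to 155) and adds the header checksum as a second test; is_tar_header is a name chosen for this example. It compares only the five bytes "ustar" so that both the POSIX and GNU variants of the magic pass, and it will not recognize very old tar files that carry no magic at all.

```python
def is_tar_header(block):
    """Check whether the first 512 bytes look like a ustar-style tar header."""
    if len(block) < 512:
        return False
    header = block[:512]
    # POSIX writes "ustar\0", GNU tar writes "ustar  \0"; accept both.
    if header[257:262] != b"ustar":
        return False
    try:
        # Checksum is stored as octal digits padded with NUL and/or space.
        stored = int(header[148:156].strip(b" \0").decode("ascii"), 8)
    except ValueError:
        return False
    # The checksum is the byte sum of the header with the checksum field
    # itself counted as eight ASCII spaces.
    computed = sum(header[:148]) + 8 * 0x20 + sum(header[156:])
    return stored == computed
```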

fgrieu · Answer 2 · 2023-06-05T05:00:13.627

The question asks for a theoretical approach, so I'll suppose it is not about using existing tools as in this answer, which would be off-topic; rather, I'll assume custom code for the password recovery. The general sketch is to test candidate passwords, approximately from most to least likely, by

1. Turning the tested password into a 256-bit AES key.
2. Testing that key against a portion of the file corresponding to known plaintext.

Notice the large file size is immaterial.

The algorithm for step 1 depends on the version of openssl enc used for encryption, and on the options used, if any. Older versions of openssl enc derive the key using MD5 and EVP_BytesToKey with the iteration count set to 1, which is a criminal mistake from a security standpoint. The hash changed to SHA-256, reportedly at OpenSSL 1.1.0c. Modern openssl enc can (if option -pbkdf2 or -iter is given) use the PBKDF2 algorithm, with a default iteration count of 10000 unless otherwise specified by the -iter command line option. The new and old derivations are before and after this else statement (at time of writing).

Notice that the password derivation is usually salted, in which case the encrypted file starts with the 8 bytes 53 61 6c 74 65 64 5f 5f (Salted__ in ASCII), followed by the 8 bytes of salt, which must be supplied to whatever password-to-key derivation function is used.

PBKDF2 is not memory-hard and thus obsolete for new designs aiming at being secure, and PBKDF2-HMAC-SHA-256 with 10000 iterations gives little protection against GPU, FPGA or ASIC-based password crackers. It is still considerably less unsafe against a CPU-based attack than the derivation with the iteration count set to 1, due to the larger iteration count.
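The derivations discussed for step 1 can be sketched with the standard library alone. The function names below are illustrative, not OpenSSL's. The sketch assumes AES-256-CBC (32-byte key, 16-byte IV), that EVP_BytesToKey with count 1 chains D_i = H(D_(i-1) || password || salt), and that in PBKDF2 mode the key and IV are taken as the first 32 and next 16 bytes of a single 48-byte PBKDF2 output. Verify both against a test file encrypted with your own openssl before relying on them.

```python
import hashlib

MAGIC = b"Salted__"  # 53 61 6c 74 65 64 5f 5f

def parse_salt(head):
    """Return the 8-byte salt if the file starts with 'Salted__', else None."""
    if len(head) >= 16 and head[:8] == MAGIC:
        return head[8:16]
    return None

def evp_bytes_to_key(password, salt, md="md5", key_len=32, iv_len=16):
    """Legacy derivation: EVP_BytesToKey with an iteration count of 1.

    md is "md5" for older openssl enc defaults, "sha256" for newer ones.
    """
    out, block = b"", b""
    while len(out) < key_len + iv_len:
        block = hashlib.new(md, block + password + salt).digest()
        out += block
    return out[:key_len], out[key_len:key_len + iv_len]

def pbkdf2_key_iv(password, salt, iterations=10000, md="sha256",
                  key_len=32, iv_len=16):
    """PBKDF2 derivation as assumed for `openssl enc -pbkdf2`."""
    out = hashlib.pbkdf2_hmac(md, password, salt, iterations,
                              key_len + iv_len)
    return out[:key_len], out[key_len:]
```

For each candidate, the salt is parsed once from the first 16 bytes of the file, and only the derivation is repeated.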

In step 2, we need to find known plaintext. In the case of a TAR file we have two options:

Every tar file has a size that is a multiple of 512 bytes. openssl enc uses PKCS#7 padding, thus the CBC-encrypted file will have a size modulo 512 of either 32 if salt was used, or 16 if not (which we can detect as above); and the last block of the padded plaintext is 16 times the byte 0x10.
Typically, a tar file has 16 times the byte 0x00 at offsets 80…95 (because that's zero-padding for a file name). In the encrypted file, these bytes will be at offsets 96…111 if salt was used, or 80…95 if not (which we can detect as above).
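The two options above translate into fixed ciphertext offsets, so only 32 bytes per probe need to be read from the large file. The sketch below uses hypothetical names (known_plaintext_probes, read_probe). Each probe is the offset of the ciphertext block preceding the target block, plus the plaintext expected in the target block. The padding probe is only returned when the file size is consistent with a padded tar file.

```python
def known_plaintext_probes(ct_size, salted):
    """List (prev_block_offset, expected_plaintext) pairs for a tar file.

    The target ciphertext block sits 16 bytes after prev_block_offset.
    """
    header = 16 if salted else 0  # "Salted__" + 8-byte salt
    probes = []
    # Zero padding of the first file name: plaintext offsets 80..95.
    if ct_size >= header + 96:
        probes.append((header + 64, bytes(16)))
    # PKCS#7 padding block: only if size matches 512*k + 16 (+16 if salted).
    body = ct_size - header
    if body >= 32 and (body - 16) % 512 == 0:
        probes.append((ct_size - 32, b"\x10" * 16))
    return probes

def read_probe(path, prev_offset):
    """Read the previous and target ciphertext blocks (16 bytes each)."""
    with open(path, "rb") as f:
        f.seek(prev_offset)
        data = f.read(32)
    return data[:16], data[16:]
```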

That known plaintext is easily tested since CBC is used: we decipher the 16-byte ciphertext block that corresponds to the known plaintext block with AES-256 (the block cipher) under the candidate key, XOR the result with the previous ciphertext block, and compare to the known plaintext block. If there's no match, the key was wrong, thus the password was wrong. False positives are so improbable that they can be ruled out.
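The single-block CBC test can be written down in a few lines. In this sketch the raw block decryption is passed in as a function, because the Python standard library has no AES. The aes256_decrypt_block helper assumes the third-party cryptography package is installed and is only one possible choice. All names are illustrative.

```python
def xor16(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_block_matches(decrypt_block, key, prev_ct, ct, expected):
    """CBC relation: plaintext = D_key(ct) XOR previous ciphertext block."""
    return xor16(decrypt_block(key, ct), prev_ct) == expected

def aes256_decrypt_block(key, block):
    """Raw AES-256 decryption of one 16-byte block.

    Assumption: the third-party `cryptography` package is available.
    ECB mode on a single block is just the bare block cipher.
    """
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
    dec = Cipher(algorithms.AES(key), modes.ECB()).decryptor()
    return dec.update(block) + dec.finalize()
```

A candidate key would be checked with cbc_block_matches(aes256_decrypt_block, key, prev_ct, ct, b"\x10" * 16) for the padding block, or with 16 zero bytes as the expected value for the file-name block.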

For other file formats the recognition of a correct key at step 2 could be more difficult. E.g. tar.gz files can happen to have byte size modulo 16 equal to 15, so that the known padding is a single byte 0x01, and a test based on that alone has a false positive rate of about 0.4% (1 in 256). However, like most common file formats, they have some fixed or recognizable bytes in the header, thus a reliable test remains possible.
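For a header-based test on tar.gz, one possible sketch is below. It assumes the standard gzip header start (bytes 1f 8b, followed by 08 for the deflate method) and that a wrong key yields effectively uniform random plaintext, so a test on n fixed bytes passes by chance with probability 256^-n. The names are illustrative. Since the header sits in the first plaintext block, that block has to be XORed with the IV (derived together with the key) rather than with a previous ciphertext block.

```python
GZIP_MAGIC = b"\x1f\x8b\x08"  # gzip ID bytes plus the deflate method byte

def starts_with_gzip_magic(plaintext_block):
    """True if a decrypted first block begins like a gzip stream."""
    return plaintext_block[:3] == GZIP_MAGIC

def false_positive_rate(known_bytes):
    """Chance that a wrong key matches `known_bytes` fixed bytes,
    assuming wrong-key output is uniformly random."""
    return 256.0 ** -known_bytes
```

With the three fixed bytes above, the rate is 2^-24, and checking further header fields or a second block drives it down further.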

score -1 · Answer 3 · answered Jun 03 '23 at 12:03

If there were a shortcut to doing this that is not generally known, it would imply a new serious weakness either in AES or in the implementation of CBC in openssl.

If you knew the IV you could decrypt block by block. To do this you would have had to supply the IV yourself. See the openssl man page here

CBC by definition requires an IV. However, sometimes it is not passed directly, but derived together with the key from the password (and a randomly generated salt during encryption) using a key derivation function (namely PBKDF2). If you want to specify the key and IV directly, you must use the -K and -iv options.

Is this what you did? If yes, did you record the IV or can you guess it?
