
So I needed symmetric encryption for my program. I landed on AES-192 in CTR mode, because of some Computerphile videos on YouTube.

After using it with the Node.js "crypto" lib implementation, I noticed that some of the outputs are very similar. The output is created from a UTF-8 SQL syntax string input and encoded to base64. Based on what I know so far, this makes sense, since a lot of SQL strings would start with the same text, e.g. "SELECT ...", and AES works with independent blocks of data. I also append a random integer to the end of each of these inputs, but cannot do so at the start (because of the specific situation).

Is it a problem that the start of a ciphertext can be easily guessed? If so, is there a way to scramble the output, such that it can be unscrambled later with the same key and IV?

Are there alternative algorithms or modes that do this kind of thing? I need the output to be unintelligible and unalterable.

Maarten Bodewes

1 Answer


Is it a problem that the start of a ciphertext can be easily guessed?

That happens by design for excellent encryption systems, e.g. because every ciphertext starts with a version and key identifier. But in the case at hand, that's the symptom of a devastating error: AES-CTR is being used for different records with the same constant IV, therefore the cipher degenerates to XOR with a constant bitstring, which is very poor encryption.
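To see why a constant IV is so damaging, here is a minimal TypeScript sketch using Node's built-in `crypto` module. The key, the all-zero IV and the SQL strings are made up for illustration, and the pattern itself is the mistake being demonstrated, not something to copy. Two queries sharing a `SELECT ...` prefix produce ciphertexts sharing a prefix, and XORing the two ciphertexts cancels the keystream, yielding the XOR of the plaintexts without any knowledge of the key.

```typescript
import { createCipheriv, randomBytes } from "crypto";

// ANTI-PATTERN (illustration only): one key and one constant IV for every record.
const key = randomBytes(24);         // AES-192 key
const fixedIv = Buffer.alloc(16);    // same all-zero 16-byte counter block every time

function encryptWithFixedIv(plaintext: string): Buffer {
  const cipher = createCipheriv("aes-192-ctr", key, fixedIv);
  return Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
}

// XOR two buffers byte by byte, up to the shorter length.
function xor(a: Buffer, b: Buffer): Buffer {
  const out = Buffer.alloc(Math.min(a.length, b.length));
  for (let i = 0; i < out.length; i++) out[i] = a[i] ^ b[i];
  return out;
}

const prefix = "SELECT name FROM users WHERE id = ";
const p1 = prefix + "41";
const p2 = prefix + "97";
const c1 = encryptWithFixedIv(p1);
const c2 = encryptWithFixedIv(p2);

// Same keystream, so identical plaintext prefixes give identical ciphertext prefixes.
console.log(c1.subarray(0, prefix.length).equals(c2.subarray(0, prefix.length))); // true

// The keystream cancels out: c1 XOR c2 equals p1 XOR p2, no key needed.
const leaked = xor(c1, c2);
const expected = xor(Buffer.from(p1, "utf8"), Buffer.from(p2, "utf8"));
console.log(leaked.equals(expected)); // true
```

Anyone who can guess one plaintext (or just its `SELECT` prefix) can therefore recover the matching bytes of every other record encrypted this way.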

AES-CTR mode is designed to be used as follows:

  • At encryption of each cryptogram, a fresh IV is chosen, usually 8 bytes, by some process that makes it very unlikely that the same IV will be chosen again for a given key. An incremental counter might do, if there's no way it can be reset†.
  • That IV is put as the first bytes of the ciphertext. These bytes are used at decryption to get the IV.
  • That IV is extended to 128 bits, typically internally to the implementation of the CTR-mode cipher.
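The three steps above can be sketched in TypeScript with Node's `crypto` module. This is a minimal sketch under stated assumptions: the 8-byte random nonce, the `nonce || ciphertext` storage layout and the helper names `counterBlock`, `encryptCtr` and `decryptCtr` are choices made for illustration. Node's `createCipheriv` expects the full 16-byte initial counter block for `aes-192-ctr`, so here the extension to 128 bits is done by the caller, appending a 64-bit zero block counter. Note this still provides confidentiality only, not integrity.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "crypto";

// Stored layout assumed by this sketch: 8-byte nonce || CTR ciphertext.
const NONCE_LENGTH = 8;

// Extend the 8-byte nonce to the 16-byte initial counter block:
// nonce || 64-bit big-endian block counter starting at zero.
function counterBlock(nonce: Buffer): Buffer {
  return Buffer.concat([nonce, Buffer.alloc(16 - nonce.length)]);
}

function encryptCtr(key: Buffer, plaintext: string): Buffer {
  const nonce = randomBytes(NONCE_LENGTH); // fresh for every message
  const cipher = createCipheriv("aes-192-ctr", key, counterBlock(nonce));
  const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([nonce, body]); // the nonce is not secret; it is prepended
}

function decryptCtr(key: Buffer, stored: Buffer): string {
  const nonce = stored.subarray(0, NONCE_LENGTH); // recover the IV from the first bytes
  const decipher = createDecipheriv("aes-192-ctr", key, counterBlock(nonce));
  const body = stored.subarray(NONCE_LENGTH);
  return Buffer.concat([decipher.update(body), decipher.final()]).toString("utf8");
}

const key = randomBytes(24); // AES-192 key
const a = encryptCtr(key, "SELECT * FROM t");
const b = encryptCtr(key, "SELECT * FROM t");
console.log(decryptCtr(key, a)); // SELECT * FROM t
console.log(a.equals(b));        // false, barring a 2^-64 nonce collision
```

Because each message gets its own nonce, equal plaintexts no longer give equal ciphertexts, at the cost of 8 extra bytes per record.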

I need the output to be unintelligible and unalterable.

Then do not use AES-CTR. It aims only at confidentiality of the data, not integrity, which is typically also an operational requirement, and is what "unalterable" asks for. For this we have authenticated encryption, e.g. AES-GCM, and variants of it that make nonce (aka IV) reuse a lesser disaster, e.g. AES-GCM-SIV.
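For comparison, a minimal AES-GCM sketch with Node's `crypto` module, keeping the 192-bit key. The 12-byte random IV, the 16-byte tag and the `IV || ciphertext || tag` layout are assumptions of this sketch, and `encryptGcm`/`decryptGcm` are hypothetical helper names. The point it shows is that flipping a single ciphertext bit makes decryption fail instead of silently returning altered data. AES-GCM-SIV is not shown.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "crypto";

// Stored layout assumed by this sketch: 12-byte IV || ciphertext || 16-byte tag.
const IV_LENGTH = 12;
const TAG_LENGTH = 16;

function encryptGcm(key: Buffer, plaintext: string): Buffer {
  const iv = randomBytes(IV_LENGTH); // must never repeat for the same key
  const cipher = createCipheriv("aes-192-gcm", key, iv);
  const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, body, cipher.getAuthTag()]);
}

function decryptGcm(key: Buffer, stored: Buffer): string {
  const iv = stored.subarray(0, IV_LENGTH);
  const tag = stored.subarray(stored.length - TAG_LENGTH);
  const body = stored.subarray(IV_LENGTH, stored.length - TAG_LENGTH);
  const decipher = createDecipheriv("aes-192-gcm", key, iv);
  decipher.setAuthTag(tag);
  // final() throws if the IV, ciphertext or tag was altered.
  return Buffer.concat([decipher.update(body), decipher.final()]).toString("utf8");
}

const key = randomBytes(24); // AES-192 key
const stored = encryptGcm(key, "SELECT * FROM t");
console.log(decryptGcm(key, stored)); // SELECT * FROM t

const tampered = Buffer.from(stored); // copy
tampered[IV_LENGTH] ^= 0x01;          // flip one ciphertext bit
try {
  decryptGcm(key, tampered);
} catch {
  console.log("tampering detected");
}
```

With random 12-byte IVs, the footnote's collision bound applies with $b=96$.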

Caution: defining the operational requirement of cryptography in database applications is hard. For example, when encrypting the answer to a secret question used for user authentication purposes, authenticated encryption of that data in isolation is not enough (because it still allows substituting the unknown answer with a known one). One solution is to include an identifier of the encrypted cell (its row and column) as GCM Additional Authenticated Data.
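A sketch of that idea with Node's `crypto` module: the cell identifier is passed to `setAAD` at both encryption and decryption, so it is authenticated but neither encrypted nor stored. The identifier encoding (a JSON array of table, row key and column, chosen because it is unambiguous), the row keys `alice`/`mallory` and the helper names are hypothetical. A ciphertext copied from one user's row into another's then fails to decrypt.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "crypto";

// Hypothetical cell identifier: an unambiguous encoding of table, row key and column.
function cellId(table: string, rowKey: string, column: string): Buffer {
  return Buffer.from(JSON.stringify([table, rowKey, column]), "utf8");
}

// Layout assumed here: 12-byte IV || ciphertext || 16-byte tag; the AAD is not stored.
function encryptCell(key: Buffer, aad: Buffer, plaintext: string): Buffer {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-192-gcm", key, iv);
  cipher.setAAD(aad); // bind the ciphertext to this cell
  const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, body, cipher.getAuthTag()]);
}

function decryptCell(key: Buffer, aad: Buffer, stored: Buffer): string {
  const iv = stored.subarray(0, 12);
  const tag = stored.subarray(stored.length - 16);
  const body = stored.subarray(12, stored.length - 16);
  const decipher = createDecipheriv("aes-192-gcm", key, iv);
  decipher.setAuthTag(tag);
  decipher.setAAD(aad); // must match the AAD used at encryption
  return Buffer.concat([decipher.update(body), decipher.final()]).toString("utf8");
}

const key = randomBytes(24);
const alice = cellId("users", "alice", "secret_answer");
const mallory = cellId("users", "mallory", "secret_answer");

// Mallory knows her own answer and can read its ciphertext...
const mallorysCiphertext = encryptCell(key, mallory, "my first dog");

// ...and copies it into Alice's row: decryption under Alice's cell id fails.
let accepted = true;
try {
  decryptCell(key, alice, mallorysCiphertext);
} catch {
  accepted = false;
}
console.log(accepted); // false
console.log(decryptCell(key, mallory, mallorysCiphertext)); // my first dog
```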


† It's often difficult to keep track of which IVs have been used. One strategy then is to generate the IVs at random: if a working true random number generator is used, the probability that two of $n$ drawn $b$-bit IVs are identical is no more than $n(n-1)/2^{b+1}$. Up to $n$ in the millions, that's fine for the usual $b=64$. Above that, an option is to use $b=80$ or $b=96$, noting that no more than $2^{132-b}$ bytes should be encrypted with the same IV, since only $128-b$ bits of the 128-bit counter block remain to count 16-byte blocks.
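The footnote's two formulas can be checked numerically with a small TypeScript sketch; the function names are hypothetical, and $n = 10^6$ is just the "millions" figure from the text.

```typescript
// n = number of random IVs drawn, b = IV size in bits.

// Birthday bound: any two of n random b-bit IVs collide with probability at most n(n-1)/2^(b+1).
function collisionBound(n: number, b: number): number {
  return (n * (n - 1)) / 2 ** (b + 1);
}

// Per-IV data limit: 128 - b counter bits count 16-byte (2^4) blocks, i.e. 2^(132-b) bytes,
// expressed here in GiB (2^30 bytes). A power of two, so exact as a double.
function maxGiBPerIv(b: number): number {
  return 2 ** (132 - b - 30);
}

for (const b of [64, 80, 96]) {
  console.log(
    `b=${b}: bound for n=1e6 is ${collisionBound(1e6, b).toExponential(2)}, ` +
      `at most ${maxGiBPerIv(b)} GiB per IV`
  );
}
```

For $b=64$ the bound is about $2.7\times10^{-8}$, while $b=96$ still allows $2^{36}$ bytes (64 GiB) per IV, which is ample for individual database records.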

fgrieu