On-the-fly computation of AES Round Keys for decryption?

Question

The usual implementation of AES first computes all the Round Keys sequentially starting from the key, and stores them in RAM for later use. However, when enciphering a single block with a key that will be used for that purpose only, or when RAM is very scarce, or perhaps in hardware, it is advantageous to use the Round Keys as they are being generated, rather than store them. Quoting the Rijndael submission to NIST:

The key schedule can be implemented without explicit use of the array W[Nb*(Nr+1)]. For implementations where RAM is scarce, the Round Keys can be computed on-the-fly using a buffer of Nk words with almost no computational overhead.

It is said this also works for deciphering:

The key expansion operation that generates W is defined in such a way that we can also start with the last Nk words of Round Key information and roll back to the original Cipher Key. So, calculation 'on-the-fly' of the Round Keys, starting from an "Inverse Cipher Key", is still possible.

However, the how-to is left as an exercise to the reader. In particular: Can the last Round Key (the first used when deciphering) be computed directly, rather than sequentially?

score 6 · Accepted Answer · edited Apr 13 '17 at 12:48

This self-answer is heavily based on comments by Paŭlo Ebermann.

When performing AES decryption with on-the-fly computation of AES Round Keys, there is no choice beyond running the key schedule forward to the last Round Key (the first used when deciphering). The structure of the key schedule creates enough non-linearity and diffusion at each of the 10 steps that no shortcut is practicable. One step comprises 4 SubBytes transformations, 16 XORs of 8-bit quantities, some rotations of all the 16 bytes of the Round Keys, and the doubling of the byte Rcon in $\operatorname{GF}(2^8)$, in a manner such that what's produced by a byte XOR goes through SubBytes on the next step, and non-linearly influences all the 16 bytes after 4 steps. Even halving the number of steps to reach the last Round Key would be extremely hairy, to the point of being counterproductive.
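As an illustration of the forward computation described above, here is a minimal Python sketch for AES-128. The function names (`make_sbox`, `next_round_key`, `last_round_key`) are made up for this example, and the S-box is generated at start-up only to keep the sketch short; a real embedded implementation would more likely use a ROM table.

```python
def make_sbox():
    """Build the forward AES S-box (inverse in GF(2^8), then the affine map)."""
    def rotl8(x, n):
        return ((x << n) | (x >> (8 - n))) & 0xFF
    sbox = [0] * 256
    p = q = 1
    while True:
        # p walks through all non-zero field elements as successive powers of 3
        p = (p ^ (p << 1) ^ (0x1B if p & 0x80 else 0)) & 0xFF
        # q is divided by 3 each time, so that q stays equal to 1/p
        q ^= q << 1
        q ^= q << 2
        q ^= q << 4
        q &= 0xFF
        if q & 0x80:
            q ^= 0x09
        sbox[p] = q ^ rotl8(q, 1) ^ rotl8(q, 2) ^ rotl8(q, 3) ^ rotl8(q, 4) ^ 0x63
        if p == 1:
            break
    sbox[0] = 0x63  # 0 has no inverse and is handled separately
    return sbox

SBOX = make_sbox()

def next_round_key(rk, rcon):
    """One forward step: 16-byte Round Key i -> Round Key i+1 (AES-128)."""
    out = list(rk)
    # first word: XOR with SubWord(RotWord(last word)) and the round constant
    out[0] ^= SBOX[rk[13]] ^ rcon
    out[1] ^= SBOX[rk[14]]
    out[2] ^= SBOX[rk[15]]
    out[3] ^= SBOX[rk[12]]
    # the other three words are each XORed with the word just before them
    for i in range(4, 16):
        out[i] ^= out[i - 4]
    return out

def last_round_key(key):
    """Run the 10 forward steps, keeping only a 16-byte buffer and Rcon."""
    rk, rcon = list(key), 0x01
    for _ in range(10):
        rk = next_round_key(rk, rcon)
        # doubling of Rcon in GF(2^8)
        rcon = ((rcon << 1) ^ (0x1B if rcon & 0x80 else 0)) & 0xFF
    return rk
```

With the FIPS-197 Appendix A.1 example key 2b7e151628aed2a6abf7158809cf4f3c, `last_round_key` should return d014f9a8c9ee2589e13f0cc8b6630ca6.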

There are however two implementation variants:

1. If about 160 bytes of additional temporary RAM are available, the Round Keys can be stored as they are computed, and re-used during the decryption.
2. Otherwise (and memory is often tight in a small micro-controller, or unavailable in hardware), each of the 10 steps can easily be reversed. The only remote difficulty is the un-doubling of the Rcon; it can be implemented using a small ROM table, or as Rcon=((0x00-(Rcon&0x01))&0x8D)^(Rcon>>1). As pointed out by Craig McQueen in a comment, the reversal of the key schedule uses the direct (forward) AES S-box.

Hardware implementations typically do 2; both options are justifiable in software.
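The un-doubling expression quoted above can be checked in isolation. The following is a small Python sketch (the function names are chosen for this example only) that doubles Rcon through the ten AES-128 values, then walks back with the un-doubling formula:

```python
def double_rcon(rcon):
    """Multiply Rcon by 2 in GF(2^8), reducing by the AES polynomial 0x11B."""
    return ((rcon << 1) ^ (0x1B if rcon & 0x80 else 0)) & 0xFF

def undouble_rcon(rcon):
    """Divide Rcon by 2 in GF(2^8), using the expression from the answer."""
    # If the low bit is set, a reduction happened when doubling: undo it with 0x8D.
    return ((0x00 - (rcon & 0x01)) & 0x8D) ^ (rcon >> 1)

# Rcon values used by the 10 forward steps of AES-128
forward = [0x01]
for _ in range(9):
    forward.append(double_rcon(forward[-1]))

# Walking back from the last value (0x36) yields the same values in reverse
backward = [forward[-1]]
for _ in range(9):
    backward.append(undouble_rcon(backward[-1]))
```

The mask 0x8D is 0x1B shifted right by one bit with the top bit set, which is what undoes the reduction for odd values.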

Craig McQueen · Answer 2 · 2016-06-09T23:04:40.923

I've implemented AES-128 with byte calculations for small embedded systems, with optional on-the-fly key schedule calculation. See aes-min on GitHub.

The key schedule starting point for decryption must be obtained by running the key schedule calculation forwards, calculating all the rounds of the key schedule, to get to the last round. For a particular key, that decryption key schedule starting point only needs to be computed once and saved. After that, the on-the-fly-key-schedule decryption runs the key schedule calculation backwards during each decryption operation.

Reversing the key schedule calculation requires the forwards S-box, so on-the-fly-key-schedule decryption requires both the forwards and inverse S-boxes.
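To illustrate, here is a minimal Python sketch of the backward key schedule for AES-128. It is not the code from aes-min; `make_sbox`, `prev_round_key` and `decryption_round_keys` are names invented for this example. Only the forward S-box appears in it, which is the point made above.

```python
def make_sbox():
    """Build the forward AES S-box (inverse in GF(2^8), then the affine map)."""
    def rotl8(x, n):
        return ((x << n) | (x >> (8 - n))) & 0xFF
    sbox = [0] * 256
    p = q = 1
    while True:
        p = (p ^ (p << 1) ^ (0x1B if p & 0x80 else 0)) & 0xFF  # p *= 3
        q ^= q << 1                                            # q /= 3
        q ^= q << 2
        q ^= q << 4
        q &= 0xFF
        if q & 0x80:
            q ^= 0x09
        sbox[p] = q ^ rotl8(q, 1) ^ rotl8(q, 2) ^ rotl8(q, 3) ^ rotl8(q, 4) ^ 0x63
        if p == 1:
            break
    sbox[0] = 0x63
    return sbox

SBOX = make_sbox()

def prev_round_key(rk, rcon):
    """One backward step: 16-byte Round Key i+1 -> Round Key i (AES-128)."""
    out = list(rk)
    # undo the word chaining, last word first, so each XOR sees unmodified input
    for i in range(15, 3, -1):
        out[i] ^= out[i - 4]
    # out[12..15] is now the previous last word; the first word needs the
    # forward S-box (not the inverse one) applied to it, plus Rcon
    out[0] ^= SBOX[out[13]] ^ rcon
    out[1] ^= SBOX[out[14]]
    out[2] ^= SBOX[out[15]]
    out[3] ^= SBOX[out[12]]
    return out

def decryption_round_keys(last_rk):
    """Yield the 11 Round Keys in decryption order, starting from the saved last one."""
    rk, rcon = list(last_rk), 0x36  # 0x36 is the Rcon of the 10th forward step
    yield bytes(rk)
    for _ in range(10):
        rk = prev_round_key(rk, rcon)
        rcon = ((0x00 - (rcon & 0x01)) & 0x8D) ^ (rcon >> 1)  # un-double Rcon
        yield bytes(rk)
```

Starting from the last Round Key of the FIPS-197 Appendix A.1 example, d014f9a8c9ee2589e13f0cc8b6630ca6, the final value yielded should be the original key 2b7e151628aed2a6abf7158809cf4f3c.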

fgrieu · Answer 3 · 2021-05-22T20:06:04.057

The simplest way to perform the AES key expansion for decryption is to do it just as for encryption, but storing the subkeys as they are generated, rather than using them. Then use them for decryption, in the reverse of the order in which they were produced.

Another option, often used in hardware (much less useful in software), and relatively simple since it's AES-128, is to sequentially compute the last subkey, again just as for encryption; use it; then repeatedly walk back to the previous ones using the fact that the transformation from one subkey to the next is easily reversed. This saves RAM, trading it against computation.

Both methods rely on the fact that the 11 subkeys (the first of which is the key) are the same for encryption and decryption; they are simply used in the natural order in which they are produced for encryption, and in the reverse order for decryption.
