
NOTE: By malleable/non-malleable, I mean the ability (or not) to change a byte/block of the ciphertext and have it change only that byte/block of the plaintext.

I understand that we use authentication (via HMAC/UMAC/etc.) to verify integrity. However, where authentication is not possible (perhaps in a case where one has a very limited environment on storage/CPU power), can we transform the plaintext before encryption?

Using several "rounds" of a move-to-front transform that reverses the plaintext after each iteration does exactly that. Once the plaintext is over 1 KB (or so), you only need a few rounds. This creates a sort of avalanche effect, so that tampering in any place affects a large amount of the plaintext.

My MTF transform uses an index of possible bytes (0-255). Each "round" consists of an MTF transform, resetting the index, reversal of the plaintext, and another MTF transform.
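A minimal sketch of one such round (illustrative code, not my actual implementation), assuming a standard move-to-front transform over the 0-255 byte alphabet; the index is rebuilt fresh for each transform, which is the "reset" between the two MTF passes:

```python
def mtf_encode(data: bytes) -> bytes:
    # standard move-to-front: emit each byte's position in the index,
    # then move that byte to the front
    index = list(range(256))
    out = bytearray()
    for b in data:
        pos = index.index(b)
        out.append(pos)
        index.pop(pos)
        index.insert(0, b)
    return bytes(out)

def mtf_decode(data: bytes) -> bytes:
    # inverse: each position selects a byte, which then moves to the front
    index = list(range(256))
    out = bytearray()
    for pos in data:
        b = index.pop(pos)
        out.append(b)
        index.insert(0, b)
    return bytes(out)

def round_encode(data: bytes) -> bytes:
    # one "round": MTF, reset index (fresh per call), reverse, MTF again
    return mtf_encode(mtf_encode(data)[::-1])

def round_decode(data: bytes) -> bytes:
    # undo in the opposite order: un-MTF, reverse, un-MTF
    return mtf_decode(mtf_decode(data)[::-1])
```

Applying `round_encode` twice gives the two-round version tested below.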

Running some tests suggests that with two rounds of my MTF for random 256-byte inputs, flipping any bit will, on average, change over 50% of decoded bits. For a 512-byte input, the average is 53%. For a (barely tested) 1024-byte input, the average (and lowest) number of bits changed is over 50% for any flipped bit.

The outliers (minimum number of bits changed) are linear for small enough inputs. For a 512-byte (or smaller) input, the lowest number of bits changed is always achieved by flipping the first (rightmost) bit of the first input byte. Additionally, flipping any bit near the end of a small input always results in a greater than 50% change, but this does not hold at the beginning. These undesirable effects seem to disappear with a 1024-byte input, although testing this takes me too long (about 2 minutes per input!) to make any useful statement.

The main problem with this approach is that it takes too long. I am using a small notebook with an Intel Atom CPU Z3735 @ 1.33 GHz. One KB takes around 0.034 seconds to encode; encrypting one KB with ChaCha takes around half that time.

NOTE: The transform function is completely known (per Kerckhoffs's principle). It is completely dependent on the plaintext. The function does absolutely nothing except add non-malleability to the plaintext and thus the ciphertext. It runs before ANY cipher, not together with one.

UPDATE: I've updated the transform function, altering the indexes used for the transform rounds, reducing the rounds to 2, and adding a running sum of the input data to the process. It is now quicker than ChaCha and achieves non-malleability at 512 bytes of input, without any bias (within the limits of my testing). It adds a nonce on encoding that it removes on decoding. The code is here if anyone wants to have a look.

I'm aware I could make the code faster, but I still think there must be some standard approaches to the overall problem. The only modern, similar thing I've found relates to an all-or-nothing transform, which Rivest discusses here. It seems strange that the idea (assuming Rivest is correct about the added security) didn't really catch on.

My questions:

  1. Is there any existing approach that does anything similar to this? Basically some sort of modern version of Russian copulation?
  2. Does anyone know a potentially quicker way to transform plaintext (completely independent of the cipher) and make it (potentially) non-malleable?
Maarten Bodewes
Modal Nest

2 Answers

  1. Is there any existing approach that does anything similar to this? Basically some sort of modern version of Russian copulation?

There is one existing idea, currently used in some RSA implementations, that could be tweaked slightly to fit the requirements of Russian copulation. I will briefly describe it below, and then mention a potential additional benefit of such a 'transform'.

OAEP

Optimal Asymmetric Encryption Padding (OAEP) is a padding scheme often used together with RSA encryption.

The algorithm takes the form of a Feistel network which uses a pair of random oracles $G$ and $H$ to process the plaintext prior to asymmetric encryption. When combined with any secure trapdoor one-way permutation $f$, this processing is proved in the random oracle model to result in a combined scheme which is semantically secure under chosen plaintext attack

OAEP layout
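A minimal sketch of OAEP's two-round Feistel core, with the random oracles $G$ and $H$ instantiated (purely for illustration) as SHA-256-based mask generation functions; the 16-byte seed length is an arbitrary choice, not a real parameter set:

```python
import hashlib
import os

def mask(seed: bytes, length: int) -> bytes:
    # simple MGF1-style mask generation function built from SHA-256
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def oaep_pad(msg: bytes, rand_len: int = 16) -> bytes:
    r = os.urandom(rand_len)                                    # random seed
    x = bytes(a ^ b for a, b in zip(msg, mask(r, len(msg))))    # X = m XOR G(r)
    y = bytes(a ^ b for a, b in zip(r, mask(x, rand_len)))      # Y = r XOR H(X)
    return x + y

def oaep_unpad(padded: bytes, rand_len: int = 16) -> bytes:
    x, y = padded[:-rand_len], padded[-rand_len:]
    r = bytes(a ^ b for a, b in zip(y, mask(x, rand_len)))      # r = Y XOR H(X)
    return bytes(a ^ b for a, b in zip(x, mask(r, len(x))))     # m = X XOR G(r)
```

Without knowing all of $X$ and $Y$ you cannot recover $r$, and without $r$ you cannot unmask any part of the message, which is what makes this shape attractive as an all-or-nothing-style pre-processing step.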

Using OAEP as an all-or-nothing transform would require processing in blocks (with padding), and potentially some extra tweaking to adapt it for the purposes of non-malleability.


Finally, it is not necessarily the case that using such a transform function would only provide the benefit of non-malleability. As per Ron Rivest's paper All-or-nothing encryption and the package transform:

We present a new mode of encryption for block ciphers, which we call all-or-nothing encryption. This mode has the interesting defining property that one must decrypt the entire ciphertext before one can determine even one message block. This means that brute-force searches against all-or-nothing encryption are slowed down by a factor equal to the number of blocks in the ciphertext. We give a specific way of implementing all-or-nothing encryption using a “package transform” as a pre-processing step to an ordinary encryption mode. A package transform followed by ordinary codebook encryption also has the interesting property that it is very efficiently implemented in parallel. All-or-nothing encryption can also provide protection against chosen-plaintext and related-message attacks.
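For reference, the package transform from that paper can be sketched roughly as follows. Note this is a simplified illustration: the paper uses a block cipher $E$ with a random inner key $K'$ and a fixed public key $K_0$, which I stand in for here with a SHA-256-based PRF so the sketch is self-contained:

```python
import hashlib
import os

BLOCK = 16
K0 = b"\x00" * BLOCK   # fixed, publicly known key

def prf(key: bytes, data: bytes) -> bytes:
    # stand-in for the block cipher E_key(data)
    return hashlib.sha256(key + data).digest()[:BLOCK]

def ctr(i: int) -> bytes:
    return i.to_bytes(BLOCK, "big")

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def package(blocks):
    # blocks must each be BLOCK bytes
    kp = os.urandom(BLOCK)                      # random inner key K'
    # mask each block with a keystream derived from K'
    out = [xor(m, prf(kp, ctr(i))) for i, m in enumerate(blocks, 1)]
    # final block hides K' under hashes of every masked block
    tail = kp
    for i, c in enumerate(out, 1):
        tail = xor(tail, prf(K0, xor(c, ctr(i))))
    return out + [tail]

def unpackage(blocks):
    *body, tail = blocks
    # recover K' -- this needs EVERY masked block, hence all-or-nothing
    kp = tail
    for i, c in enumerate(body, 1):
        kp = xor(kp, prf(K0, xor(c, ctr(i))))
    return [xor(c, prf(kp, ctr(i))) for i, c in enumerate(body, 1)]
```

Changing any bit of any packaged block changes the recovered $K'$ and therefore scrambles every plaintext block, which is exactly the non-malleability property the question is after.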


Below I will give an example of how we can use OAEP as inspiration. It can obviously be improved upon (and I could probably use a variant of ChaCha QR). My code is here and may well be easier to read than my explanation below.

This CAN be done in a single pass, and it is significantly quicker than my previous function, which looked up/sliced/rearranged arrays and processed byte-by-byte.

First, I set up an unbalanced Feistel cipher with 10 double rounds, operating on a state made up of 8 blocks of 32 bits.

The blocks are XORed with fixed circular rotations of other blocks, and shift position after each double round. In each double round, most blocks are affected in some way by most other blocks. For example, block 0 (in one double round) is affected by blocks 1, 2, 4, 5 & 7. Odd blocks are affected by one less, e.g. block 1 is affected by blocks 2, 3, 4 & 5.

After 10 double rounds, this achieves good (over 50%) diffusion in output blocks 0,3,5 & 6. In the other "bad" blocks it is around 56%.

We process the input data in 128-bit chunks through the "good" blocks of the Feistel, while the "bad" blocks serve as the state/capacity, which is initialised with random data and saved with the final encoding. This allows us to decode (backwards), with the initial randomness providing a nonce.

Finally, we add a 128-bit "counter" into the Feistel. It keeps a running XOR of all the data blocks processed (starting at 0). This doesn't need to be saved, but it does require slightly more computation for decoding, as we need to "count" all the blocks before we begin the decoding process.

As we decode in reverse, the last data block clearly affects all of the previous data blocks (due to the saved state). And as the "count" of the previous data blocks affects the last data block, we have all bits affecting all bits.

It would require padding the input, but it's incredibly quick: 1 KB is processed in under 0.003 seconds on my rubbish machine. It's actually too quick for my rubbish testing and often registers 0 seconds at smaller sizes.

The problem I can see is that it generates differing diffusion levels depending on the size of the input. If the input is 128 bits, the average number of bits changed for any bit flipped is around 75%. For 1024 bits it is around 57%, for 3072 bits around 53%, and for 4096 bits still around 52.5%. At 1 KB it is around 52%.

I have not found a single bit that could be flipped that didn't cause over 50% of bits to change for any input size.

Modal Nest

Maybe I'm not getting something. The avalanche effect, known state, no added cryptographic security, 50% of bits flipped if one bit is flipped, simpler and faster than ChaCha20... It seems you are describing a pseudo-random number generator, a simple one. Why not use it? It fits all of your requirements.

See here (pages 18 & 23 in particular, and a bit further on), which shows exactly the same requirement in pictures. This is the fastest PRNG algorithm that has a good avalanche effect and a long period.

It takes about 10 operations per word (32-bit or 64-bit) that your machine can operate on natively, unlike proper cryptography, which needs dozens or even hundreds of operations on a single byte.

Re-seed the PRNG with data on every step to spread the changes further than one block. Use two passes, one forward and one backward, to make every bit affect every other bit, to the left and to the right of itself.

For example, the initial seed is 42 (public), and the first data chunk in the plaintext is "D". PCG uses seed 42 and produces output 948. You then compute 948 XOR "D" to get the first block of ciphertext. Reseed the PCG with the data from the last plaintext block, "D", and the last PCG state, 948, instead of just 42. To decode, you repeat the process, as XOR cancels out: take 42 as the PCG seed, get 948 as output, compute 948 XOR the ciphertext (948 XOR "D") to get the initial data chunk "D", and now you can derive the next seed, from "D" and 948, to decode the second symbol. For the backward pass you will also need a public seed, otherwise you won't know how to decode, because the second pass overrides the first one. With two public seeds (they can be the same) you can treat each pass as a separate process.

The previous PCG state and the data can be mixed with addition, XOR, or other functions; avoid multiplication, so as not to introduce too many zeros into the PCG state. That is, the new PCG seed equals the old PCG output (+ or XOR) the current plaintext chunk.
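A minimal sketch of this construction, using a toy 64-bit LCG as a stand-in for a real PCG step; the seeds 42/43, byte-at-a-time processing, and XOR reseeding are all illustrative choices:

```python
MASK64 = 2**64 - 1

def lcg(state: int) -> int:
    # toy LCG step (PCG's underlying multiplier/increment);
    # substitute a real PCG output function in practice
    return (6364136223846793005 * state + 1442695040888963407) & MASK64

def pass_xform(data: bytes, seed: int) -> bytes:
    # one pass: XOR each byte with PRNG output, then reseed the PRNG
    # with the plaintext byte so later output depends on earlier data.
    # The same function both encodes and decodes, since XOR cancels out.
    out = bytearray()
    state = seed
    for b in data:
        state = lcg(state)
        p = b ^ (state & 0xFF) if False else b  # placeholder, see below
        out.append(b ^ (state & 0xFF))
        # reseed with the PLAINTEXT byte: on encode that is the input b;
        # on decode it is the output we just recovered
        state ^= min(b, out[-1]) ^ max(b, out[-1]) ^ b  # == b on encode
    return bytes(out)

def pass_encode(data: bytes, seed: int) -> bytes:
    out, state = bytearray(), seed
    for b in data:
        state = lcg(state)
        out.append(b ^ (state & 0xFF))
        state ^= b                       # reseed with the plaintext byte
    return bytes(out)

def pass_decode(data: bytes, seed: int) -> bytes:
    out, state = bytearray(), seed
    for c in data:
        state = lcg(state)
        b = c ^ (state & 0xFF)
        out.append(b)
        state ^= b                       # same reseed, from recovered plaintext
    return bytes(out)

def encode(data: bytes, seed1: int = 42, seed2: int = 43) -> bytes:
    forward = pass_encode(data, seed1)              # left-to-right diffusion
    return pass_encode(forward[::-1], seed2)[::-1]  # right-to-left diffusion

def decode(data: bytes, seed1: int = 42, seed2: int = 43) -> bytes:
    forward = pass_decode(data[::-1], seed2)[::-1]  # undo backward pass
    return pass_decode(forward, seed1)              # undo forward pass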

This page gives some code examples for PCG:

You can use any PRNG for this role, but check the paper that shows why exactly you should not use each of others available PRNGs. In particular slow avalanche time and patterns.

Modal Nest
  • 1,473
  • 5
  • 18