-2

XOR already provides perfect secrecy in theory, but it is hard to apply in practice because of its strict requirements. I was wondering whether simple addition/subtraction on integers would be more practical to use while still providing perfect secrecy.

Ex:

  • "9" will be the plaintext and "1" will be the secret key. In ASCII/UTF-8 each occupies 1 byte (8 bits), so they have the same length and can be XORed.
  • 9 = 00111001
  • 1 = 00110001

First we would XOR as one normally would:

Encipher: 00111001 ⊕ 00110001 = 00001000

(obviously the output will be junk ciphertext, in this case the Backspace control character in ASCII)

Decipher: 00001000 ⊕ 00110001 = 00111001
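The XOR round trip above can be reproduced with a few lines of Python. This is only a minimal sketch; the helper name `xor_bytes` is made up for the illustration.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR two byte strings of equal length, byte by byte."""
    if len(data) != len(key):
        raise ValueError("data and key must have the same length")
    return bytes(d ^ k for d, k in zip(data, key))

plaintext = b"9"  # ASCII 00111001
key = b"1"        # ASCII 00110001

ciphertext = xor_bytes(plaintext, key)  # encipher
recovered = xor_bytes(ciphertext, key)  # decipher: XOR with the same key again

print(format(ciphertext[0], "08b"))  # 00001000
print(recovered)                     # b'9'
```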

Or we would use addition to encipher and subtraction to decipher, working on integers:

Encipher: 9 + 1 = 10

Decipher: 10 - 1 = 9

As you can see, it's just as secure as XOR-ing: the 10 doesn't reveal anything about the 9, since the key and the plaintext could be any pair of gigantic positive and negative numbers whose sum just happens to be 10, so it provides perfect secrecy.

It's also easier to use: you don't have to convert to binary.

But it has one special property that I think makes it superior to XOR. Look at the binary value of 10 (written as ASCII text): it's 0011000100110000, which is 2 bytes long.
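To see where the two bytes come from, here is a small Python sketch (variable names are just for illustration). It prints the bits of the ASCII text "10" next to the bits of the integer ten stored as a single byte.

```python
ciphertext = 9 + 1

# Bits of the decimal digits "1" and "0" encoded as ASCII text (two bytes).
ascii_bits = " ".join(format(b, "08b") for b in str(ciphertext).encode("ascii"))

# Bits of the same value stored directly as an 8-bit binary integer (one byte).
binary_bits = format(ciphertext, "08b")

print(ascii_bits)   # 00110001 00110000
print(binary_bits)  # 00001010
```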

So using addition/subtraction instead of XOR also hides and obfuscates the length of the plaintext, doesn't it? Because if we use integers, we are not using a bitwise operation anymore.

Instead the key can be any length; nobody can determine whether we have 9+1=10 or 1000000000+(-999999990)=10 or any of infinitely many other combinations. It totally obfuscates the size of the data: there is no correlation between plaintext size and ciphertext size.

And since it's not a bitwise operation anymore, it doesn't have the "leakage" problem that XOR has: the key can be any length, and it doesn't even have to be truly random (I think, correct me if I am wrong); any pseudo-random and sufficiently complex integer would do.

So I think this makes it superior to XOR and other bitwise operators.

Why isn't integer addition used in encryption algorithms? What do you think about this encryption method? Is it better and more practical than XOR?

fkraiem
cryptonoob400

3 Answers

5

I started to write a long answer, but there are quite a few misunderstandings in the question, so I think a list is better suited:

  • Encoding is different from encryption. If you encode something you will be left with as many possible values as when using binary. It doesn't matter if you encode ten as 00001010 or 00110001 00110000.
  • Junk ciphertext: modern ciphertext is binary and - for a symmetric cipher - consists of random looking values. When you try to view it in any format it will look like "junk" but that's just because you're trying to give meaning to the random bits.
  • Generally we want to be able to encrypt any data, regardless of whether it is text, an image etc. So binary data is the default input format for modern ciphers that run on computers. The problem with addition is that you would have to convert to numbers; computers however work on binary values. Even addition and multiplication operate on machine words and are therefore performed modulo $2^{32}$ or $2^{64}$. It just takes you some work to get used to this.
  • You have to use modular addition (or subtraction) to make addition secure. Otherwise the ciphertext values will be spread like a bell curve, which leaks information about the plaintext, especially when you look at the outer values such as zero or the maximum value within the ciphertext. This is similar to throwing two dice, where 2 and 12 are much less frequent than 7; if you throw 2 then both dice must show 1.
  • With modular addition the chance of a smaller number in the ciphertext is the same as with XOR when viewed in binary. With addition we humans automatically remove the zero digits from the left. You can do the same thing with the zeros after XOR. The only difference is whether the input / output is treated as a number or not, but that's a choice you can make in both situations.
  • Similarly, you can also left-pad small values with zero or one bits before XOR'ing them. That way you can encrypt with a larger key and XOR just like you do with a large number and modular addition.
  • For a true one-time-pad - which is the name of the algorithm that you're trying to accomplish with XOR or modular addition - any value in the key stream needs to be fully random. If it isn't then you're dealing with a stream cipher. Stream ciphers can be computationally secure, but they are generally not perfectly secure.
  • In general we try to keep the ciphertext as compact as possible. If you want to avoid leaking information about the plaintext size then you can first pad the message to a constant size. In that case, if you use a one-time-pad, you will only leak the information that the message is no larger than that size.
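The dice analogy can be checked by brute force. The sketch below is an illustration only: it uses a hypothetical alphabet of six values (0 to 5) and counts how many (plaintext, key) pairs lead to each ciphertext, first with plain addition and then with addition modulo 6.

```python
from collections import Counter
from itertools import product

N = 6  # hypothetical alphabet size: plaintext and key values are 0..5

# Plain addition: how many (plaintext, key) pairs produce each ciphertext?
plain_sums = Counter(p + k for p, k in product(range(N), repeat=2))

# Modular addition: the same pairs, reduced modulo N.
mod_sums = Counter((p + k) % N for p, k in product(range(N), repeat=2))

# Plain addition is peaked in the middle; the extremes 0 and 10 each come
# from exactly one pair, so they give away both the plaintext and the key.
print(sorted(plain_sums.items()))

# Modular addition is flat: every ciphertext comes from the same number of pairs.
print(sorted(mod_sums.items()))
```

With modular addition each ciphertext value is produced exactly once for every possible plaintext, which is the property a one-time pad needs.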

So no, addition is not better than XOR. The advantages you list can be accomplished for XOR as well. Binary plaintext and ciphertext is a feature, not a drawback.
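A minimal sketch of padding to a constant size before XOR, under stated assumptions: the 16-byte size, the zero-byte left-padding and the helper names are all hypothetical, and `secrets.token_bytes` only stands in for the truly random key that a real one-time pad requires.

```python
import secrets

BLOCK_SIZE = 16  # hypothetical constant message size in bytes

def pad(message: bytes, size: int = BLOCK_SIZE) -> bytes:
    """Left-pad with zero bytes (assumes messages never start with a zero byte)."""
    if len(message) > size:
        raise ValueError("message longer than the fixed size")
    return message.rjust(size, b"\x00")

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR two byte strings of equal length."""
    if len(data) != len(key):
        raise ValueError("data and key must have the same length")
    return bytes(d ^ k for d, k in zip(data, key))

# Stand-in for a one-time key; a real OTP key must be truly random and never reused.
key = secrets.token_bytes(BLOCK_SIZE)

short_ct = xor_bytes(pad(b"9"), key)
long_ct = xor_bytes(pad(b"1000000000"), key)  # key reused here only to compare lengths

recovered = xor_bytes(short_ct, key).lstrip(b"\x00")
print(len(short_ct), len(long_ct), recovered)
```

Both ciphertexts have the same length, so an observer learns only that each message fits in 16 bytes.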

Maarten Bodewes
3

Firstly, if you want the length of the plain text to be obfuscated, apply padding prior to encrypting the plain text.

Secondly, while it is indeed possible to use binary operations other than XOR in an OTP scheme, using addition over the natural numbers (as opposed to modular addition in a residue group) would also leak information. The perfect secrecy of a proper OTP means that any given cipher text reveals no information about the plain text, save for the length. This is because, for any given cipher text and (optionally padded) plain text pair of the same length, there is always a key that would encrypt the plain text into the cipher text. With your scheme, the cipher text 0 would only be matched by the pair of the plain text 0 and key 0. The cipher text 1 would only be matched by either plain text 0 and key 1, or plain text 1 and key 0, etc. In a cipher-text-only attack, the attacker would always know that the plain text has to be less than or equal to the cipher text, which reveals a non-trivial amount of information.
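The argument can be made concrete by listing which plain texts remain possible for a given cipher text. This is a rough sketch: the function names and the modulus of 10 are chosen only for illustration.

```python
def candidates_plain(ciphertext: int) -> list:
    """Plain texts p >= 0 for which some key k >= 0 satisfies p + k == ciphertext."""
    return list(range(ciphertext + 1))

def candidates_mod(ciphertext: int, modulus: int) -> list:
    """Plain texts p in [0, modulus) for which some key k in [0, modulus)
    satisfies (p + k) % modulus == ciphertext."""
    return [
        p for p in range(modulus)
        if any((p + k) % modulus == ciphertext for k in range(modulus))
    ]

print(candidates_plain(0))    # [0]
print(candidates_plain(1))    # [0, 1]
print(candidates_mod(1, 10))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Over the natural numbers a small cipher text rules out almost every plain text, whereas with modular addition every plain text stays possible.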

Henrick Hellström
2

I have done an experiment in order to find out which method is better. In fact we could extend this logic to not just integer addition/subtraction but also integer multiplication/division and float multiplication/division as the respective encipher/decipher operations.

So we have 4 different methods that can be tested. Integer multiplication is obviously biased: the product of an odd and an even number will always be even, there is a bias toward zero, and you also can't produce prime numbers (other than by multiplying by 1). Therefore float multiplication also needs to be tested, which is not that biased.

I have tested all 4 methods, with 10 trials each and a sample size of 20 million per trial, so 200 million samples per method in total, which I think is more than enough for statistical accuracy. Also, the computer uses a hardware-enhanced entropy source, so there is no need to worry about the quality of the PRNG, but it's still a PRNG, which is worth noting.
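The parity bias of integer multiplication can be verified exhaustively on a small range. The sketch below assumes byte-sized values (0 to 255) for both operands, which is a choice made only for this illustration, and compares the product against XOR.

```python
from itertools import product

# All (plaintext, key) pairs of byte-sized values.
pairs = list(product(range(256), repeat=2))

# A product is odd only when both factors are odd, so most products are even.
even_products = sum(1 for p, k in pairs if (p * k) % 2 == 0) / len(pairs)

# The product is zero whenever either factor is zero.
zero_products = sum(1 for p, k in pairs if p * k == 0)

# For comparison: the low bit of an XOR result is evenly split.
even_xors = sum(1 for p, k in pairs if (p ^ k) % 2 == 0) / len(pairs)

print(even_products)  # 0.75
print(zero_products)  # 511
print(even_xors)      # 0.5
```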

My logic was that we need to measure the correlation between the plaintext and the ciphertext. I think 1-1 systems with no key reuse are perfect OTPs; however, I tried to test what happens when the plaintext (or the key, it doesn't matter which) is actually very repetitive and predictable, and the attacker has access to a big sample to do a correlation attack.

The correlation between the multiple plaintexts and their corresponding ciphertexts is what measures the information leak. I have used a very simple repetitive plaintext, in the integer/float version just adding or subtracting [+1,-1] cyclically and in the binary format just swapping between [0,1] cyclically. Basically every second plaintext will either increase or decrease by 1 or swap between [0,1]. It can't be more predictable than that, so if the ciphertext leaks some of that obvious predictability, then it can't be perfectly secure.

I have measured Pearson's correlation coefficient in each trial, taken its absolute value, and averaged that over the 10 trials.
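For reference, the measurement described above could be computed along these lines. This is a sketch and not the code used for the experiment; the function names and the tiny sample data are made up for the illustration.

```python
import math

def pearson(xs, ys):
    """Pearson's correlation coefficient of two equally long number sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

def mean_abs_correlation(trials):
    """Average of |r| over a list of (plaintexts, ciphertexts) trials."""
    return sum(abs(pearson(xs, ys)) for xs, ys in trials) / len(trials)

# Toy data: one perfectly correlated and one perfectly anti-correlated trial.
trials = [([1, 2, 3, 4], [2, 4, 6, 8]), ([1, 2, 3, 4], [8, 6, 4, 2])]
print(mean_abs_correlation(trials))
```

Taking the absolute value before averaging keeps positive and negative correlations in different trials from cancelling each other out.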

The results are as follows:

[Figure: correlation results for the tested methods; image not available]

As you can see, the binary XOR operation shows the least correlation between the plaintext and ciphertext, and even with a PRNG the correlation is remarkably low for such a big sample size. I think the biases arise from the effects that @Maarten Bodewes has described above, but probably also from the computer truncating and rounding floating-point results (they are limited to about 20 decimals, I think), so over a large sample size that bias adds up. Binary seems to be cleaner anyway.

Now one could also do an autocorrelation test on multiple ciphertexts with random keys from a fixed plaintext, to see whether that setup leaks any data, but it would be similar to this. I think this proves well that binary bitwise operations are more efficient. Claude Shannon was right, no need to reinvent the wheel.

However, the leakage even through addition is quite small, so in practical cases where manual computation is required, I believe integer-addition-based ciphers can be more practical and reasonably secure.

cryptonoob400