6

I am building a computer program that deciphers Caesar, Vigenère, and monoalphabetic substitution ciphers. All of those are susceptible to frequency analysis. However, this does not seem applicable to the real world, considering the complexity of modern ciphers. Where else can frequency analysis be applied today to make my project relevant in practice?

The only suggestion I have been given is an application to the Bitcoin encryption algorithm. I am not sure if this would be a viable option. Is it?

kelalaka
Marvin

3 Answers

6

Classical ciphers operate on letters, so a frequency attack on them counts letter frequencies. Modern ciphers, if we consider only block ciphers, operate on blocks of 64, 128, or more bits. Let's see how you can perform a frequency attack on modern ciphers to infer some data.

Example attack on databases:

Let's assume that researchers have deployed a sensor network in a rural area to observe animals. When an animal's tag is detected, the node stores this information in a database table. Here, for simplicity, the table holds only the type of the animal and the date and time, each encrypted in ECB mode:

| TYPE  | Date & Time |
|-------|-------------|
| 0x443 | 0x7FE       |
| 0x122 | 0x1E0       |
| 0x443 | 0x651       |
| 0x443 | 0x6AA       |
| 0x084 | 0x09C       |
| 0x112 | 0xC9F       |
| 0x443 | 0x18D       |
| 0x112 | 0x76B       |

Now, assume that the poachers cannot extract the encryption key from the nodes, but they can read the encrypted table. What can they learn from this data?

In ECB mode, $E_k(x) = E_k(y)$ iff $x = y$, and this can leak information. Deterministic encryption of this kind is used to support equality and count queries on encrypted databases; see the CryptDB paper by Popa et al. An attacker can calculate the frequencies of the ciphertexts and infer information from them.
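The equality leak can be modelled in a few lines. This is only a sketch: Python's standard library has no block cipher, so a truncated HMAC stands in for $E_k$ here. It is not decryptable and is not ECB; it only reproduces the property that equal inputs give equal outputs under the same key. The function names and the key are hypothetical.

```python
import hashlib
import hmac
import os

KEY = os.urandom(32)  # hypothetical secret key held by the sensor node


def deterministic_token(key: bytes, value: bytes) -> bytes:
    """Stand-in for deterministic encryption: same key and value, same output."""
    return hmac.new(key, value, hashlib.sha256).digest()[:8]


def randomized_token(key: bytes, value: bytes) -> bytes:
    """Stand-in for randomized encryption: a fresh nonce changes every output."""
    nonce = os.urandom(8)
    return nonce + hmac.new(key, nonce + value, hashlib.sha256).digest()[:8]


# Equal plaintexts always collide under the deterministic scheme.
print(deterministic_token(KEY, b"deer") == deterministic_token(KEY, b"deer"))
# Different plaintexts collide only with negligible probability.
print(deterministic_token(KEY, b"deer") == deterministic_token(KEY, b"wolf"))
# With a fresh nonce per record, equal plaintexts no longer look equal.
print(randomized_token(KEY, b"deer") == randomized_token(KEY, b"deer"))
```

The first comparison is what makes equality queries possible, and it is also exactly what a frequency attack exploits.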

By Kerckhoffs's principle, we assume that they know everything but the key. The poachers also know the region, so they know how frequently a deer or a wolf appears around this sensor node. Assume that deer appear most often in this region, then wolves, then bears.

Looking at the ciphertext frequencies, 0x443 appears most often, so the poachers deduce that it represents a deer. Similarly, 0x122 would be a bear.
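The poachers' reasoning can be sketched as rank matching: sort the ciphertexts by how often they occur and pair them with the species in order of expected abundance. The values are the TYPE column of the table above; the function name `guess_labels` and the list `expected_order` are illustrative assumptions, not part of any real system.

```python
from collections import Counter

# TYPE column of the table above, exactly as the attacker sees it.
observed_types = [0x443, 0x122, 0x443, 0x443, 0x084, 0x112, 0x443, 0x112]

# Assumed prior knowledge (hypothetical): species from most to least common.
expected_order = ["deer", "wolf", "bear"]


def guess_labels(ciphertexts, ranked_plaintexts):
    """Pair the k-th most frequent ciphertext with the k-th most likely plaintext."""
    ranked_ciphertexts = [ct for ct, _ in Counter(ciphertexts).most_common()]
    return dict(zip(ranked_ciphertexts, ranked_plaintexts))


counts = Counter(observed_types)
guesses = guess_labels(observed_types, expected_order)

for ct, n in counts.most_common():
    print(f"{ct:#05x} seen {n} time(s): {guesses.get(ct, 'unknown')}")
```

With only eight rows, 0x122 and 0x084 each occur once, so counts alone cannot separate the rarer species. An attacker would need more observations or side information to resolve such ties.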

In short: frequency analysis can break deterministic encryption. If you have prior knowledge of the frequency distribution of the data, you can infer information without decrypting anything. So the ninth-century attack of the Arab philosopher and mathematician al-Kindi lives on.

The attack by Naveed et al. was performed on electronic medical records. Some of the auxiliary data, such as the frequency of illnesses and drug sales, is publicly available. Their experimental results show that an alarming amount of sensitive information can be recovered.

Mitigation: Don't use ECB mode. The cost is that you can no longer run simple equality queries on the encrypted data. CryptDB was evaluated by executing TPC-C queries to measure its capability and performance.

The future may lie in performing equality tests under fully homomorphic encryption (FHE); however, that may not easily solve all of the problems.

Some further readings on modern frequency analysis;

kelalaka
2

This answer applies to cryptographic algorithms in general, rather than specific cases where the plaintext data must have specific properties. For such a situation, see kelalaka's answer.

Frequency analysis is not a new attack, and as such, the encryption functions in use today are designed to resist frequency analysis.

Having said this, there are two aspects to a cryptographic algorithm's security: the encryption function (such as AES or DES) and the cipher mode. This link gives a good explanation of the cipher modes.

The various cipher modes have different properties. The Electronic Code Book (ECB) mode basically breaks the message into a series of blocks and encrypts each block individually, so no block affects any other block's encryption. This means that even with a secure encryption function, only the patterns within each block are removed. Patterns that occur across multiple blocks remain visible, as seen in the penguin images in the referenced Wikipedia article. A sophisticated form of frequency analysis could therefore be used to identify patterns between the blocks of an ECB-encrypted message. However, this is unlikely to retrieve the entire message.
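A simple way to look for such cross-block patterns is to split a ciphertext into blocks and count repeats. The sketch below assumes a 16-byte block size (as in AES); the function name `repeated_blocks` and the hand-built sample bytes are hypothetical and are not real cipher output.

```python
from collections import Counter


def repeated_blocks(ciphertext: bytes, block_size: int = 16) -> dict:
    """Return every block that occurs more than once, with its count."""
    blocks = [
        ciphertext[i:i + block_size]
        for i in range(0, len(ciphertext), block_size)
    ]
    return {block: n for block, n in Counter(blocks).items() if n > 1}


# Hand-built stand-in for an ECB ciphertext in which one block repeats.
block_a = bytes(range(16))
block_b = bytes(range(16, 32))
sample = block_a + block_b + block_a + block_a

for block, n in repeated_blocks(sample).items():
    print(block.hex(), "appears", n, "times")
```

Repeated blocks in a long ciphertext are a strong hint that ECB (or another deterministic scheme) is in use, and the repeat counts are the raw material for the frequency analysis described above.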

ECB hides the patterns within a message that fits in a single block, although identical messages still encrypt to identical ciphertexts under the same key. It is vulnerable when used for messages that are longer than a single block.

Marvin
waitaria
2

The Caesar cipher, Vigenere, monoalphabetic substitution, the autokey cipher, columnar transposition, the Playfair cipher, the Rail Fence cipher, disrupted transposition, the ADFGVX cipher, Quagmire III, etc., are all interesting and good to understand, but compared to modern cryptographic systems they are almost always utterly worthless for providing real-world confidentiality, and confidentiality is just one cryptographic service.

However, the program that you are building does have a real-world application that has interest and value: the frequency analysis of classical ciphers. Other such programs already exist, but perhaps you can make one that is better.
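For reference, the core of such a program for the Caesar cipher can be sketched in a few lines: try all 26 shifts and keep the candidate whose letters look most like English. The frequency table holds approximate, commonly quoted percentages, and the function names are illustrative; this is one simple scoring choice, not the only one.

```python
import string

# Approximate relative frequencies (percent) of letters in English text.
ENGLISH_FREQ = {
    "a": 8.2, "b": 1.5, "c": 2.8, "d": 4.3, "e": 12.7, "f": 2.2, "g": 2.0,
    "h": 6.1, "i": 7.0, "j": 0.15, "k": 0.77, "l": 4.0, "m": 2.4, "n": 6.7,
    "o": 7.5, "p": 1.9, "q": 0.095, "r": 6.0, "s": 6.3, "t": 9.1, "u": 2.8,
    "v": 0.98, "w": 2.4, "x": 0.15, "y": 2.0, "z": 0.074,
}


def shift_text(text: str, shift: int) -> str:
    """Shift lowercase letters by `shift` positions; leave other characters alone."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)


def english_score(text: str) -> float:
    """Higher when the text is made of letters that are common in English."""
    return sum(ENGLISH_FREQ.get(ch, 0.0) for ch in text)


def crack_caesar(ciphertext: str):
    """Try all 26 shifts and return (shift, plaintext) for the best-scoring one."""
    best = max(range(26), key=lambda s: english_score(shift_text(ciphertext, -s)))
    return best, shift_text(ciphertext, -best)


plaintext = ("frequency analysis works because some letters appear far more "
             "often than others in ordinary english text")
ciphertext = shift_text(plaintext, 3)
print(crack_caesar(ciphertext))
```

The scoring needs enough text to be reliable; very short ciphertexts can score higher under a wrong shift.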

It is difficult to imagine a scenario in which one would want to use a classical cipher for a serious purpose (let's set aside the one-time pad for a moment). One's level of assurance would be low even against an inefficient opponent, and even when employing the VIC cipher, which is a tougher nut to crack because it can resist frequency analysis. It is just not realistic. Again, the exception here is the one-time pad, sometimes called the Vernam cipher.

The one-time pad has very limited applications in the world today, but those applications do exist and they are important. You could write your program with the one-time pad in mind. Again, this territory has already been gone over.

The problem you face is this: modern ciphers are made to be resistant to frequency analysis. If a modern cipher can be attacked in this way, like 3DES in ECB mode, it is proof of tremendous weakness.

As far as Bitcoin goes, there have been problems in Android's Java SecureRandom with reused R values, but I am not sure how this could be helped by standard frequency analysis, which seems most useful as a tool of classical cryptanalysis.

Patriot