17

[Moderator note: this question now lives there]

I'm preparing a talk about the well-known fact that humans are bad at generating uniformly random sequences of numbers when asked to do so, which is a serious flaw for simple cryptographic systems.

I would like to spice up the talk by presenting some real cases where tax fraud or bad science was revealed by a simple frequency analysis of the compromised data. For example, a case where a scientist shifted some data points to better fit a specific conclusion, and this was discovered by analyzing the final digits of the values he manipulated. Or perhaps someone working for a bank who changed a few small numbers here and there to get money flowing to his account, and was then discovered by a similarly simple analysis.

In short, I would like to know easy-to-explain examples of people being caught in some fraud because they thought they were able to emulate random numbers by themselves.

I don't know if this is the correct place for this kind of question, but the Physics and Law stacks seemed less related.

fgrieu
Swike

5 Answers

13

Following up on my comment, and even though it's a bit different from what you asked: I really enjoyed the story here, where the use of a flawed pseudorandom number generator led to the arrest of members of a "Russian espionage network". Disclaimer: I have no idea how truthful all of this is. I'm leaving you the task of checking how reliable the book "compromised" is (that's the book this blog post is based upon; I have not read it).

Roughly, a very poor PRG was apparently used to create "filler transmissions", in order to hide when an actual transmission was happening. The trouble is, this broken PRG was producing decimal digits from 0 to 8 only, i.e., you never had any 9! This observation allowed the FBI to distinguish real transmissions from fillers, which opened the door to traffic analysis. Eventually, they identified that communications were happening periodically for a fixed duration, always in the same time slot, and correlated that with the schedules of suspects (e.g. observing that they were never away from home when a communication was happening), which led to their arrest.
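To illustrate why a missing digit is such a strong tell, here is a minimal sketch. The function names and the 100-digit threshold are my own assumptions for illustration, not details from the blog post or the book. The key point is that a stream of n uniformly random decimal digits avoids 9 entirely with probability 0.9^n, which shrinks very quickly as n grows.

```python
def missing_digits(stream: str) -> set[str]:
    """Return the decimal digits that never occur in the stream."""
    return set("0123456789") - set(stream)


def chance_of_no_nine(n: int) -> float:
    """Probability that n uniformly random decimal digits contain no 9."""
    return 0.9 ** n


def looks_like_filler(stream: str, min_length: int = 100) -> bool:
    # Assumption: a long enough sample with no 9 at all is flagged as filler.
    # The threshold of 100 digits is arbitrary and only for illustration.
    return len(stream) >= min_length and "9" not in stream
```

For a sample of only 100 digits, the chance that a truly uniform source never produces a 9 is already below one in ten thousand, so an analyst would need very little traffic to separate the two kinds of transmission.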

There are also a lot of documents on the VENONA project, which was dedicated to decrypting encrypted transmissions where a one-time pad was reused; you'll find plenty of resources on this one online.

Geoffroy Couteau
12

Actually, bank and expenses fraud is identified for the very opposite reason.

Many numbers arising from human activity begin with a small digit, with 1 appearing as the leading significant digit about 30% of the time. If the leading digits 1 to 9 were uniformly distributed, it would occur only about one third as often (11.1%).

This phenomenon is called Benford's law and produces the following distribution for the first three digits of many human-made numbers:

[Image: digit distributions predicted by Benford's law]

So I guess that in the general case, humans are poor not just at generating uniformly distributed random numbers, but at generating numbers from any specific distribution.
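As a rough sketch of how such a check could be run: Benford's law predicts that the leading digit d occurs with probability log10(1 + 1/d), and a chi-square statistic measures how far a data set departs from that. The helper names below are my own, and this is only an illustration, not the procedure any auditor actually uses.

```python
import math
from collections import Counter


def benford_expected(d: int) -> float:
    """Benford's law probability that the leading digit is d (1..9)."""
    return math.log10(1 + 1 / d)


def leading_digit(x: float) -> int:
    """First significant digit of a nonzero number."""
    # In scientific notation the first character is the leading digit.
    return int(f"{abs(x):.15e}"[0])


def benford_chi_square(values) -> float:
    """Chi-square statistic of the leading digits against Benford's law."""
    values = [v for v in values if v != 0]
    n = len(values)
    counts = Counter(leading_digit(v) for v in values)
    total = 0.0
    for d in range(1, 10):
        expected = n * benford_expected(d)
        total += (counts.get(d, 0) - expected) ** 2 / expected
    return total
```

With 8 degrees of freedom, a statistic above roughly 15.5 would be unusual at the 5% level for data that really follows Benford's law. Powers of 2 are a classic example that fits well, while numbers whose leading digits are spread evenly (such as the integers 100 to 999) produce a very large statistic.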


I haven't done this, but it would be interesting to manually generate a sequence according to a Normal distribution and then test it for normality.

Paul Uszak
12

Since you mentioned you're also interested in crypto examples, here's one from probably the most famous cipher machine in history: Enigma.

Most of Enigma's key settings were distributed on a keysheet and changed daily, but there was also a per-message key that had to be randomly chosen by the operator, corresponding to the initial rotor positions: Enigma machine (the three white squares in the middle of this picture, where you can just about make out the setting VGT; the actual positions changed after every letter, but that's not relevant here)

To transmit these, the sender had to create six random letters and send the first three (Grundstellung) in plain, then the next three (Spruchschlüssel) encrypted using the settings of the Grundstellung. The Spruchschlüssel would become the actual initial setting. It wasn't particularly useful without the daily key settings, but it could be used as a first step in cracking. It was therefore very important that it was not predictable (like AAA or your initials), not related to the Grundstellung (the same letters, or the two halves of a 6-letter word), and that neither part was reused between messages.

Obviously all three of these things happened a lot, and while it wasn't the biggest flaw in the system, it certainly helped crack some messages faster.

More details: https://www.bbvaopenmind.com/en/technology/innovation/the-human-errors-that-defeated-enigma/

Robert Williams
9

Allan Franklin wrote about this in "The Mendel-Fisher Controversy." As practical tests of results against theoretical distributions improved, Sir R. A. Fisher noted that Mendel's results were "too good" compared with the variation one would expect in a real experiment. At the time Mendel did his work, fewer goodness-of-fit tests were available and the idea of fitting "too well" wasn't so clear.
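To make the idea of a fit being "too good" concrete, here is a small sketch for a single experiment with an expected 3:1 ratio. The counts in the usage note are hypothetical, not Mendel's data, and the function names are my own. For one degree of freedom, the probability of a chi-square value this small or smaller is erf(sqrt(chi2 / 2)).

```python
import math


def chi_square_two_classes(observed_a: int, observed_b: int,
                           ratio_a: float = 3, ratio_b: float = 1) -> float:
    """Chi-square statistic for two observed counts against an expected ratio."""
    n = observed_a + observed_b
    expected_a = n * ratio_a / (ratio_a + ratio_b)
    expected_b = n - expected_a
    return ((observed_a - expected_a) ** 2 / expected_a
            + (observed_b - expected_b) ** 2 / expected_b)


def left_tail_probability_1df(chi2: float) -> float:
    """P(X <= chi2) for a chi-square variable with 1 degree of freedom."""
    return math.erf(math.sqrt(chi2 / 2))
```

With hypothetical counts of 751 and 249, `left_tail_probability_1df(chi_square_two_classes(751, 249))` is small: an honest experiment would fit this closely, or closer, only a minority of the time. One such result proves nothing on its own; the suspicion arises when many independent experiments all land in that left tail.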

ttw
2

Not necessarily discovering fraud but enabling it: in the early days of online casino games, some players were able to predict slot outcomes and poker cards because they knew the logic of the random number generator being used.

edwinc