3

Let's say I have three sequences:

S1 = {1,2,3,4,5,6,7,8,9,10}

S2 = {3,7,1,9,4,10,5,8,6,2}

S3 = {8,3,10,2,6,7,1,5,9,4}

i.e. S2 and S3 are permutations of S1. Is there any method to check how random these sequences are relative to each other? I need to check their randomness using some randomness test. I found the NIST test suite, which is for bit-streams, but I am not able to find any test for permuted sequences.

Keep in mind that the sequence is big, {1,2,3,...,n}, and there are m sequences.

Edit: Here, m < n.

And the permutations are key-based permutations.

4 Answers

8

There is no such thing as randomness of a sequence (or of a permutation, or of a string, etc.). There is only randomness of a process for choosing sequences (permutations, strings, etc.), which is intrinsically not something you can test by looking at its outputs.

What you can do is write a decision procedure that will, with some probability, return a different answer depending on which of two different random processes generated the outputs. Of course, the probability distribution on its decisions depends on the probability distribution on its inputs! Here are a couple of examples:

    • Input data: String of $n$ bits $b_i \in \{0,1\}$.
    • Procedure:
      1. Compute $$\chi^2 = \frac{(\#\{b_i = 0\} - n/2)^2}{n/2} + \frac{(\#\{b_i = 1\} - n/2)^2}{n/2},$$ where $\#\{b_i = 0\}$ is the number of zero bits and $\#\{b_i = 1\}$ is the number of one bits.
      2. If $\chi^2 \leq 3.841$, return 0; otherwise return 1.
    • Probability distributions:

        • A1: independent uniform Bernoulli trials, $\theta = 1/2$
        • B1: independent Bernoulli trials biased by $\theta = 1/3$

        If the inputs are distributed by A1, then the probability of returning 0 is 95% and the probability of returning 1 is 5%. In statistics jargon, if we take A to be the null hypothesis and 1 to mean rejection or an alarm, the statistical significance or false alarm rate is 5%.

        If the inputs are distributed by B1, then the probability of returning 0 is considerably lower and the probability of returning 1 is considerably higher. (Exercise: Compute these quantities. In statistics jargon, if we take B to be the alternative hypothesis to our null hypothesis A, the probability of returning 1 given B is called the statistical power of the test.)

        • A2: independent uniform Bernoulli trials, $\theta = 1/2$
        • B2: alternating 0 and 1 bits, either 01010101… or 10101010… with equal probability $1/2$

        If the inputs are distributed by A2, then as above the probability of returning 1 is 5%—obviously, this is the same test and the same distribution A1 as above, so the false alarm rate is 5% as above.

        If the inputs are distributed by B2, then the probability of returning 0 is 100% and the probability of returning 1 is 0%. That is, the test has no statistical power whatsoever to detect B2!

    What this illustrates is that the utility of any distinguishing test depends entirely on the distributions you are trying to distinguish. You cannot simply ask: ‘Is this sequence random?’ or ‘Is this sequence independent uniform Bernoulli trials?’
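The decision procedure above can be sketched in a few lines of Python (the function name is mine; the threshold 3.841 is the 95th percentile of the chi-squared distribution with one degree of freedom, as in step 2):

```python
from collections import Counter

def chi_squared_bit_test(bits, threshold=3.841):
    """Return 1 (reject / alarm) if the zero/one counts deviate too far
    from the 50/50 split expected under null hypothesis A1, else 0."""
    n = len(bits)
    counts = Counter(bits)
    expected = n / 2
    chi2 = ((counts[0] - expected) ** 2 / expected
            + (counts[1] - expected) ** 2 / expected)
    return 1 if chi2 > threshold else 0

# A heavily biased string trips the alarm; alternating bits (B2) do not,
# which is exactly the "no statistical power against B2" point above.
print(chi_squared_bit_test([0, 1] * 500))           # returns 0
print(chi_squared_bit_test([0] * 900 + [1] * 100))  # returns 1
```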

    • Input data: String of eight Smarties in unique colors.
    • Procedure:
      1. If a mauve candy appears before a yellow candy, return 0; otherwise return 1.
    • Probability distributions:

      We hypothesize that a factory, which produces the colors in alphabetical order [blue brown green mauve orange pink red yellow], has not been randomly permuting them with all possible permutations—specifically, it has been permuting the first four as a group, and the last four as a group, but never interchanging anything in the first group with anything in the last group. So we have:

      • A3: Smarties with uniform permutation
      • B3: Smarties with blue/brown/green/mauve permuted uniformly, followed by orange/pink/red/yellow permuted uniformly.

      Under distribution A3, there is a 50% chance of mauve appearing before yellow, and a 50% chance of yellow appearing before mauve. So the procedure returns 0 and 1 with equal probability, 50%.

      Under distribution B3, there is a 100% chance of mauve appearing before yellow, and a 0% chance of yellow appearing before mauve. So the procedure returns 0 with 100% probability, and returns 1 with 0% probability.
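The Smarties example can be simulated directly; here is a minimal sketch in Python (the sampler and procedure names are mine):

```python
import random

COLORS = ["blue", "brown", "green", "mauve", "orange", "pink", "red", "yellow"]

def sample_A3(rng):
    """A3: uniform permutation of all eight colours."""
    s = COLORS[:]
    rng.shuffle(s)
    return s

def sample_B3(rng):
    """B3: first four and last four colours permuted within their groups only."""
    first, last = COLORS[:4], COLORS[4:]
    rng.shuffle(first)
    rng.shuffle(last)
    return first + last

def procedure(smarties):
    """Return 0 if mauve appears before yellow, else 1."""
    return 0 if smarties.index("mauve") < smarties.index("yellow") else 1

rng = random.Random(2024)
trials = 10_000
print(sum(procedure(sample_A3(rng)) for _ in range(trials)) / trials)  # ~0.5
print(sum(procedure(sample_B3(rng)) for _ in range(trials)) / trials)  # exactly 0.0
```

Under B3, mauve is always in the first group and yellow in the last, so the procedure returns 0 every time, matching the zero-power observation above.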


In cryptography, we usually work with distributions that are so similar that any procedure to distinguish them has almost the same probability of returning 1 for both distributions—so that even procedures that were designed by extremely smart cryptanalysts with knowledge of how your system works can't tell the distributions apart with more than negligible probability. In other words, if the NIST tests can break your system, then it can be broken by someone so stupid that they don't even know they're trying to break your system—and I mean that quite literally: a programmer at NIST, who is not stupid in the general sense but specifically lacks knowledge of your system, was able to devise a test years ago that would break it before you even designed your system.

So, we can't help you find a test for randomness of your permuted sequences—that is a nonsensical question, on its face. It's also not really interesting to try to psychoanalyze you to guess what processes might have generated your permutations and then derive tests based on those. But if you can describe the candidate processes, or describe a process and compare it to the uniform distribution on permutations, maybe then there is a test that will distinguish them.

For example, maybe you're trying to permute $\{0,1,2,\dots,2^{128} - 1\}$, and your permutation is chosen by (a) picking a 256-bit key $k$ uniformly at random, and then (b) selecting $\operatorname{AES}_k$ as the permutation. In that case, you have no hope of distinguishing it from a uniform random permutation of $\{0,1,2,\dots,2^{128} - 1\}$ (barring side channel attacks).

Squeamish Ossifrage
  • 49,816
  • 3
  • 122
  • 230
-1

Just 10 samples of a sequence can't tell you anything about randomness. Realistically, testing randomness needs long sequences. The NIST tools are among the best ways of deciding whether a sequence is pseudorandom or not; however, they need sequences of at least 10^6 bits, and the input must be in bit form.

Arsalan Vahi
  • 346
  • 1
  • 4
  • 14
-1

The solution to generating the permutation is simply a Fisher–Yates shuffle, fed random numbers from an AES- or HKDF-based RNG. The only issue is ensuring a bias-free random selection from a range, which can be accomplished inefficiently with rejection sampling, or with something cleverer. If the implementation is correct, there is no need to NIST-test the permutations, as they will be fully random with respect to each other. That's the shuffle's raison d'être. If you wanted, you'd just NIST-test the RNG itself. And if you use the standard/approved RNG constructions below, you don't even need that, other than for debugging.

So to get randomly permuted sequences $S_i$ keyed on $k_i$, build:-

$$ \text{Original key } k_i \to \operatorname{SHA256}(k_i) \to \text{256-bit key } k^\prime \to \operatorname{AES}_{k^\prime}(n) \to \text{random bytes} \to \operatorname{Shuffler}(S_1) \to S_i $$

or perhaps better/simpler,

$$ \operatorname{HKDF\text{-}SHA256}_{k_i}(n) \to \text{random bytes} \to \operatorname{Shuffler}(S_1) \to S_i $$

where $k_i$ are the keys for the individual permutations, and $n$ is just a counter for the AES or HKDF based RNGs.

The generally accepted approach (if you build the above) is to simply confirm the randomness of the RNG driving the shuffling process.
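As a sketch of the keyed-shuffle idea, here is a self-contained Python version. It uses HMAC-SHA256 over a counter as a stand-in stream generator for the AES/HKDF constructions above (the class and function names are mine), with rejection sampling for the bias-free range selection:

```python
import hashlib
import hmac

class KeyedRNG:
    """Deterministic byte stream from HMAC-SHA256(key, counter) -- a
    stand-in for the AES-CTR / HKDF-based RNGs described above."""
    def __init__(self, key: bytes):
        self.key = hashlib.sha256(key).digest()  # k' = SHA256(k_i)
        self.counter = 0                         # n, the RNG counter
        self.buf = b""

    def take(self, n: int) -> bytes:
        while len(self.buf) < n:
            self.buf += hmac.new(self.key, self.counter.to_bytes(8, "big"),
                                 hashlib.sha256).digest()
            self.counter += 1
        out, self.buf = self.buf[:n], self.buf[n:]
        return out

    def below(self, bound: int) -> int:
        """Unbiased integer in [0, bound) via rejection sampling."""
        nbytes = (bound.bit_length() + 7) // 8
        limit = (256 ** nbytes // bound) * bound  # largest usable multiple
        while True:
            r = int.from_bytes(self.take(nbytes), "big")
            if r < limit:
                return r % bound

def keyed_shuffle(seq, key: bytes):
    """Fisher-Yates shuffle driven by the keyed RNG: S1, k_i -> S_i."""
    rng = KeyedRNG(key)
    s = list(seq)
    for i in range(len(s) - 1, 0, -1):
        j = rng.below(i + 1)
        s[i], s[j] = s[j], s[i]
    return s

S1 = list(range(1, 11))
print(keyed_shuffle(S1, b"key-1"))  # a keyed permutation of S1
print(keyed_shuffle(S1, b"key-1") == keyed_shuffle(S1, b"key-1"))  # True
```

The same key always yields the same permutation, and distinct keys yield (with overwhelming probability) distinct permutations, which is the key-based-permutation property the question asks for.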


If you do require a 'randomness' test of the permutations with respect to one another, I think you're going to have to redefine randomness to something specific to your problem. Neither the NIST test suite nor any other applies to permutations or inter-permutation testing. You're going to have to mathematically and explicitly define how the $S_i$ sequences are "random to each other"; otherwise, how can you test against a hypothesis? You'll then have to relate that hypothesis to a range of $p$ values, from 0 to 1, indicating the certainty of the null hypothesis. That's required because randomness is pesky and hard to pin down empirically. The $p$ value indicates how likely it is that the observed result arose by chance.

Paul Uszak
  • 15,905
  • 2
  • 32
  • 83
-1

Yes, you can check randomness between sets by comparing the positions of their elements relative to one another. This is called a positional randomness test.

Comparing your two sets to your control set using a positional method, the set (3,7,1,9,4,10,5,8,6,2) shows higher entropy (better randomness), leaving aside the fact that you should expect at least one element to remain in its original position.

Jon Hutton
  • 71
  • 7