0

Edit suggested by fgrieu:

I have one hundred integers in $\{0,1,2,3,4,5\}$ which I suspect are consecutive values of $\lfloor x_n/2^{16}\rfloor\bmod6$ computed as $x_{n+1}:=a\cdot x_n+b\bmod m$, with $m=2^{31}$, and $(a,b)\in{(214013,2531011),(22695477,1)}$. How do I validate that hypothesis, find the $(a,b)$ used, and predict what follows in the sequence?


Question about "A competent implementation in a compiled language would take like a second on a modern desktop PC."

I wrote some code but they are expected to run 20 hours.

I am trying to find the random seed. First, I input my data in an array. Since I don't know my first data is what-th number generated by the server, I need to find it out. I only know the server shut down every thursday 2:00pm, and restart around 2:45-3:45pm the same day. When the server is on, ir generates 3 numbers every 45 seconds. The data I have is collected on fri 1:50 am, so my first data maybe the 2400-2640th number of the LCG.

I first run the rand 2399 times to discard the first 2399 numbers. Next, I loop 241 times to find my first data is what-th number generated by the server. (the uncertainity of the server restart time 2:45-3:45pm, 240 numbers per hour)

My method is: If 2400th x's bit 16 equal to bit 0 of $y_1$, then I check 2401th x's bit 16 and bit 0 of $y_2$, and so on. If there is unequal, break the loop then start another loop, compare 2401th x and bit 0 of $y_1$.

What is the better way to do it? I just started to learn c++ two weeks ago, please forgive my ignorance.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <iostream>
#include <inttypes.h>

using namespace std;

const int RESULT[3][7] = { {0,1,0,1,1,1,1}, {1,0,1,0,0,0,0}, {0,1,1,0,1,0,0} };

static unsigned long x;

int test_rand(void) { x = 214013 * x + 2531011; // or is it 22695477*x+1 return (int)((x >> 16) & 0x7FFF); };

int onlyx16(void) { x = 214013 * x + 2531011; // or is it 22695477*x+1 return (x >> 16) & 1; };

void chk_seed(unsigned long seed) { int d1[241]{}; int d2[241]{}; int d3[241]{};

x = seed;

for (int i = 0; i &lt; 2399; i++) {
    test_rand();
}

for (int i = 0; i &lt; 241; i++)
{
    d1[i] = onlyx16();
    d2[i] = onlyx16();
    d3[i] = onlyx16();
};

int correct = 0;

for (int k = 0; k &lt; 236; k++)
{
    correct = 0;
    for (int i = 0; i &lt; 7; i++)
    {
        if ((d1[i + k]) == RESULT[0][i])
        {
            correct += 1;
        }
        else {
            correct = 0;
            break;
        };
        if ((d2[i + k]) == RESULT[1][i])
        {
            correct += 1;
        }
        else {
            correct = 0;
            break;
        };
        if ((d3[i + k]) == RESULT[2][i])
        {
            correct += 1;
        }
        else {
            correct = 0;
            break;
        };
    };
    if (correct == 21)
    {
        printf(&quot;seed 0x%d is OK\n&quot;, seed);
        printf(&quot;results forecast:\n&quot;);
        for (int round = 0; round &lt; 5; round++)
        {
            printf(&quot;round%d &quot;, round + 1);
            int new_d[3]{};
            for (int i = 0; i &lt; 3; i++)
            {
                new_d[i] = test_rand()% 6;
                printf(&quot;%d&quot;, new_d[i]);
            };
            printf(&quot;\n&quot;);
        }
    };
}

};

int main() { for (unsigned long seed = 0; seed < 0x100000000; seed++) chk_seed(seed); };


$x_{n+1} = (a \cdot x_{n} + b) \mod m$

In normal situation, $x_{n+1}$ and $x_n$ are known. But now I only know $x_n\mod 6$ and $x_{n+1}\mod 6$.

I have searched many website on google but I only find one question that talked about this problem.

Predicting values from a Linear Congruential Generator

However, it is not very clear and I still don't know what should I do after reading that. I hope someone can provide some math or example code, so that I can learn from trial and error.

I want to find a,b,m then use a C++ source code I found here to brute-force the seed.

I found an answer here that talked about how to find m, but I don't know $x_{n+1}$ and $x_n$.

https://security.stackexchange.com/questions/4268/cracking-a-linear-congruential-generator

I am new to this topic, but I desperately wanted to crack this PRNG, this PRNG made me suffered a lot, I decided to learn programming because of this PRNG. Thank you for your help!

fgrieu
  • 149,326
  • 13
  • 324
  • 622
fairytale
  • 53
  • 6

1 Answers1

3

Per comments revising the original question: the OP conjectures that 100 digits $y_n$ in $\{0,1,2,3,4,5\}$ in their possession are obtained using a C(++) expression equivalent to rand()%6 with the rand() using a (non-cryptographic) PRNG based on a Linear Congruential Generator, with code equivalent to

static unsigned long x;
int rand(void) {
  x = 214013*x+2531011; // or is it 22695477*x+1
return (int)((x>>16)&0x7FFF);
}

Notice that $x$ is at least 32-bit, but only the lower-order 31 bits matter due to (x>>16)&0x7FFF and C performing unsigned long multiplication with truncation of high-order bits that do not fit in the variable.

Abstracting the programming details, the conjecture is that $x$ evolves per $x_{n+1}:=a\cdot x_n+b\bmod m$ with $m=2^{31}$ and $(a,b)$ for some fixed constants believed to be either $(214013,2531011)$ or $(22695477,1)$. And rand() outputs bits 30…16 of $x$ thus the $y_n:=\lfloor x_n/2^{16}\rfloor\bmod 6$ are given for $n=0$ to $99$ (with the seed an unknown integer $x_{-1}$ in some immaterial range, since only $x_{-1}\bmod m$ would matter; and we are better off trying to find $x_0$ anyway).

The OP asks if it's possible to confirm or infirm that conjecture, and (if true) find which $(a,b)$ are used and predict what follows in the sequence.

Yes, that's possible, with excellent confidence. The effective state of the PRNGs considered has 31-bit, there are only two PRNGs considered, they are passable for simulation purposes, thus we should be able to find their state and which is used with a little more then $31+1=32$ bit of information; and we get $100\cdot\log_2(6)\approx258.5$ bit, which will give confirmation aplenty.

The simplest is to try for both conjectured $(a,b)$, and explore the possible values of $x_0$. There are only $2^{31}$, knowing $y_0$ allows to systematically reduce that by a factor of $6$. Each following $y_i$ further eliminates $5$ candidates out of $6$. If no candidate survives all the $y_i$, the hypothesis is disproved. If a match is found, we know which $(a,b)$ we used, and can compute additional $y_i$. A competent implementation in a compiled language would take like a second on a modern desktop PC.

But maybe want to break the thing in real time with a modern \$0.5 CPU running from a \$0.2 battery, or are stuck to a programmable calculator of the 1970s, or the ENIAC. Remark that $6$ is even, and the divisor is $2^{16}$. It follows $y_n\bmod 2$ is bit $16$ of $x_n$. Also remark that since $m$ is a power of two, the change of a bit in $x_n$ does not propagate to lower-order bits of $x_{n+1}$. If follows that bit 16 of $x_n$ depends only on the low 17 bits of $x_0$. We know bit 16 of $x_0$, thus need to test at most $2^{16}$ candidates for bits 15…0 of $x_0$. We can then find the other 14 bits as above. That divide and conquer approach would allow to tackle much larger parameters, e.g. variable x of type uin64_t and return x>>33, on a modern CPU.

I wonder what we could do if $a$, $b$ and/or $m$ were unknown.


Notes on the above:

  • It uses the dominant convention in computer science (and cryptography with few exceptions like DES): bits are counted from 0 (low-order bit), so that if a non-negative integer $v$ is represented in binary as $k$ bits $b_j$, then $v=\sum b_j$ with $0\le j<k$. In big-endian convention, bit 0 is written on the right: $6\times7=42$ in decimal is $110\times111=101010$ in binary.
  • Computer variable x is 32-bit at least, but only it's low order 31 bits (0 to 30) matter and are used in $x_n$, thus $0\le x_n<m=2^{31}$. The output of rand() is 16-bit at least, but only it's low order 15 bits (0 to 14) matter, and all the others are zero, thus $0\le$rand()$\le$RAND_MAX$=2^{15}-1$. If $0\le i<15$ then bit $j$ of the output of rand() matches bit $j+16$ of x. It follows bit 0 of $y_n$ matches bit 16 of $x_n$.

(Off-topic) comments on the current code:

  • It tries to use the speedup made possible by 6 being even. I maintain this is not required to reach an execution time in seconds, if
    • we explore the possible values of $x_0$, rather than the seed many steps before.
    • we use that $x_0$ is 31-bit, thus we can restrict the outer search to [0, 0x7fffffff] that is $2^{31}$ candidate $x_0$.
    • we use that we know $y_0$, thus that $x_0=2^{16}\cdot(6\cdot u+ y_0)+v$ for $0\le u<\lceil2^{15}/6\rceil$ and $0\le v<2^{16}$, which restricts to about $2^{31}/6$ candidates for $x_0$.
    • we optimize the test for checking candidate $x_0$ against $y_1$ in the inner loop on $v$.
  • The essence of the optional speedup noting 6 is even is to first find bits 16…0 of $x_0$ matching the values $y_0$ gathered, then bits 30…17. The code does not do that! With that speedup, execution time would go down to the millisecond.
  • We need the full values of the $y_n$ gathered (in both the non-optimized search, and the second part of the optimized search), but they do not seem to be available in the input, which I guess is $y_n\bmod2$. Further, I don't understand why that's in two-dimensional array RESULT[3][7].
  • When $x_0$ is found, if it was necessary to find the seed (or rather bits 30…0 of that) a known number of steps before, that can be done efficiently by walking back the LCG using $x_{n-1}:=a^{-1}\,(x_n-b)\bmod m$ where $a^{-1}$ is the modular inverse of $a$ modulo $m$.

Here is working code without the optional speedup (thus capable of working with odd final reduction modulus $r$ where the question has 6). Try it online!

#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

#define A 214013 // for VC; 22695477 for BC #define B 2531011 // for VC; 1 for BC #define R 6 // modulus, in [2, 32768] // static_assert(A > 0 && A % 8 == 5, "A mod 8 must be 5"); // static_assert(B > 0 && B % 2 == 1, "B mod 2 must be 1"); // static_assert(R >= 2 && R <= 32768, "R must be in [2, 32768]");

// decide modulo, reduced when R is a power of two #define M ((uint32_t)(((R-1)&R)!=0 ? 0x8000 : R)<<16)

// Sequence of yn for VC (a=214013, b=2531011), n=6, seed=1639326023 // If R is a power of two, ceil(16/log2(R))+1 entries are enough // Otherwise, that's ceil(31/log2(R)) entries, thus 12 for R=6. const int16_t y[] = { 0,5,3,0,4,3,1,0,4,5,4,4,4,5,5,3,0,2,0,5,4,5,0, // 0,2,5,1,3,5,5,5, };

// modular inverse INVA of A modulo M #define INVA (IN_A(IN_A(IN_A(IN_A((uint32_t)A))))&(M-1)) #define IN_A(v) (v(2-vA))

int main(void) { // decide starting x0 so that it matches y0 const uint32_t y0 = y[0], y1 = y[1]; int32_t x0 = (int32_t)(((M >> 16) - y0) / R * R + y0) << 16 | 0xffff; uint32_t found = 0; printf("generator: ((int)((x = %" PRIu32 " * x + %" PRIu32 ") >> 16) & 0x7fff) %% %u\n", (uint32_t)A, (uint32_t)B, (unsigned)R); while (x0 >= 0) { uint32_t x1 = A * (uint32_t)x0 + B; do { // assert( (x0 >> 16) % R == y0); // assert( x1 == A * (uint32_t)x0 + B); if (((x1 >> 16) & 0x7fff) % R == y1) { uint32_t x = x1; int n; for (n = 2; n < sizeof(y) / sizeof(y); ++n) if ((((x = A x + B) >> 16) & 0x7fff) % R != y[n]) goto nxt; // found a solution x = (INVA * ((uint32_t)x0 - B)) & (M - 1); printf("x0 can be %" PRId32 ", that is seed=%" PRIu32 " modulo 0x%" PRIx32 ", yielding:\n", x0, x, M); // prove out point by showing the output for (n = 0; n < sizeof(y) / sizeof(y) + 8; ++n) printf(" %d", ((int)((x = A x + B) >> 16) & 0x7fff) % R); printf("\n"); ++found; } nxt: x1 -= (uint32_t)A; } while (((x0--) & 0xffff) != 0); x0 -= (int32_t)(R - 1) << 16; } printf("found %" PRIu32 " solution%s\n", found, found > 1 ? "s" : ""); return 0; }

// yielding: // generator: ((int)((x = 214013 * x + 2531011) >> 16) & 0x7fff) % 6 // x0 can be 531633902, that is seed=1639326023 modulo 0x80000000, yielding: // 2 3 4 1 5 1 1 5 4 0 3 2 2 5 5 3 5 5 3 4 0 1 1 4 1 3 3 2 5 4 4 // found 1 solution

fgrieu
  • 149,326
  • 13
  • 324
  • 622