Suppose we have a probability space $(\Omega,{\mathscr F},P)$ consisting of:
- An arbitrary nonempty set $\Omega$
- A $\sigma$-algebra ${\mathscr F}$ of subsets of $\Omega$
- A probability measure $P: {\mathscr F} \rightarrow [0,1]$
I'm reading a text which shows how to generate a sequence of numbers $X_1,X_2,\ldots,X_n$ such that the sequence is distributed as $U(0,1)$ and is "random" in some sense. Those are two distinct properties:
1. The sequence has the right distribution. This can be checked with a convergence-in-distribution test, $\lim_{n\rightarrow\infty}\hat F_n(x) = F_X(x)$, where $\hat F_n$ is the empirical CDF of the first $n$ terms of the sequence and $F_X$ is the theoretical CDF of the target distribution.
2. The sequence is "random". There is a large literature on the notion of randomness, and a much smaller, largely ad hoc literature on practical testing of sequences for randomness. Maybe what I am looking for is Martin-Löf randomness.
One version of the Martin-Löf randomness definition is that "A sequence is Martin-Löf random if and only if no constructive martingale succeeds on it."
Q1. What is considered the "best practice" to formally state property 2? See for example this paper.
Q2. The Martin-Löf definition in Wikipedia is independent of the probability space. Is this correct? In general, is the randomness of a sequence independent of the probability distribution it converges to?
One possible answer, adapted from Wikipedia. It's not as constructive as I'd like, and it doesn't reference a particular probability distribution, which also seems undesirable:
- A martingale is a function $d: \{0,1\}^\ast \rightarrow [0,\infty)$ satisfying the fairness condition $d(w) = \frac{1}{2}\left(d(w0) + d(w1)\right)$ for every finite binary string $w$, where $w0$ and $w1$ denote $w$ extended by one more bit. A martingale is said to succeed on a sequence $S$ if $\limsup_{n\rightarrow\infty} d(S_1 \ldots S_n) = \infty$, where $S_1 \ldots S_n$ is the prefix consisting of the first $n$ bits of $S$. A martingale is said to be constructive if there exists a computable function ${\hat d}: \{0,1\}^\ast \times {\mathbb N} \rightarrow {\mathbb Q}$ such that, for all strings $w$ and all $t>0$, ${\hat d}(w,t) \leq {\hat d}(w,t+1) < d(w)$ and $\lim_{t\rightarrow\infty} {\hat d}(w,t) = d(w)$. A sequence is Martin-Löf random if and only if no constructive martingale succeeds on it.
That is, the above is not constructive in the sense that it doesn't supply an explicit test for randomness of a particular realized sequence with respect to a particular probability distribution. The Wikipedia page on tests of randomness lists a number of fairly ad hoc criteria, none of which looks mathematically rigorous.
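To make the betting-strategy reading of that definition concrete, here is a toy sketch (my own illustration, not from any of the references above; the majority-bit strategy is made up for the example). A strategy that repeatedly bets half its capital on the majority bit seen so far satisfies the fairness condition, and its capital diverges on a highly regular sequence such as all zeros, while on typical fair-coin output it stays bounded (in fact it usually decays toward zero):

```python
import random

def betting_capital(bits, fraction=0.5):
    """Capital of a toy martingale that bets `fraction` of its capital
    that the next bit equals the majority bit seen so far.
    Fairness holds because a correct guess multiplies capital by
    (1 + fraction) and a wrong guess by (1 - fraction), so
    d(w0) + d(w1) = 2 * d(w)."""
    capital = 1.0
    zeros = ones = 0
    for b in bits:
        guess = 0 if zeros >= ones else 1          # bet on the majority bit so far
        capital *= (1 + fraction) if b == guess else (1 - fraction)
        zeros += (b == 0)
        ones += (b == 1)
    return capital

regular = [0] * 200                                 # highly non-random sequence
fair = [random.getrandbits(1) for _ in range(200)]  # stand-in for RNG output

print("all-zeros capital:", betting_capital(regular))  # ~1.5**200: the martingale succeeds
print("fair-coin capital:", betting_capital(fair))     # stays bounded, typically near 0
```

Of course this single strategy only catches one kind of regularity; Martin-Löf randomness quantifies over all constructive martingales at once, which is exactly what makes it hard to turn into a finite test.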
Also I just found an equivalent question here.
Another related criterion is whether the sequence is incompressible. This test, however, would reject a single output that happens to be compressible even when the generator is fine. The implication is that tests of randomness would need to work on multiple samples from a random sequence generator to reach a conclusion, not on a single output. In that view, if the outputs are highly incompressible on average, then the random sequence generator is good (one can imagine a distribution of compressibility where a certain shape of the realized distribution corresponds to "random"). This program uses compressibility tests to assess the randomness of a sequence.
In the end, what I'm concerned about in Q2 is tests of randomization. The most satisfying test that occurred to me is to generate many sequences from the RNG, then plot the empirical PDF of the compression ratio of the sequences, i.e. take a good compression algorithm and divide the original sequence size by the compressed size. If that distribution clusters around 1, then you have good randomization.
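A minimal sketch of that check, under the assumption that zlib is an acceptable stand-in for "a good compression algorithm" and with Python's built-in random module playing the role of the RNG under test (both are just placeholders for the example):

```python
import random
import zlib

def compression_ratio(data: bytes) -> float:
    """Original size divided by compressed size; close to 1 for incompressible data."""
    return len(data) / len(zlib.compress(data, 9))

def sample_ratios(generate, n_sequences=200, length=4096):
    """Empirical distribution of compression ratios over many generated sequences."""
    return [compression_ratio(generate(length)) for _ in range(n_sequences)]

# Generator under test: Python's Mersenne Twister (random.randbytes needs Python 3.9+).
good = sample_ratios(random.randbytes)
# A deliberately bad "generator" for contrast: a repeating byte pattern.
bad = sample_ratios(lambda n: bytes(i % 7 for i in range(n)))

print("RNG ratios    : min %.3f  max %.3f" % (min(good), max(good)))  # clustered near 1
print("pattern ratios: min %.3f  max %.3f" % (min(bad), max(bad)))    # far above 1
```

In practice the ratio for incompressible data lands slightly below 1 because of the compressor's fixed overhead, so "clusters just under 1" is the pattern to look for.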
Property 1, distribution fit, has many accepted tests; it is property 2, the randomness of the sequence, that is less talked about.
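For completeness on the distribution-fit side, one widely accepted check is the one-sample Kolmogorov-Smirnov test; a sketch, assuming SciPy is available and again using Python's random module as the source being tested:

```python
import random
from scipy import stats

# One-sample Kolmogorov-Smirnov test of generated values against the U(0,1) CDF.
sample = [random.random() for _ in range(10_000)]
statistic, p_value = stats.kstest(sample, "uniform")  # "uniform" is U(0, 1) by default
print(f"KS statistic = {statistic:.4f}, p-value = {p_value:.4f}")
# A small p-value indicates the empirical CDF deviates significantly from U(0,1);
# note this addresses distribution fit only, not the "randomness" of the ordering.
```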
