Reservoir Sampling vs Round Robin

Question

You are given a List of numbers (length unknown).

Let's say the length is 10.

GetRandom(List) is called once. If implemented correctly, each number has a 1/10 probability of being returned.

GetRandom(List) is called 100 times. If implemented correctly, each number will appear about 10 times in the result.

Fine?

You now have to do the same for a Stream of numbers.

GetRandom(Stream, 5) is called. This adds 5 to the Stream. The Stream now has length N = 1, so 5 is returned (probability = 1/N = 1).

GetRandom(Stream, 3) is called. 3 is added to stream. N=2. Either 3 or 5 is returned (prob = 1/2).

How will this be tested for correctness?
If GetRandom(Stream) (without adding any more numbers) is called 10 times when length of list is 2, each number (3 & 5) should be returned ~5 times.

GetRandom(Stream, 7) is called. 7 is added to Stream. N = 3. One of the 3 numbers (5, 3, 7) is returned (probability = 1/3).

But how will this be tested for correctness?
If GetRandom(Stream) is called 10 times when N = 3, each number is returned ~3 times.

So far, so good?

Alright, here is my algorithm:

N = 0
Pointer = 0

GetRandom(Stream, Number = NULL):
    Pointer += 1

    if Number is NOT NULL:
        N += 1
    else:
        if Pointer > N:
            Pointer = 1    # Reset: wrap around after the last number

    return Stream[Pointer]    # Assume 1-based indexes

This simply cycles through all numbers in order / round-robin fashion.

If GetRandom(Stream) is called 1000 times on a Stream with 100 numbers, each number will appear exactly 10 times.

Suppose GetRandom(Stream, 77) is called on a Stream with 100 numbers (so 77 becomes the 101st number), and Pointer is then back at its starting position. If GetRandom(Stream) is now called 101 times, 77 is output on the 101st call, which satisfies the required probability of 1/101. If it is called 202 times, 77 is output again on the 202nd call, which satisfies 2/202.
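To make the counting claim above concrete, here is a minimal runnable sketch of the round-robin idea in Python. The class name RoundRobinPicker and the method name get_random are made up for illustration, the index is 0-based rather than 1-based, and the wrap-around is written with a modulo so that every position is visited once per cycle.

```python
from collections import Counter

class RoundRobinPicker:
    """Hypothetical Python rendering of the round-robin GetRandom."""

    def __init__(self):
        self.stream = []    # numbers seen so far
        self.pointer = -1   # 0-based index of the last number returned

    def get_random(self, number=None):
        if number is not None:
            self.stream.append(number)
        if not self.stream:
            raise ValueError("stream is empty")
        # Advance by one and wrap around past the end of the stream.
        self.pointer = (self.pointer + 1) % len(self.stream)
        return self.stream[self.pointer]

picker = RoundRobinPicker()
for x in range(100):           # build a stream of 100 distinct numbers
    picker.get_random(x)

# 1000 calls without adding anything: ten full cycles over the stream.
counts = Counter(picker.get_random() for _ in range(1000))
print(len(counts), set(counts.values()))
```

The frequencies come out perfectly even, but the order of the outputs is completely predictable, and that predictability is what a frequency-only test cannot see.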

Why bother with Reservoir Sampling and its k/(k+1) probabilities? Why bother with a random number generator?

user3494047 · Answer 1 · 2020-02-18T15:30:06.830

It seems that you're asking why you should bother with reservoir sampling when you're capable of tricking the test that you wrote.

Round robin doesn't return random numbers; it returns numbers deterministically. At the very least, its output looks far more deterministic than that of reservoir sampling or other methods.
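For contrast, here is a rough sketch of what a size-1 reservoir sampler could look like in Python. The names ReservoirPicker, add and sample_once are invented for this example, and it is only one common way to write the idea: after N numbers have been seen, the newest one replaces the kept number with probability 1/N, so the old choice survives with probability (N-1)/N. Unlike the round-robin version, it never stores the stream, so it is not a drop-in replacement for the asker's GetRandom(Stream).

```python
import random
from collections import Counter

class ReservoirPicker:
    """Size-1 reservoir: keeps one uniformly chosen number from the stream."""

    def __init__(self, rng=None):
        self.rng = rng if rng is not None else random.Random()
        self.count = 0       # N: how many numbers have been seen
        self.choice = None   # the single number currently kept

    def add(self, number):
        self.count += 1
        # Replace the kept number with probability 1/N,
        # i.e. keep the old one with probability (N-1)/N.
        if self.rng.randrange(self.count) == 0:
            self.choice = number
        return self.choice

def sample_once(stream, rng):
    """Feed the whole stream through a fresh picker and return its choice."""
    picker = ReservoirPicker(rng)
    for x in stream:
        picker.add(x)
    return picker.choice

rng = random.Random(42)  # fixed seed only so the demo is repeatable
tallies = Counter(sample_once([5, 3, 7], rng) for _ in range(30000))
print(tallies)
```

Each of 5, 3 and 7 should come up in roughly a third of the runs, but which one a given run returns cannot be predicted from the previous runs.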

Your tests should be better. If you need the result to seem random and not deterministic, make a test which captures that instead of one which is based on empirical probability.

EDIT to add a test example: another test for randomness (one that round robin would fail) is to run the same process many times and check that you don't get the same result every time. For uniform random sampling of a stream (the output of reservoir sampling) over a set/list/stream of fixed size n, the probability of getting one specific subset of size k should be 1/(n choose k). You can run your method once, then run it another 10000 times (or any number of times), and check that the first result recurs in only about 1/(n choose k) of those runs.
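A possible sketch of that test in Python follows. Everything here is illustrative: round_robin_run stands in for a freshly started round-robin process asked for k numbers (it always yields the first k), reservoir_run is a plain one-pass reservoir sampler of size k, and repeat_fraction is a made-up helper that measures how often the first result recurs.

```python
import random
from math import comb

def round_robin_run(stream, k):
    """A fresh round-robin process asked k times: always the first k items."""
    return tuple(sorted(stream[:k]))

def reservoir_run(stream, k, rng):
    """One-pass reservoir sampling of k items from the stream."""
    reservoir = []
    for i, x in enumerate(stream):
        if i < k:
            reservoir.append(x)       # fill the reservoir first
        else:
            j = rng.randrange(i + 1)  # uniform index in 0..i
            if j < k:
                reservoir[j] = x      # item i+1 is kept with probability k/(i+1)
    return tuple(sorted(reservoir))

def repeat_fraction(run, trials):
    """Run once, then `trials` more times; fraction equal to the first result."""
    first = run()
    same = sum(run() == first for _ in range(trials))
    return same / trials

stream = list(range(10))
k = 3
rng = random.Random(0)  # fixed seed only so the demo is repeatable
expected = 1 / comb(len(stream), k)  # 1/(n choose k) = 1/120

rr = repeat_fraction(lambda: round_robin_run(stream, k), 10000)
rs = repeat_fraction(lambda: reservoir_run(stream, k, rng), 10000)
print(expected, rr, rs)
```

Round robin repeats its first result in every run, so its fraction is 1.0, while the reservoir sampler's fraction should land near 1/120. A check such as "the fraction is within some tolerance of 1/(n choose k)" would therefore reject round robin even though it passes the frequency test.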
