There is an array with $n$ places. A stream of $n$ unique numbers arrives in random order (a permutation selected uniformly at random).

Whenever a number arrives, we must put it somewhere in the array, and we are not allowed to move it later. The goal is to have as many numbers as possible (in expectation) in their correct location in the final array. The "correct location" is defined as the location where it would appear if the numbers were sorted.
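
To make the objective concrete, the score of a final array could be computed as in the following minimal sketch. The function name `count_correct` is an illustrative choice and not part of the question.

```python
def count_correct(final_array):
    """Count entries that sit where they would be if the array were sorted."""
    target = sorted(final_array)
    return sum(placed == wanted for placed, wanted in zip(final_array, target))


# Only the 3 is in its sorted position here.
print(count_correct([2, 1, 3]))  # prints 1
```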

What is known about this problem? What is the best expected success rate, and what algorithm attains it?

Notes:

  • A related question is "What is the fastest online sorting algorithm?". It discusses a situation in which items can be moved when new items arrive, and the goal is to minimize the running time.

  • I am mainly interested in maximizing the expected number of correct positions. However, it would also be interesting to know whether the probability that all numbers are in the correct position can be improved above $O(1/n!)$.

Erel Segal-Halevi

2 Answers

Define a configuration to be a state of the array $A[]$ after receiving some of the numbers. Call a configuration obviously wrong if there exist indices $i<j$ such that $A[i]$ and $A[j]$ have both been filled in and $A[i]>A[j]$.

A randomized algorithm

Here is a natural algorithm: guess where to put each number uniformly at random, conditioned on the one constraint that you never place a number in a position that makes the configuration obviously wrong.

The analysis of this approach seems pretty tricky. Based on some back-of-the-envelope estimates, I conjecture that this attains a $\sim e^{-\Theta(n)}$ probability of placing all numbers in the correct position, which is better than the $1/n!$ of the naive algorithm. However, I have no proof, so this could be wrong.
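
The randomized rule can be sketched as follows. This is only an illustration under stated assumptions: the names `feasible_slots` and `place_randomly` are made up here, and the fallback used when no feasible slot remains (pick any empty slot uniformly at random) is my assumption, since the description above does not cover that case.

```python
import random


def feasible_slots(array, x):
    """Empty indices where x has every smaller placed number to its left
    and every larger placed number to its right."""
    lo, hi = 0, len(array) - 1
    for i, v in enumerate(array):
        if v is None:
            continue
        if v < x:
            lo = max(lo, i + 1)  # x must go to the right of a smaller number
        else:
            hi = min(hi, i - 1)  # x must go to the left of a larger number
    return [i for i in range(lo, hi + 1) if array[i] is None]


def place_randomly(stream, rng=random):
    """Place each arriving number in a uniformly random feasible slot."""
    array = [None] * len(stream)
    for x in stream:
        slots = feasible_slots(array, x)
        if not slots:
            # Assumption: no feasible slot is left, so fall back to any empty slot.
            slots = [i for i, v in enumerate(array) if v is None]
        array[rng.choice(slots)] = x
    return array


stream = random.sample(range(1, 16), 15)  # a uniformly random permutation
result = place_randomly(stream)
print(result, result == sorted(result))
```

Running `place_randomly` on many random permutations and counting how often the result is sorted would give an empirical estimate to compare against the conjectured $e^{-\Theta(n)}$ behaviour.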

A deterministic algorithm

Here's what I can show. Let's consider a variation on the above algorithm: when you receive each number, find the range of array indices that wouldn't be obviously wrong, and then place that number in the exact middle (median) of that range. This algorithm is now deterministic.
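
A minimal sketch of this median-placement rule is given below. The name `place_in_middle` is illustrative, and two details are assumptions of the sketch rather than part of the description above: ties in an even-length range are broken toward the lower middle, and if no feasible slot remains the number goes into the first empty slot.

```python
def place_in_middle(stream):
    """Place each arriving number in the middle of its feasible range."""
    n = len(stream)
    array = [None] * n
    for x in stream:
        lo, hi = 0, n - 1
        for i, v in enumerate(array):
            if v is None:
                continue
            if v < x:
                lo = max(lo, i + 1)  # stay to the right of smaller numbers
            else:
                hi = min(hi, i - 1)  # stay to the left of larger numbers
        free = [i for i in range(lo, hi + 1) if array[i] is None]
        if free:
            pos = free[(len(free) - 1) // 2]  # lower middle on ties (assumption)
        else:
            # Assumption: nothing feasible is left, so take the first empty slot.
            pos = next(i for i, v in enumerate(array) if v is None)
        array[pos] = x
    return array


# The median arrives first, then the medians of each half, and so on.
print(place_in_middle([4, 2, 6, 1, 3, 5, 7]))  # prints [1, 2, 3, 4, 5, 6, 7]
# A sorted stream goes wrong immediately: 1 is placed in the middle.
print(place_in_middle([1, 2, 3]))              # prints [3, 1, 2]
```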

For this algorithm, I can show that the probability of placing all numbers in the correct position is asymptotically larger than $1/n!$. In particular, it is at least something like $1/2^{\Theta(n)}$. Here's a hand-wavy analysis. For simplicity, let's assume $n$ is one less than a power of two. Place all $n$ numbers in a complete binary search tree. Now consider all sequences where you enumerate all the numbers at depth $1$ in some order, then all the numbers at depth $2$ in some order, then all the numbers at depth $3$ in some order, and so on. A hand-wavy estimate suggests that there is something in the vicinity of $n!/2^{\Theta(n)}$ such sequences, i.e., a random permutation has very roughly a $1/2^{\Theta(n)}$ probability of having this form. Moreover, on every such sequence the second algorithm above places every number in the correct position: each number arrives after all of its ancestors in the tree and before any of its descendants, so the middle of its feasible range is exactly its sorted position. Therefore, the second algorithm above achieves a success rate of something in the vicinity of $1/2^{\Theta(n)}$. In particular, the success rate is significantly larger than $1/n!$.

For values of $n = 2^i-1$, we can characterize the exact probability of being correct: letting $P(n)$ denote the probability that the second algorithm succeeds in placing every number in the correct position, we find

$$P(n) = {1 \over n} \times P(\lfloor n/2 \rfloor)^2,$$

because the first element needs to be the median, and then you have two problems of half the size (the subsequence of numbers smaller than the median has to be a sequence with the same property, and the same for the subsequence of numbers larger than the median). We have the base cases $P(1)=1$ and $P(3)=1/3$. The solution of this recurrence behaves like $P(n) \sim 2^{-e/2 \cdot (n+1)}$. Here $e/2 \approx 1.359\ldots$, so $P(n)$ decays roughly like $2^{-1.36 n}$, and in particular it is much larger than $1/n!$.
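
The recurrence is easy to evaluate exactly for small $n = 2^i-1$. The sketch below uses exact rational arithmetic; the function name `success_probability` is illustrative. For each $n$ it prints the ratio $P(n) \cdot n!$ (how much better than $1/n!$ the algorithm does) and the value $-\lg P(n)/(n+1)$, which can be compared with the constant in the exponent.

```python
from fractions import Fraction
from math import factorial, log2


def success_probability(n):
    """P(n) = (1/n) * P(floor(n/2))^2 with P(1) = 1, for n = 2^i - 1."""
    if n <= 1:
        return Fraction(1)
    return Fraction(1, n) * success_probability(n // 2) ** 2


for i in range(1, 8):
    n = 2**i - 1
    p = success_probability(n)
    # -lg P(n), computed from the exact numerator and denominator.
    neg_lg_p = log2(p.denominator) - log2(p.numerator)
    print(n, float(p * factorial(n)), round(neg_lg_p / (n + 1), 4))
```

For example, $P(7) = \frac{1}{7} \cdot \left(\frac{1}{3}\right)^2 = \frac{1}{63}$, so $80$ of the $7! = 5040$ permutations are placed perfectly.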

In more detail, here's one way to analyze this recurrence relation. Let $Q(n) = -\lg P(n)$. Then we find

$$Q(n) = 2 Q(\lfloor n/2 \rfloor) + \lg n,$$

where $Q(1)=0$ and $Q(3) = \lg 3$. Letting $R(i) = Q(2^i-1)$, we find

$$R(i) = 2 R(i-1) + \lg(2^i - 1),$$

with base case $R(1)=0$. Expanding, we find

$$R(i) = 2^{i-1} \times \left(\lg(1) + {\lg 3 \over 2} + {\lg 7 \over 4} + {\lg 15 \over 8} + {\lg 31 \over 16} + \dots + {\lg (2^i-1) \over 2^{i-1}}\right).$$

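Summing the series numerically, the bracketed sum converges quickly to about $2.729$, which is within half a percent of $e \approx 2.718$. So, to a good approximation,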

$$R(i) \approx 2^{i-1} \times e.$$

Therefore, $Q(n) \approx e/2 \cdot (n+1)$ and $P(n) \approx 2^{-e/2 \cdot (n+1)}$.
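
The bracketed series in the expansion of $R(i)$ can be checked numerically. The sketch below (the function name `bracketed_sum` is illustrative) prints a few partial sums next to $e$ for comparison.

```python
import math


def bracketed_sum(i):
    """Sum of lg(2^k - 1) / 2^(k-1) for k = 1..i, the bracket in R(i)."""
    return sum(math.log2(2**k - 1) / 2 ** (k - 1) for k in range(1, i + 1))


for i in (5, 10, 20, 40):
    print(i, bracketed_sum(i))
print("e =", math.e)
```

The terms shrink geometrically, so the partial sums stabilize after a few dozen terms.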

Credits: My thanks to @Algorithms with Attitude for the recurrence relation and the idea of analyzing the probability in this way.

D.W.

Maybe the title of the question should be a little different, like "Online sorting with the lowest number of modifications of the position of a number".

If you do not know the distribution of the numbers, then you should try to estimate it, and update the estimate as the numbers arrive. There are algorithms that fit a distribution function to a data set (here, the numbers received so far): https://www.mathworks.com/matlabcentral/answers/33917-how-do-i-determine-the-probability-distribution-of-data

Run the probability calculation algorithm after every number and then insert the new number accordingly in your array.

I have no idea where you should put the first number, probably in the middle.

yoyo_fun