11

Suppose that we read a sequence of $n$ numbers, one by one. How can we find the $k$'th smallest element using only $O(k)$ memory cells and linear time ($O(n)$)? I think we should store the first $k$ terms of the sequence and, when we get the $(k+1)$'th term, delete a term that we are sure cannot be the $k$'th smallest element and then store the $(k+1)$'th term. So we need an indicator that identifies this unusable term at each step, and this indicator must be quick to update. I began with the maximum, but it cannot be updated quickly: after the first deletion we lose the maximum and have to search for the new one in $O(k)$ time, which costs $(n-k)\times O(k)$ time in total, and that is not linear. Maybe we should store the first $k$ terms of the sequence more intelligently.

How do I solve this problem?

Yuval Filmus
Shahab_HK

2 Answers

16

Create a buffer of size $2k$. Read in $2k$ elements from the array. Use a linear-time selection algorithm to partition the buffer so that the $k$ smallest elements are first; this takes $O(k)$ time. Now read in another $k$ items from your array into the buffer, replacing the $k$ largest items in the buffer, partition the buffer as before, and repeat.

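Each round reads $k$ new elements and partitions the buffer in $O(k)$ time, and there are about $n/k$ rounds, so this takes $O(k \cdot n/k) = O(n)$ time and $O(k)$ space. Once the input is exhausted, the $k$'th smallest element is the largest of the $k$ smallest elements left in the buffer.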
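A minimal Python sketch of the buffer idea above, under some assumptions: the names `partition_smallest` and `kth_smallest_stream` are made up for illustration, and the partition step uses randomized quickselect, which is only *expected* linear time. Swapping in median-of-medians would give the worst-case $O(k)$ per round that the analysis assumes.

```python
import random


def partition_smallest(buf, k):
    """Rearrange buf in place so that buf[:k] holds its k smallest items.

    Randomized quickselect with a Hoare-style partition: expected
    O(len(buf)) time, not worst-case linear (an assumption of this sketch).
    """
    lo, hi = 0, len(buf) - 1
    target = k - 1  # 0-based index of the k-th smallest item
    while lo < hi:
        pivot = buf[random.randint(lo, hi)]
        i, j = lo, hi
        while i <= j:
            while buf[i] < pivot:
                i += 1
            while buf[j] > pivot:
                j -= 1
            if i <= j:
                buf[i], buf[j] = buf[j], buf[i]
                i += 1
                j -= 1
        # Here buf[lo..j] <= pivot <= buf[i..hi] and j < i.
        if target <= j:
            hi = j
        elif target >= i:
            lo = i
        else:
            return  # buf[target] equals the pivot and is already in place


def kth_smallest_stream(stream, k):
    """Return the k-th smallest item of an iterable using a buffer of at most 2k items."""
    buf = []
    for x in stream:
        buf.append(x)
        if len(buf) == 2 * k:
            partition_smallest(buf, k)
            del buf[k:]  # drop the k largest items; they cannot be the answer
    if len(buf) < k:
        raise ValueError("stream has fewer than k items")
    partition_smallest(buf, k)
    return buf[k - 1]
```

For example, `kth_smallest_stream(iter([5, 1, 4, 2, 3, 9, 7]), 3)` returns `3`. Appending one item at a time and partitioning whenever the buffer reaches $2k$ items does the same work as reading $k$ items per round.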

jbapple
3

You can do it in $O(k)$ memory and $O(n \log k)$ time: build a fixed-size max-heap from the first $k$ elements in $O(k)$ time, then iterate over the rest of the array, pushing each new element and then popping the maximum, at $O(\log k)$ per element. The total time is $O(k + n\log k) = O(n \log k)$.
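Here is a rough sketch of the heap approach in Python. The function name `kth_smallest_heap` is only illustrative. Python's `heapq` is a min-heap, so this sketch stores negated values to simulate a max-heap, which assumes the items are numbers.

```python
import heapq
from itertools import islice


def kth_smallest_heap(stream, k):
    """Return the k-th smallest number of an iterable in O(k) memory, O(n log k) time."""
    it = iter(stream)
    # Negate values so that heapq's min-heap behaves like a max-heap.
    heap = [-x for x in islice(it, k)]
    if len(heap) < k:
        raise ValueError("stream has fewer than k items")
    heapq.heapify(heap)  # O(k)
    for x in it:
        # Push the new item, then pop the largest of the k + 1 items: O(log k).
        heapq.heappushpop(heap, -x)
    # The heap now holds the k smallest items; its maximum is the answer.
    return -heap[0]
```

For example, `kth_smallest_heap([5, 1, 4, 2, 3, 9, 7], 3)` returns `3`.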

You can do it in $O(\log n)$ auxiliary memory and $O(n)$ time by using the median-of-medians selection algorithm, selecting at $k$, and returning the first $k$ elements. With no change to asymptotics you can use introselect to speed up the average case. This is the canonical way to solve your problem.

Now, technically, $O(\log n)$ and $O(k)$ are incomparable. However, I argue that $O(\log n)$ is better in practice, since it is effectively constant: no computer system has more than $2^{64}$ bytes of memory, and $\log_2 2^{64} = 64$. Meanwhile, $k$ can grow to be as large as $n$.

orlp