I'm given a set of $n$ numbers. Is there a data structure that builds in $O(n)$ (linear time) and gets the $k$'th largest element in $O(k)$ time? Also, is there anything better than $O(k)$?
4 Answers
Here is a solution that uses only comparisons. For simplicity, assume that $n$ is a power of 2.
Find the median of the original array in $O(n)$, and extract the largest $n/2$ elements. Then find the median of the new array in $O(n/2)$, and extract the largest $n/4$ elements. Continue in this way, extracting the $n/8,n/16,\ldots,1$ largest elements. In total, preprocessing takes time $$ O(n+n/2+n/4+\cdots) = O(2n) = O(n). $$
Given $k$, find $\ell$ such that $n/2^{\ell+1} < k \leq n/2^{\ell}$. By construction, $n/2^\ell < 2k$. The $k$th largest element is thus one of the $n/2^{\ell}$ largest elements. Using the linear time selection algorithm, locate the $k$th element among them in $O(n/2^\ell) = O(k)$.
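The construction and query above can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `select_top` and `KthLargest` are made up for this example, and the selection step uses randomized quickselect (expected linear time) as a stand-in for the deterministic linear-time selection algorithm the answer refers to. The input length does not have to be a power of 2 here; each level simply keeps the largest half (rounded down) of the previous one.

```python
import random


def select_top(items, k):
    """Return the k largest elements of items (arbitrary order).

    Randomized quickselect: expected O(len(items)) time. This is a stand-in
    for the deterministic linear-time selection algorithm.
    """
    items = list(items)
    result = []
    while k > 0:
        if k >= len(items):
            result.extend(items)
            break
        pivot = random.choice(items)
        greater = [x for x in items if x > pivot]
        equal = [x for x in items if x == pivot]
        less = [x for x in items if x < pivot]
        if k <= len(greater):
            items = greater  # all k answers lie above the pivot
        elif k <= len(greater) + len(equal):
            result.extend(greater)
            result.extend(equal[: k - len(greater)])
            break
        else:
            result.extend(greater)
            result.extend(equal)
            k -= len(greater) + len(equal)
            items = less
    return result


class KthLargest:
    """Hypothetical helper: levels[i] holds the largest n // 2**i elements."""

    def __init__(self, numbers):
        self.levels = [list(numbers)]
        # Sizes shrink geometrically, so total work is O(n + n/2 + ...) = O(n).
        while len(self.levels[-1]) > 1:
            prev = self.levels[-1]
            self.levels.append(select_top(prev, len(prev) // 2))

    def query(self, k):
        """Return the k-th largest element (k is 1-based)."""
        if not 1 <= k <= len(self.levels[0]):
            raise IndexError("k out of range")
        # Walk up from the smallest level; sizes roughly double each step,
        # so this loop takes O(log k) iterations and stops at size <= 2k.
        level = len(self.levels) - 1
        while len(self.levels[level]) < k:
            level -= 1
        # The k-th largest overall is the minimum of the k largest there.
        return min(select_top(self.levels[level], k))
```

For example, `KthLargest([5, 1, 9, 3]).query(2)` selects within a level of at most $2k$ elements, so the query cost depends on $k$ rather than on $n$.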
We can improve the running time for $k \leq Cn/\log n$ (for arbitrary $C$) to $O(1)$ as follows.
During preprocessing, use a linear time selection algorithm to locate the $Cn/\log n$-th largest element in $O(n)$, and extract it together with all larger elements. Sort these $O(n/\log n)$ elements with any comparison sort, which takes $O((n/\log n) \log n) = O(n)$.
During query time, locate the $k$th largest element for $k \leq Cn/\log n$ in $O(1)$ using the new array.
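A minimal sketch of this lookup table, assuming Python and a constant `c` playing the role of $C$. The function names are invented for this example. Note that `heapq.nlargest` is used purely for brevity: it runs in $O(n \log m)$ for a table of size $m$, so it does not meet the $O(n)$ preprocessing bound that selection followed by sorting achieves; only the $O(1)$ query is faithful to the description.

```python
import heapq
import math


def build_top_table(numbers, c=1.0):
    """Return the largest ~c*n/log2(n) elements, sorted in descending order."""
    n = len(numbers)
    if n <= 1:
        return list(numbers)
    m = max(1, min(n, int(c * n / math.log2(n))))
    # Stand-in for "select the m-th largest, then sort the top m".
    return heapq.nlargest(m, numbers)


def kth_largest_small_k(table, k):
    """O(1) lookup of the k-th largest element (1-based), for k <= len(table)."""
    if not 1 <= k <= len(table):
        raise IndexError("k is outside the precomputed range")
    return table[k - 1]
```

Queries with $k$ larger than the table size would fall back to the $O(k)$ structure described earlier.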
Conversely, we can show that this $O(1)$ behavior cannot extend beyond $O(n/\log n)$ for comparison-based algorithms. Indeed, suppose that there is an algorithm which preprocesses an array in $O(n)$, and is able to locate the $k$th element in $O(1)$ for $k \leq f(n)$, where $f(n) = \omega(n/\log n)$. This allows us to sort an array of size $f(n)$ in time $O(n) + O(f(n)) = o(f(n) \log f(n))$ by adding $n - f(n)$ dummy elements, contradicting the well-known lower bound for sorting.
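To spell out why the last equality holds (assuming, as the argument implicitly does, that $f(n) \leq n$): since $f(n) = \omega(n/\log n)$, for large $n$ we have $f(n) \geq n/\log n$ and hence $\log f(n) \geq \log n - \log\log n = \Theta(\log n)$. Therefore $$ f(n) \log f(n) = \omega(n/\log n) \cdot \Theta(\log n) = \omega(n), \qquad O(n) + O(f(n)) = O(n) = o(f(n) \log f(n)). $$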
If you can apply Radix sort, just sort the data and you are done.
If you can only compare elements, then such an algorithm could be used to find an arbitrary element of the sorted sequence in $O(n)$, whereas comparison-based sorting has been proved to require $\Omega(n \log n)$ time.
So, no way.
Since these are numbers, we can sort them with radix sort in $O(n)$ time and then find the $k$'th largest element in $O(1)$ time.
The extra memory required for radix sort is $O(n^\epsilon)$, where $\epsilon$ may be any positive number up to 1.
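As a rough illustration, here is an LSD radix sort in Python followed by the $O(1)$ lookup. It assumes the numbers are non-negative integers of bounded width (so the number of passes is a constant); the function names and the `bits_per_pass` parameter are choices made for this sketch. This simple bucket version uses $O(n)$ extra memory and does not attempt the $O(n^\epsilon)$ bound mentioned above.

```python
def radix_sort(values, bits_per_pass=8):
    """Sort non-negative integers in ascending order with LSD radix sort."""
    out = list(values)
    if not out:
        return out
    if min(out) < 0:
        raise ValueError("this sketch only handles non-negative integers")
    base = 1 << bits_per_pass
    mask = base - 1
    max_value = max(out)
    shift = 0
    # One stable bucketing pass per digit; the pass count depends only on
    # the key width, so the total is O(n) for fixed-width keys.
    while (max_value >> shift) > 0:
        buckets = [[] for _ in range(base)]
        for v in out:
            buckets[(v >> shift) & mask].append(v)
        out = [v for bucket in buckets for v in bucket]
        shift += bits_per_pass
    return out


def kth_largest_sorted(sorted_values, k):
    """O(1) lookup of the k-th largest (1-based) in an ascending list."""
    if not 1 <= k <= len(sorted_values):
        raise IndexError("k out of range")
    return sorted_values[len(sorted_values) - k]
```

After `s = radix_sort(data)`, every query is a single index computation: `kth_largest_sorted(s, k)`.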
I don't think there is such an algorithm. If there were, you would be able to find the median in $O(n)$, which AFAIK is either an open problem or has even been proven impossible.