
Input: A set of $\ell$ arrays $A_i$ (of numbers).
The elements within each array are in sorted order, but the set of arrays is not necessarily sorted. The arrays are not necessarily the same size. The total number of elements is $n$.

Output: The $k$th smallest element out of all elements in the input.

What's the most efficient algorithm for this problem?

Is it possible, for example, to achieve a running time of $O(\ell + \log n)$?

John L.
Joe

4 Answers


You can do it in $O(\ell + k \log \ell)$ time and $O(\ell)$ extra space as follows:

  1. Build a binary heap with one entry per array, where the key of entry $i$ is the smallest element of array $A_i$. This takes $O(\ell)$ time.
  2. Remove the smallest entry from the heap (taking $O(\log \ell)$ time). Then, if the corresponding array is not yet exhausted, re-insert that entry keyed by the next-smallest element of that array (again $O(\log \ell)$ time).
  3. Repeat the previous step $k$ times. The last element you remove from the heap is your answer.
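
The steps above can be sketched in Python (my own illustration, not the answerer's code; `kth_smallest` is a hypothetical name):

```python
import heapq

def kth_smallest(arrays, k):
    """Return the k-th smallest element (1-indexed) across sorted arrays."""
    # One heap entry per non-empty array: (head value, array index, position).
    heap = [(a[0], i, 0) for i, a in enumerate(arrays) if a]
    heapq.heapify(heap)  # O(l) time, O(l) extra space

    for _ in range(k):
        value, i, j = heapq.heappop(heap)  # O(log l)
        if j + 1 < len(arrays[i]):
            # Re-insert the entry, keyed by the next element of the same array.
            heapq.heappush(heap, (arrays[i][j + 1], i, j + 1))  # O(log l)
    return value
```

For example, `kth_smallest([[1, 4, 9], [2, 3], [5, 7, 8]], 4)` returns `4`.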

Replacing the binary heap with a Fibonacci heap does not remove the $\log \ell$ factor here: each of the $k$ rounds still performs a delete-min, which costs $O(\log \ell)$ amortized even in a Fibonacci heap. Amortized $O(\ell + k)$ time is achievable, though, via Frederickson's heap-selection algorithm, which finds the $k$ smallest elements of a binary min-heap in $O(k)$ time; in practice it'll be slower than the plain binary heap unless $\ell$ is HUGE.

I suspect that the $O(\ell + k)$ bound is optimal, because intuitively you're going to have to inspect at least $k$ elements to find the $k$th smallest one, and you're going to have to inspect at least one element from each of the $\ell$ arrays since you don't know how they compare, which immediately gives a lower bound of $\Omega(\max(k, \ell)) = \Omega(k + \ell)$.

Matt Lewis

Here is a randomized $O(\ell\log^2 n)$ algorithm. It can probably be derandomized using the same trick used to derandomize the usual quickselect.

We emulate the classical quickselect algorithm. In each phase, pick a pivot (say, a uniformly random remaining element) and count how many remaining elements lie below it, in $O(\ell\log n)$ time, using binary search in each list. Then discard the elements on the wrong side of the pivot and repeat. The process ends after $O(\log n)$ iterations in expectation.
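
One way to realize this pivot-and-count loop (my own sketch, not the answerer's code; per-array windows stand in for "removing elements on the wrong side"):

```python
import random
from bisect import bisect_left, bisect_right

def kth_smallest_quickselect(arrays, k):
    """k-th smallest (1-indexed) via quickselect over sorted lists.

    Assumes 1 <= k <= total number of elements.
    """
    # Active elements of array i are arrays[i][s:e] for its window (s, e).
    windows = [(0, len(a)) for a in arrays]
    lo_rank = 0  # number of discarded elements smaller than every active one
    while True:
        # Pick a pivot uniformly at random among the active elements.
        total = sum(e - s for s, e in windows)
        r = random.randrange(total)
        for i, (s, e) in enumerate(windows):
            if r < e - s:
                pivot = arrays[i][s + r]
                break
            r -= e - s
        # Count active elements strictly below / at most the pivot,
        # one binary search per list: O(l log n) per phase.
        below = at_most = 0
        for a, (s, e) in zip(arrays, windows):
            below += bisect_left(a, pivot, s, e) - s
            at_most += bisect_right(a, pivot, s, e) - s
        if lo_rank + below < k <= lo_rank + at_most:
            return pivot
        if k <= lo_rank + below:
            # Answer lies below the pivot: discard elements >= pivot.
            windows = [(s, bisect_left(a, pivot, s, e))
                       for a, (s, e) in zip(arrays, windows)]
        else:
            # Answer lies above the pivot: discard elements <= pivot.
            lo_rank += at_most
            windows = [(bisect_right(a, pivot, s, e), e)
                       for a, (s, e) in zip(arrays, windows)]
```

Each phase removes at least the pivot from the active set, and a random pivot halves the active set in expectation, giving the expected $O(\log n)$ phases.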

Yuval Filmus

This seems to be resolved by the paper Generalized selection and ranking (Preliminary Version) by Frederickson and Johnson in STOC '80.

They give matching upper and lower bounds of $\Theta(\ell + \sum_{i=1}^\ell \log|A_i|)$, which comes out to $\Theta(\ell \log n)$ for most array-size distributions.

The actual algorithm to achieve the upper bound is apparently given in a previous paper: Optimal algorithms for generating quantile information in X+Y and matrices with sorted columns, Proc. 13th Annual Conference on Information Science and Systems, The Johns Hopkins University (1979) 47-52.

Joe

A full $\ell$-way merge takes $\Theta(n \log \ell)$ time (maintain a priority queue over the head elements of the lists), after which you can read off the $k$-th element in constant time; stopping the merge once $k$ elements have been output improves this to $O(\ell + k \log \ell)$. I think this is discussed in Knuth's "Sorting and Searching" in the context of sorting. Getting the smallest (or largest) element clearly takes $\Theta(\ell)$; for a single unsorted array, selection takes $O(n)$ IIRC.
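
For illustration, Python's `heapq.merge` performs exactly this lazy $\ell$-way merge, so the merge can be cut off after $k$ elements (my own sketch; `kth_via_merge` is a hypothetical name):

```python
import heapq
from itertools import islice

def kth_via_merge(arrays, k):
    """k-th smallest (1-indexed) by lazily l-way-merging the sorted arrays."""
    # heapq.merge keeps a priority queue of the current head of each array,
    # so drawing only k elements costs O(l + k log l) rather than O(n log l).
    return next(islice(heapq.merge(*arrays), k - 1, None))
```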

Please describe your algorithm.

vonbrand