
I'm implementing some estimation metrics that take samples of optimisation functions and estimate their properties. One of the metrics requires the data to be sorted; however, since the metric is only an estimate, I was wondering whether I could get a comparably accurate result if the data wasn't sorted precisely, but well enough.

I would expect data could be nearly sorted in less time than actually sorting it. However, I can't find any algorithms that will do this, which I find rather odd, so I thought I'd ask around here.

My idea of "imperfectly sorted" is quite loose. It could, for example, mean:

  • Many elements are perfectly sorted, but some are not
  • Adjacent elements are not necessarily in the correct order, but when comparing groups of elements, the elements increase on average; if plotted, there would be a visible general upwards trend

For the first definition above, I suppose you could take a sub-sample of the data and just quick-sort that, but I don't imagine it'll get you much of a performance improvement without seriously reducing the quality of the estimation.
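Concretely, for that first idea I imagine something like the following rough sketch in Python (the `metric` callable is just a stand-in for whatever estimator actually needs sorted input):

```python
import random

def estimate_on_subsample(data, k, metric, rng=random):
    """Estimate a metric that needs sorted input by sorting only a
    random subsample of size k instead of the whole data set."""
    sample = rng.sample(data, min(k, len(data)))
    return metric(sorted(sample))
```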

It might also be possible to take something like quicksort and limit the depth of the recursion.

Do imperfect sorting algorithms exist? Can they significantly outperform regular sorting algorithms?

1 Answer


The term you're looking for is approximate sorting. This is a topic that, as greybeard mentions in a comment, was more popular back in the 90s. But it's quite helpful for presorting (e.g. getting an array a little bit sorted to ensure that Quicksort doesn't get stuck in its worst case), and this is what's led to most of the research in the area.

You can, in fact, make it somewhat faster than full sorting. Giesen, Schuberth, and Stojaković showed that you can get $n$ elements sorted to within a Spearman footrule distance (*) of $\frac{n^2}{\nu(n)}$ in $O(n \min(\log n, \log \nu(n)))$ time. So the more accuracy you're willing to sacrifice (the smaller you make $\nu(n)$), the faster the algorithm can run.

Giesen, Schuberth, and Stojaković also present a very simple algorithm to do this approximate sorting, either deterministically or with a bit of randomness mixed in. The full details are in the linked paper, but basically they use Quicksort's partitioning a limited number of times and don't care about the order of the elements within each partition, as long as they all end up in the right "bin". If you pick your partitions well, this is in fact equivalent to running Quicksort and limiting the recursion depth.
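To make that idea concrete, here is a rough sketch of depth-limited partitioning in Python. It's my own illustration, not the authors' exact algorithm: after `depth` levels of partitioning, every element sits in its correct bin, but the bins themselves are left unsorted internally.

```python
import random

def approx_sort(a, depth):
    """Partially sort the list `a` in place by applying Quicksort's
    partitioning step only `depth` levels deep."""
    def partition(lo, hi, d):
        if d == 0 or hi - lo <= 1:
            return
        pivot = a[random.randrange(lo, hi)]
        i, j = lo, hi - 1
        # Hoare-style partition around the pivot value
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        partition(lo, i, d - 1)   # left bin: elements <= pivot
        partition(i, hi, d - 1)   # right bin: elements >= pivot

    partition(0, len(a), depth)
```

With `depth` around $\log \nu(n)$ this does $O(n \log \nu(n))$ work, and increasing the depth smoothly trades speed for accuracy until it degenerates into a full Quicksort.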

(*) The Spearman footrule distance between two sequences is the sum, over all elements, of the distance between each element's intended position and its actual position. So if you swap two adjacent elements, that leads to a Spearman distance of 2: those two elements are each one position away from their intended place.
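Under that definition, the distance of an array from its fully sorted order can be computed with a small helper like this (again my own illustration; ties are broken by original position so duplicates are handled consistently):

```python
def spearman_footrule(a):
    """Sum of |actual position - sorted position| over all elements of `a`."""
    # position each element would occupy after a stable sort
    order = sorted(range(len(a)), key=lambda i: (a[i], i))
    sorted_pos = [0] * len(a)
    for pos, original_index in enumerate(order):
        sorted_pos[original_index] = pos
    return sum(abs(i - sorted_pos[i]) for i in range(len(a)))

print(spearman_footrule([1, 3, 2, 4]))  # adjacent swap -> 2
```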

Draconis