A B-tree has one significant disadvantage on modern machines with fast, deep cache hierarchies: it depends on pointers. As the tree grows, each access carries a greater and greater risk of causing a cache miss or TLB miss.
Effectively, the constant factor K per access becomes about z*x, where x is the expected number of cache/TLB misses per access (the L1/TLB miss rate typically scales with tree size divided by total cache size) and z is roughly the access time of the smallest cache level, or main memory, that can hold the entire tree.
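To put rough numbers on that, here is a back-of-envelope sketch; the latency and miss count are illustrative assumptions, not measurements:

```cpp
#include <cstdio>

int main() {
    // Assumptions: the tree no longer fits in the last-level cache, so z is
    // roughly DRAM latency, and each lookup chases a few node pointers that
    // miss cache/TLB. Compare with ~1-4 ns for an L1 hit.
    const double z_ns = 80.0;   // access time of the level that holds the tree
    const double x    = 4.0;    // expected cache/TLB misses per access
    std::printf("effective per-access constant K ~= %.0f ns\n", z_ns * x);
    return 0;
}
```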
On the other hand, the average-case quicksort streams memory at maximum prefetcher speed. The only drawback is that the average case also produces a stream that has to be written back. And after a few partitions the entire active set sits in cache and gets streamed even faster.
Both algorithms suffer heavily from branch mispredictions, but quicksort just needs to back up a bit; the B-tree additionally needs to read a new address to fetch from, because it has a data dependency that quicksort doesn't.
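A minimal sketch of the two access patterns (a plain binary node stands in for a B-tree node here just to show the pointer chase; a real B-tree node holds many keys and children, but the descent is still a chain of dependent loads):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Deliberately simplified node used only to illustrate the pointer chase.
struct Node {
    int key;
    Node* left  = nullptr;
    Node* right = nullptr;
};

// Each iteration needs the pointer loaded by the previous one before it can
// issue the next load, so a cache/TLB miss stalls the whole chain.
const Node* tree_find(const Node* n, int key) {
    while (n && n->key != key)
        n = (key < n->key) ? n->left : n->right;   // data-dependent address
    return n;
}

// The next address is always v[i + 1], known far in advance, so the hardware
// prefetcher can stream the data; a mispredicted branch only costs a short
// pipeline flush, with no extra memory round trip.
std::size_t partition_scan(std::vector<int>& v, int pivot) {
    std::size_t store = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v[i] < pivot)
            std::swap(v[store++], v[i]);
    return store;
}
```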
Few algorithms are implemented as pure theoretical functions. Nearly all have some heuristics to fix their worst problems, Timsort excepted, since it is built out of heuristics.
Merge sort and quicksort implementations often check for already sorted ranges, just like Timsort. Both also fall back to insertion sort for small sets, typically fewer than 16 elements; Timsort is essentially built up from such small runs.
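Roughly what those two heuristics look like at the entry of a sort routine; the cutoff value and the structure are assumptions for illustration, not any particular library's code:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

constexpr std::ptrdiff_t kSmall = 16;   // commonly cited small-range cutoff

// Returns true if the range was fully handled by the cheap heuristics,
// so the caller can skip the real partition/merge step.
bool handled_by_heuristics(std::vector<int>& v, std::ptrdiff_t lo, std::ptrdiff_t hi) {
    if (std::is_sorted(v.begin() + lo, v.begin() + hi))
        return true;                                 // already sorted: nothing to do
    if (hi - lo < kSmall) {                          // small set: insertion sort
        for (std::ptrdiff_t i = lo + 1; i < hi; ++i)
            for (std::ptrdiff_t j = i; j > lo && v[j] < v[j - 1]; --j)
                std::swap(v[j], v[j - 1]);
        return true;
    }
    return false;
}
```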
In practice, C++ std::sort is a quicksort hybrid with insertion sort, plus an additional fallback for worst-case behaviour: if the partitioning depth exceeds roughly twice the expected depth, it switches to heapsort.
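A minimal introsort-style sketch of that control flow (not the actual libstdc++/libc++ code; the cutoff, the middle-element pivot and the 2*log2(n) depth limit are just the commonly described values):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

using Iter = std::vector<int>::iterator;

void intro_sort(Iter first, Iter last, int depth_limit) {
    while (last - first > 16) {                       // big enough to partition
        if (depth_limit-- == 0) {                     // too many unbalanced splits:
            std::make_heap(first, last);              //   finish this range with
            std::sort_heap(first, last);              //   heapsort
            return;
        }
        int pivot = *(first + (last - first) / 2);
        // Three-way split via two std::partition passes so equal keys are
        // excluded from both recursive calls and progress is guaranteed.
        Iter mid1 = std::partition(first, last, [pivot](int x) { return x < pivot; });
        Iter mid2 = std::partition(mid1, last, [pivot](int x) { return x == pivot; });
        intro_sort(mid2, last, depth_limit);          // recurse into one side,
        last = mid1;                                  // loop on the other
    }
    // leftover small range: insertion sort
    for (Iter it = first; it != last; ++it)
        for (Iter j = it; j != first && *j < *(j - 1); --j)
            std::iter_swap(j, j - 1);
}

void sort_with_fallback(std::vector<int>& v) {
    int depth_limit = v.empty() ? 0 : 2 * static_cast<int>(std::log2(v.size()));
    intro_sort(v.begin(), v.end(), depth_limit);
}
```

Once the depth limit is hit, heapsort bounds the worst case at O(n log n), at the price of giving up quicksort's streaming access pattern for that range.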
The original quicksort used the first element of the array as the pivot. This was quickly abandoned in favour of a (pseudo)random element, often simply the middle one. Some implementations switched to a median-of-three (of random elements) to get a better pivot; more recently a median-of-5-medians (over all elements) has been used, and the last I saw, in a presentation by Alexandrescu, was a median-of-3-medians (over all elements) to get a pivot close to the actual median (within 1/3 or 1/5 of the span).
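For reference, a median-of-three pick looks roughly like this (sampling first/middle/last is one common variant; the text above mentions random elements as another):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Median of the first, middle and last element of v[lo, hi) -- a cheap way to
// avoid the worst case on already sorted or reverse-sorted input.
int median_of_three(const std::vector<int>& v, std::size_t lo, std::size_t hi) {
    int a = v[lo];
    int b = v[lo + (hi - lo) / 2];
    int c = v[hi - 1];
    return std::max(std::min(a, b), std::min(std::max(a, b), c));
}
```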