I have been reading this book for my class, Randomized Algorithms. The book has a whole section dedicated to finding the median of an array using random selection, which leads to a more efficient algorithm. I wanted to know whether this algorithm has any practical applications in computer science beyond being a theoretical improvement. Are there any algorithms or data structures that need to find the median of an array?
4 Answers
if there are any practical applications of this algorithm in the domain of computer science besides being a theoretical improvement
The application of this algorithm is straightforward: you use it whenever you want to compute the median of a data set (an array, in other words). The data may come from many domains: astronomical observations, social science, biological data, etc.
However, it is worth mentioning when to prefer the median to the mean (or mode). In descriptive statistics, when the data is perfectly normally distributed, the mean, mode, and median coincide. When the data is skewed, i.e. its frequency distribution is left- or right-skewed, the mean fails to provide the best central location because the skew drags it away from the typical value, to the left or to the right. The median is not as strongly influenced by the skew, so it stays closer to a typical value. Computing a median may therefore be preferable when you deal with skewed data.
Medians also appear in machine learning, where statistical methods are heavily used; $k$-medians clustering is one example.
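To illustrate the point about skewed data, here is a minimal sketch using Python's standard `statistics` module. The `incomes` list is a made-up, right-skewed sample (one large outlier), not real data.

```python
from statistics import mean, median

# Hypothetical right-skewed sample: mostly similar values plus one large outlier.
incomes = [30, 32, 35, 38, 40, 42, 45, 1000]

print(mean(incomes))    # 157.75, dragged to the right by the outlier
print(median(incomes))  # 39.0, still close to a typical value
```

The single outlier moves the mean far away from every "ordinary" value in the sample, while the median barely notices it.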
Median filtering is commonly used in image processing to reduce certain types of noise, especially salt-and-pepper noise. It works by replacing each pixel with the median value of its local neighbourhood, computed separately for each color channel. The neighbourhood size can vary; popular filter sizes are, for example, 3x3 and 5x5 pixels.
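As a rough sketch of the idea, the following pure-Python function applies a 3x3 median filter to a single-channel image stored as a list of lists. The function name `median_filter_3x3` and the choice to leave border pixels unchanged are assumptions made for this illustration; real image libraries handle borders and multiple channels in their own ways.

```python
from statistics import median

def median_filter_3x3(img):
    """Apply a 3x3 median filter to a 2D list; border pixels are left unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # copy so we always read from the original image
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Collect the 9 values in the 3x3 neighbourhood centred on (y, x).
            window = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out

# A flat grey image with one "salt" pixel (255) and one "pepper" pixel (0).
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 0, 10],
    [10, 10, 10, 10],
]
clean = median_filter_3x3(noisy)
```

In this small example both outlier pixels are replaced by the surrounding value, because a single extreme value can never be the median of nine samples that are otherwise equal.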
Computing medians is particularly important in randomized algorithms.
Quite often, we have an approximation algorithm that, with probability at least $\tfrac34$, gives an answer within a factor of $1\pm\epsilon$ of the true answer $A$. Of course, in reality, we want to get an almost-correct answer with much higher probability than $\tfrac34$. So we repeat the algorithm $k$ times and then take the median. The median will be within $A(1\pm\epsilon)$ unless at least half of the $k$ samples were less than $A(1-\epsilon)$ or at least half were bigger than $A(1+\epsilon)$, and this has probability exponentially small in $k$.
Computing medians takes our crappy "It's wrong one time in four" algorithm and turns it into an "It's wrong once in $2^n$ runs" algorithm while only adding a factor of something like $n$ to the running time.
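A small simulation can make this concrete. In the sketch below, `weak_estimate` is a made-up stand-in for such an approximation algorithm (it is not any particular algorithm from the literature): it returns a value within $A(1\pm\epsilon)$ with probability $\tfrac34$ and an arbitrary bad value otherwise. `TRUE_ANSWER`, `EPSILON`, and all function names are hypothetical.

```python
import random
from statistics import median

TRUE_ANSWER = 100.0  # hypothetical true value A
EPSILON = 0.1        # hypothetical accuracy parameter

def weak_estimate(rng):
    """Stand-in estimator: within A(1 +/- eps) with probability 3/4."""
    if rng.random() < 0.75:
        return TRUE_ANSWER * (1 + rng.uniform(-EPSILON, EPSILON))
    # Otherwise return an arbitrary bad answer, far too small or far too large.
    return TRUE_ANSWER * rng.choice([0.01, 50.0])

def amplified_estimate(rng, k):
    """Run the weak estimator k times (k odd) and return the median."""
    return median(weak_estimate(rng) for _ in range(k))

def is_good(x):
    """True if x lies within A(1 +/- eps), with a tiny floating-point tolerance."""
    return abs(x - TRUE_ANSWER) <= EPSILON * TRUE_ANSWER + 1e-9

rng = random.Random(0)
single_failures = sum(not is_good(weak_estimate(rng)) for _ in range(1000))
median_failures = sum(not is_good(amplified_estimate(rng, 101)) for _ in range(200))
print(single_failures, median_failures)
```

With these assumed parameters, roughly a quarter of the single runs should fail, while the median of 101 runs should essentially never fail, since that would require at least 51 of the 101 runs to be bad on the same side.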
The median of medians has some applications:
- Finding a pivot for quicksort, which brings its worst-case time complexity to $O(n \log n)$.
- Finding a pivot for quickselect, bringing its worst-case time complexity to $O(n)$, from $O(n^2)$.
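The quickselect case can be sketched as follows. This is a minimal, unoptimised illustration that assumes groups of five and builds new lists at each step instead of partitioning in place; the function name `median_of_medians_select` is chosen here for the example.

```python
def median_of_medians_select(items, k):
    """Return the k-th smallest element (0-indexed) of a non-empty list."""
    if len(items) <= 5:
        return sorted(items)[k]
    # Take the median of each group of (at most) five elements.
    groups = [items[i:i + 5] for i in range(0, len(items), 5)]
    medians = [sorted(g)[len(g) // 2] for g in groups]
    # Recursively select the median of those medians and use it as the pivot.
    pivot = median_of_medians_select(medians, len(medians) // 2)
    lows = [x for x in items if x < pivot]
    highs = [x for x in items if x > pivot]
    num_pivots = len(items) - len(lows) - len(highs)
    if k < len(lows):
        return median_of_medians_select(lows, k)
    if k < len(lows) + num_pivots:
        return pivot
    return median_of_medians_select(highs, k - len(lows) - num_pivots)

data = [9, 1, 8, 2, 7, 3, 6, 4, 5, 0, 11, 10, 12]
print(median_of_medians_select(data, len(data) // 2))  # 6
```

Because the pivot is guaranteed to be greater than a constant fraction of the elements and less than a constant fraction of them, each recursive call works on a substantially smaller list, which is what yields the linear worst case.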