9

For many of our machine learning projects we start with the k-Nearest Neighbour (kNN) classifier. It is an ideal starting classifier because we usually have enough time to compute all the distances, and the number of parameters is limited (k, the distance metric, and the weighting).
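For reference, this is roughly the kind of setup we start from (a minimal scikit-learn sketch; the dataset and the specific parameter values are just placeholders):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)

    # The three knobs mentioned above: k, the distance metric, and the weighting.
    knn = KNeighborsClassifier(
        n_neighbors=5,        # k
        metric="euclidean",   # distance metric
        weights="distance",   # neighbour weighting: "uniform" or "distance"
    )
    print(cross_val_score(knn, X, y, cv=5).mean())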

However, this often has the effect that we stick with the kNN classifier, because later in the project there is no room left to switch to another one. What would be a good reason to try a different classifier? Obvious ones are memory and time constraints, but are there cases where another classifier can actually improve the accuracy?

3 Answers

kNN is useful for large data samples.

However, its disadvantages are:

  1. Results are biased by the choice of k.
  2. Computational complexity.
  3. Memory limitations.
  4. Being a lazy, supervised learning algorithm.
  5. Easily fooled by irrelevant attributes.
  6. Prediction accuracy can degrade quickly as the number of attributes increases (see the sketch below).

It's usually only effective if the training data is large; training itself, on the other hand, is very fast.
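A small sketch of points 5 and 6 (scikit-learn assumed; the dataset and feature counts are arbitrary): padding the data with irrelevant noise features hurts kNN noticeably more than a tree-based model.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    seed = 0
    for n_noise in (0, 50, 200):
        # 5 informative features plus n_noise irrelevant ones.
        X, y = make_classification(
            n_samples=500,
            n_features=5 + n_noise,
            n_informative=5,
            n_redundant=0,
            random_state=seed,
        )
        knn = KNeighborsClassifier(n_neighbors=5)
        rf = RandomForestClassifier(n_estimators=100, random_state=seed)
        print(
            n_noise,
            cross_val_score(knn, X, y, cv=5).mean().round(3),
            cross_val_score(rf, X, y, cv=5).mean().round(3),
        )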

Iancovici

If you are constrained by computational complexity, decision trees (Quinlan, 1986) are hard to beat, especially when a framework offers direct conversion of a DT model into a bunch of if statements, like Accord.NET does.
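As a language-neutral illustration of that "tree as a bunch of if statements" idea (using scikit-learn's export_text here rather than Accord.NET, purely as a sketch):

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_iris(return_X_y=True)
    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

    # Prints the fitted tree as nested if/else-style rules, one line per node,
    # which could be transcribed directly into plain conditional code.
    print(export_text(tree))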

For high-dimensional data, the notion of distance on which k-NN is based becomes worthless (Kriegel, Kröger, Zimek, 2009; see also the Wikipedia article). So other classifiers, such as SVMs (Cortes, Vapnik, 1995) or Random Forests (Breiman, 2001), might perform better.
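A quick numerical sketch of that distance-concentration effect (plain numpy, uniform random points): the gap between the nearest and the farthest neighbour, relative to the nearest, shrinks as the dimensionality grows, so "nearest" carries less and less information.

    import numpy as np

    rng = np.random.default_rng(0)
    for dim in (2, 10, 100, 1000):
        X = rng.uniform(size=(1000, dim))      # random reference points
        query = rng.uniform(size=dim)           # a random query point
        d = np.linalg.norm(X - query, axis=1)   # Euclidean distances to all points
        contrast = (d.max() - d.min()) / d.min()
        print(f"dim={dim:5d}  relative contrast={contrast:.3f}")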

BartoszKP

k-NN generalizes in a very restrictive sense. It simply uses a smoothness prior (the continuity assumption): patterns that are close in feature space most likely belong to the same class. No functional regularity in the pattern distribution can be recovered by k-NN.

Thus, it requires a representative training sample, which can be extremely large, especially for highly dimensional feature spaces; worse, such a sample may simply be unavailable. Consequently, k-NN cannot learn invariants. If patterns can be subjected to some transformations without changing their labels, and the training sample doesn't contain patterns transformed in all admissible ways, k-NN will never recognize transformed patterns that were not presented during training. This is true, e.g., for shifted or rotated images, unless they are represented in some invariant form before running k-NN. k-NN cannot even abstract away irrelevant features.

Another, somewhat artificial, example is the following. Imagine that patterns belonging to different classes are distributed periodically (e.g., according to a sine: where it is less than 0 the patterns belong to one class, and where it is greater they belong to the other). The training set is finite, so it occupies a finite region of the feature space; outside this region the recognition error will be 50%. One can imagine a logistic regression with periodic basis functions that performs much better in this case. Other methods would be able to learn other regularities in the pattern distribution and extrapolate well.
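A rough sketch of that periodic example (numpy and scikit-learn assumed; the intervals are arbitrary): trained on one region of x and tested far outside it, kNN on the raw coordinate drops to chance, while a logistic regression on the periodic features [sin x, cos x] extrapolates almost perfectly.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    # Class = 1 where sin(x) > 0, else 0; train and test regions do not overlap.
    x_train = rng.uniform(0, 4 * np.pi, size=2000)
    x_test = rng.uniform(20 * np.pi, 24 * np.pi, size=2000)
    y_train = (np.sin(x_train) > 0).astype(int)
    y_test = (np.sin(x_test) > 0).astype(int)

    # kNN on the raw coordinate: every test point's neighbours sit at the edge
    # of the training interval, so accuracy collapses to roughly 0.5.
    knn = KNeighborsClassifier(n_neighbors=5).fit(x_train[:, None], y_train)
    print("kNN on raw x:", knn.score(x_test[:, None], y_test))

    def periodic(x):
        # Periodic basis features [sin x, cos x].
        return np.column_stack([np.sin(x), np.cos(x)])

    # Logistic regression on the periodic features recovers the regularity
    # and extrapolates to the far-away test region (accuracy close to 1.0).
    logreg = LogisticRegression().fit(periodic(x_train), y_train)
    print("logreg on [sin, cos]:", logreg.score(periodic(x_test), y_test))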

So, if one suspects that the available data set is not representative, and invariance to some transformations of the patterns should be achieved, then this is a case in which one should move beyond k-NN.