Classification with Python
k-NN Summary
From what we have learned, k-NN is easy to implement but requires feature scaling. It also has a few other peculiarities:
- k-NN does not require training.
  Unlike many other algorithms, k-NN does not learn anything during training. It only needs to store the coordinates of all training data points. But since all the calculations are performed at prediction time, predictions take longer than with most other algorithms;
- k-NN is computationally expensive at prediction time.
  In its basic form, the model calculates the distance from the new instance to every training instance to find the neighbors. Thus, it may get painfully slow for large datasets;
- Easy to add new training data.
  Since the model does not need to train, we can just add new training data points, and the predictions will adjust;
- The curse of dimensionality.
  Some algorithms really struggle when the number of dimensions (features) is large, and unfortunately, k-NN has this problem too. In high-dimensional space, the distances between pairs of points tend to become nearly equal regardless of the actual feature values, so it becomes much harder to tell which instances are truly similar.
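
The curse of dimensionality can be made concrete with a small experiment. The sketch below is only an illustration under stated assumptions: the points are drawn uniformly at random from the unit cube, and the helper name `relative_contrast` is made up for this example. It measures how much farther the farthest point is than the nearest one, relative to the nearest distance. As the number of features grows, this ratio should shrink, which means the "nearest" neighbor is barely nearer than any other point.

```python
import math
import random


def relative_contrast(n_points, n_dims, seed=0):
    """Return (farthest - nearest) / nearest distance from a random query point.

    Hypothetical helper for illustration; points are uniform in the unit cube.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        distances.append(math.dist(query, point))  # Euclidean distance
    nearest, farthest = min(distances), max(distances)
    return (farthest - nearest) / nearest


# The ratio is expected to drop as the number of dimensions grows.
for dims in (2, 10, 100, 1000):
    print(dims, round(relative_contrast(500, dims), 3))
```

A large ratio means the nearest neighbor clearly stands out from the rest; a ratio close to zero means all points look almost equally far away, so the neighbors carry little information.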
So, here is a little summary of the k-NN algorithm:
| Advantages | Disadvantages |
| --- | --- |
| No training time | Needs feature scaling |
| Easy to add new training data | Prediction time is high |
| | Doesn't work well with a large number of training instances |
| | Doesn't work well with a large number of features |
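
Several rows of this summary can be seen in a minimal brute-force sketch. This is not how a library such as scikit-learn implements k-NN; the class `TinyKNN` and its methods are hypothetical names made up for this illustration, and the toy data is invented. The point is only to show that "training" just stores the data, that every prediction scans all stored points, and that adding new data requires no retraining.

```python
import math
from collections import Counter


class TinyKNN:
    """Hypothetical brute-force k-NN classifier, for illustration only."""

    def __init__(self, k=3):
        self.k = k
        self.points = []
        self.labels = []

    def fit(self, points, labels):
        # "Training" only stores the data; nothing is learned here.
        self.points = [list(p) for p in points]
        self.labels = list(labels)
        return self

    def add(self, point, label):
        # New training data is simply appended; there is no retraining step.
        self.points.append(list(point))
        self.labels.append(label)

    def predict_one(self, query):
        # All the work happens here: one distance per stored point.
        by_distance = sorted(
            zip(self.points, self.labels),
            key=lambda pair: math.dist(pair[0], query),
        )
        votes = Counter(label for _, label in by_distance[: self.k])
        return votes.most_common(1)[0][0]


# Invented toy data: class "a" near (1, 1), class "b" near (5, 5).
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
                (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["a", "a", "a", "b", "b", "b"]

model = TinyKNN(k=3).fit(train_points, train_labels)
query = (3.2, 3.2)
before = model.predict_one(query)  # the three nearest stored points are all "b"

model.add((3.0, 3.0), "a")
model.add((3.3, 3.1), "a")
after = model.predict_one(query)   # two of the three nearest are now "a"

print(before, after)
```

Because `predict_one` sorts the distances to every stored point, its cost grows with the size of the training set, which is the reason prediction time is listed as a disadvantage.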
Section 1. Chapter 8