K Nearest Neighbors in Python: What It Is and How It Works
The k nearest neighbors (KNN) algorithm in Python is a simple method for classifying or predicting data points by looking at the k closest neighbors in the training data. Using sklearn, you can easily apply KNN to make predictions from the most common class or the average value among those neighbors.
How It Works
K nearest neighbors (KNN) works like finding friends in a crowd who are closest to you. Imagine you want to guess the type of fruit based on its color and size. You look at the k fruits nearest to it and see which type appears most often. That type is your guess.
In KNN, the algorithm measures the distance between points (like how close the fruits are), typically with a simple formula such as Euclidean distance. It then picks the k closest points and uses their labels to decide the label for the new point. This makes KNN easy to understand and use for both classification (predicting categories) and regression (predicting numbers).
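To make those steps concrete, here is a minimal from-scratch sketch of KNN classification in plain Python. The fruit data points and labels are invented for illustration; real sklearn code follows in the next section.

```python
from collections import Counter
import math

def euclidean(a, b):
    # Straight-line distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(X_train, y_train, point, k=3):
    # Distance from the new point to every training point, sorted nearest first
    distances = sorted(
        (euclidean(x, point), label) for x, label in zip(X_train, y_train)
    )
    # Take the labels of the k closest points and return the most common one
    k_labels = [label for _, label in distances[:k]]
    return Counter(k_labels).most_common(1)[0][0]

# Toy data: (color score, size) pairs with made-up fruit labels
X_train = [(1.0, 2.0), (1.2, 1.8), (5.0, 6.0), (5.2, 5.8)]
y_train = ["apple", "apple", "melon", "melon"]

print(knn_predict(X_train, y_train, (1.1, 2.1), k=3))  # → apple
```

The new point (1.1, 2.1) sits next to the two "apple" points, so with k=3 the vote is two apples against one melon.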
Example
This example shows how to use KNN in Python with sklearn to classify iris flowers based on their features.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create KNN classifier with k=3
knn = KNeighborsClassifier(n_neighbors=3)

# Train the model
knn.fit(X_train, y_train)

# Predict on test data
y_pred = knn.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
When to Use
KNN is best when you have a small to medium dataset and want a simple, easy-to-understand model. It works well when the data points that belong to the same group are close together.
Use KNN for tasks like:
- Classifying handwritten digits
- Recommending products based on similar users
- Predicting house prices by looking at nearby houses
However, KNN can be slow on very large datasets, because it computes the distance to every training point at prediction time, and it does not work well if the data has many irrelevant features.
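For the regression case, such as predicting house prices from nearby houses, sklearn provides KNeighborsRegressor, which averages the target values of the k nearest neighbors. Here is a small sketch; the house features and prices are invented for illustration.

```python
from sklearn.neighbors import KNeighborsRegressor

# Toy data: each house is (size in square meters, distance to city center in km)
X_train = [[50, 10], [60, 8], [80, 5], [120, 2], [150, 1]]
y_train = [100_000, 130_000, 200_000, 350_000, 450_000]  # invented prices

# Predict by averaging the prices of the 2 nearest houses
reg = KNeighborsRegressor(n_neighbors=2)
reg.fit(X_train, y_train)

print(reg.predict([[70, 6]]))  # average price of the two closest houses
```

For a 70 m² house 6 km from the center, the two nearest neighbors are the 80 m² and 60 m² houses, so the prediction is the average of their prices.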
Key Points
- KNN predicts by looking at the closest k neighbors in the data.
- It uses distance measures like Euclidean distance to find neighbors.
- KNN can be used for classification and regression.
- Choosing the right k is important for good results.
- KNN is simple but can be slow on large datasets.
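Since choosing k matters, one common approach is to try several values and keep the one with the best cross-validated accuracy. This sketch uses sklearn's GridSearchCV on the iris data; the candidate values of k are an arbitrary choice.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try several values of k and keep the one with the best cross-validated accuracy
param_grid = {"n_neighbors": [1, 3, 5, 7, 9, 11]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)           # the best value of k found
print(f"{search.best_score_:.2f}")   # its mean cross-validation accuracy
```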