How to Use KNN Classifier in sklearn with Python
To use the KNeighborsClassifier from sklearn in Python, first import it, then create an instance with the desired number of neighbors, fit it on training data, and finally predict labels for new data. This simple process classifies data points based on their closest neighbors.

Syntax
The basic syntax to use the KNN classifier in sklearn is:
- KNeighborsClassifier(n_neighbors=5): creates the classifier with 5 neighbors by default.
- fit(X_train, y_train): trains the model on your training features and labels.
- predict(X_test): predicts labels for new data points.
```python
from sklearn.neighbors import KNeighborsClassifier

# Create classifier with 3 neighbors
knn = KNeighborsClassifier(n_neighbors=3)

# Train the classifier
knn.fit(X_train, y_train)

# Predict labels
predictions = knn.predict(X_test)
```
Example
This example shows how to use KNN to classify iris flower species using sklearn's built-in dataset.
```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Create KNN classifier with 3 neighbors
knn = KNeighborsClassifier(n_neighbors=3)

# Train the model
knn.fit(X_train, y_train)

# Predict on test data
y_pred = knn.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
```
Output
Accuracy: 1.00
Common Pitfalls
Common mistakes when using KNN include:
- Not scaling features: KNN uses distance, so features should be scaled for fair comparison.
- Choosing the wrong n_neighbors: too small a value makes the model sensitive to noise; too large a value smooths out class boundaries.
- Using KNN on very large datasets without optimization can be slow, since prediction compares each query against the training set.
Always preprocess data and experiment with n_neighbors to find the best value.
```python
from sklearn.preprocessing import StandardScaler

# Wrong way: no scaling
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
print("Accuracy without scaling:", knn.score(X_test, y_test))

# Right way: scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

knn_scaled = KNeighborsClassifier(n_neighbors=3)
knn_scaled.fit(X_train_scaled, y_train)
print("Accuracy with scaling:", knn_scaled.score(X_test_scaled, y_test))
```
Output
Accuracy without scaling: 1.0
Accuracy with scaling: 1.0

The iris features are already on similar numeric scales, so both approaches happen to reach the same accuracy here; scaling matters much more when feature ranges differ widely (for example, one feature in the thousands and another between 0 and 1).
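To act on the advice to experiment with n_neighbors, one option is cross-validation. The sketch below (using cross_val_score, which is not imported in the examples above) compares a few candidate values and keeps the best:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Score each candidate k by mean 5-fold cross-validation accuracy
scores = {}
for k in [1, 3, 5, 7, 9, 11]:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

# Keep the k with the highest mean accuracy
best_k = max(scores, key=scores.get)
print(f"Best n_neighbors: {best_k} (accuracy {scores[best_k]:.3f})")
```

The candidate list here is arbitrary; in practice you would choose a range suited to your dataset size.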
Quick Reference
Here is a quick summary of key parameters and methods for KNeighborsClassifier:
| Parameter/Method | Description |
|---|---|
| n_neighbors | Number of neighbors to use for classification (default 5) |
| weights | 'uniform' or 'distance' to weight neighbors equally or by distance |
| algorithm | Algorithm used to compute nearest neighbors ('auto', 'ball_tree', 'kd_tree', 'brute') |
| fit(X, y) | Train the classifier with features X and labels y |
| predict(X) | Predict labels for new data points X |
| score(X, y) | Return mean accuracy on given test data and labels |
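The weights parameter from the table can be worth trying alongside n_neighbors. This sketch compares the two built-in options on the iris split used earlier:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# 'uniform' counts every neighbor equally;
# 'distance' gives closer neighbors a larger vote.
accs = {}
for w in ("uniform", "distance"):
    knn = KNeighborsClassifier(n_neighbors=5, weights=w)
    knn.fit(X_train, y_train)
    accs[w] = knn.score(X_test, y_test)
    print(f"weights={w!r}: accuracy {accs[w]:.2f}")
```

Distance weighting often helps when classes overlap, though on a clean dataset like iris the two settings may score identically.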
Key Takeaways
- Import KNeighborsClassifier from sklearn.neighbors to create a KNN model.
- Fit the model on training data using fit() before predicting.
- Always scale features before using KNN for better distance calculations.
- Choose the number of neighbors (n_neighbors) carefully to balance bias and variance.
- Use predict() to classify new data points after training.
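One way to combine the scaling and fitting steps from these takeaways is sklearn's Pipeline (shown here via make_pipeline, which the article does not cover); this is a sketch, not the only way to structure it:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# The pipeline applies scaling inside fit/predict, so the test set
# is never used to compute the scaling statistics.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X_train, y_train)
acc = model.score(X_test, y_test)
print(f"Pipeline accuracy: {acc:.2f}")
```

Bundling preprocessing into the model this way also prevents the common mistake of forgetting to scale new data before calling predict().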