How to Use KNN Classifier in sklearn with Python
To use the KNeighborsClassifier from sklearn in Python, first import it, then create an instance with the desired number of neighbors, fit it on training data, and finally predict labels for new data. This simple process classifies data points based on their closest neighbors.

Syntax
The basic syntax to use the KNN classifier in sklearn is:
- KNeighborsClassifier(n_neighbors=5): creates the classifier with 5 neighbors by default.
- fit(X_train, y_train): trains the model on your training features and labels.
- predict(X_test): predicts labels for new data points.
```python
from sklearn.neighbors import KNeighborsClassifier

# Create classifier with 3 neighbors
knn = KNeighborsClassifier(n_neighbors=3)

# Train the classifier
knn.fit(X_train, y_train)

# Predict labels
predictions = knn.predict(X_test)
```
Example
This example shows how to use KNN to classify iris flower species using sklearn's built-in dataset.
```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Create KNN classifier with 3 neighbors
knn = KNeighborsClassifier(n_neighbors=3)

# Train the model
knn.fit(X_train, y_train)

# Predict on test data
y_pred = knn.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
```
Output
Accuracy: 1.00
Common Pitfalls
Common mistakes when using KNN include:
- Not scaling features: KNN uses distance, so features should be scaled for fair comparison.
- Choosing the wrong n_neighbors: too small a value makes the model sensitive to noise; too large a value smooths out class boundaries.
- Using KNN on very large datasets without optimization can be slow, since prediction compares each query against the training set.
Always preprocess data and experiment with n_neighbors to find the best value.
```python
from sklearn.preprocessing import StandardScaler

# Wrong way: no scaling
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
print("Accuracy without scaling:", knn.score(X_test, y_test))

# Right way: scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

knn_scaled = KNeighborsClassifier(n_neighbors=3)
knn_scaled.fit(X_train_scaled, y_train)
print("Accuracy with scaling:", knn_scaled.score(X_test_scaled, y_test))
```
Output
Accuracy without scaling: 1.0
Accuracy with scaling: 1.0

The iris features are already on similar numeric scales, so both approaches happen to reach the same accuracy here; scaling matters much more when feature ranges differ widely (for example, one feature in the thousands and another between 0 and 1).
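To act on the advice to experiment with n_neighbors, one option is cross-validation. The sketch below (using cross_val_score, which is not imported in the examples above) compares a few candidate values and keeps the best:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Score each candidate k by mean 5-fold cross-validation accuracy
scores = {}
for k in [1, 3, 5, 7, 9, 11]:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

# Keep the k with the highest mean accuracy
best_k = max(scores, key=scores.get)
print(f"Best n_neighbors: {best_k} (accuracy {scores[best_k]:.3f})")
```

The candidate list here is arbitrary; in practice you would choose a range suited to your dataset size.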
Quick Reference
Here is a quick summary of key parameters and methods for KNeighborsClassifier:
| Parameter/Method | Description |
|---|---|
| n_neighbors | Number of neighbors to use for classification (default 5) |
| weights | 'uniform' or 'distance' to weight neighbors equally or by distance |
| algorithm | Algorithm used to compute nearest neighbors ('auto', 'ball_tree', 'kd_tree', 'brute') |
| fit(X, y) | Train the classifier with features X and labels y |
| predict(X) | Predict labels for new data points X |
| score(X, y) | Return mean accuracy on given test data and labels |
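The weights parameter from the table can be worth trying alongside n_neighbors. This sketch compares the two built-in options on the iris split used earlier:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# 'uniform' counts every neighbor equally;
# 'distance' gives closer neighbors a larger vote.
accs = {}
for w in ("uniform", "distance"):
    knn = KNeighborsClassifier(n_neighbors=5, weights=w)
    knn.fit(X_train, y_train)
    accs[w] = knn.score(X_test, y_test)
    print(f"weights={w!r}: accuracy {accs[w]:.2f}")
```

Distance weighting often helps when classes overlap, though on a clean dataset like iris the two settings may score identically.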
Key Takeaways
- Import KNeighborsClassifier from sklearn.neighbors to create a KNN model.
- Fit the model on training data using fit() before predicting.
- Always scale features before using KNN for better distance calculations.
- Choose the number of neighbors (n_neighbors) carefully to balance bias and variance.
- Use predict() to classify new data points after training.
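One way to combine the scaling and fitting steps from these takeaways is sklearn's Pipeline (shown here via make_pipeline, which the article does not cover); this is a sketch, not the only way to structure it:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# The pipeline applies scaling inside fit/predict, so the test set
# is never used to compute the scaling statistics.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X_train, y_train)
acc = model.score(X_test, y_test)
print(f"Pipeline accuracy: {acc:.2f}")
```

Bundling preprocessing into the model this way also prevents the common mistake of forgetting to scale new data before calling predict().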