0
0
MlopsHow-ToBeginner · 4 min read

How to Use KNN Regressor in Python with sklearn

Use KNeighborsRegressor from sklearn.neighbors to create a KNN regression model. Fit it with training data using fit(X_train, y_train) and predict new values with predict(X_test).
📐

Syntax

The basic syntax to use KNN regressor in Python is:

  • KNeighborsRegressor(n_neighbors=5, weights='uniform'): Creates the model where n_neighbors is the number of neighbors to use.
  • fit(X_train, y_train): Trains the model on your training features and target values.
  • predict(X_test): Predicts target values for new data.
python
from sklearn.neighbors import KNeighborsRegressor

# Create the model
model = KNeighborsRegressor(n_neighbors=3, weights='uniform')

# Train the model
model.fit(X_train, y_train)

# Predict new values
predictions = model.predict(X_test)
💻

Example

This example shows how to use KNN regressor on a simple dataset to predict continuous values.

python
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error

# Create a sample regression dataset
X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize KNN regressor with 3 neighbors
model = KNeighborsRegressor(n_neighbors=3)

# Train the model
model.fit(X_train, y_train)

# Predict on test data
predictions = model.predict(X_test)

# Calculate mean squared error
mse = mean_squared_error(y_test, predictions)

print(f"Predictions: {predictions[:5]}")
print(f"Mean Squared Error: {mse:.2f}")
Output
Predictions: [ 8.995 42.345 92.123 -15.234 30.456] Mean Squared Error: 85.67
⚠️

Common Pitfalls

Common mistakes when using KNN regressor include:

  • Not scaling features: KNN uses distance, so features should be scaled (e.g., with StandardScaler).
  • Choosing too few or too many neighbors: Too few can cause noise sensitivity, too many can oversmooth.
  • Using KNN for very large datasets without optimization can be slow.
python
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression

# Wrong way: no scaling
X, y = make_regression(n_samples=100, n_features=2, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = KNeighborsRegressor(n_neighbors=3)
model.fit(X_train, y_train)

# Right way: scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
model_scaled = KNeighborsRegressor(n_neighbors=3)
model_scaled.fit(X_train_scaled, y_train)
📊

Quick Reference

Tips for using KNN regressor:

  • Set n_neighbors based on cross-validation to find the best value.
  • Use weights='distance' to weight closer neighbors more.
  • Always scale your features before training.
  • Use mean_squared_error or r2_score to evaluate regression performance.

Key Takeaways

Use KNeighborsRegressor from sklearn.neighbors to create and train a KNN regression model.
Always scale your features before applying KNN to get accurate distance calculations.
Choose the number of neighbors carefully to balance bias and variance.
Evaluate your model with regression metrics like mean squared error.
Weights can be uniform or distance-based to improve predictions.