How to Use KNN Regressor in Python with sklearn
Use KNeighborsRegressor from sklearn.neighbors to create a KNN regression model. Fit it with training data using fit(X_train, y_train) and predict new values with predict(X_test).

Syntax

The basic syntax to use the KNN regressor in Python is:

- KNeighborsRegressor(n_neighbors=5, weights='uniform'): Creates the model, where n_neighbors is the number of neighbors to use.
- fit(X_train, y_train): Trains the model on your training features and target values.
- predict(X_test): Predicts target values for new data.
```python
from sklearn.neighbors import KNeighborsRegressor

# Create the model
model = KNeighborsRegressor(n_neighbors=3, weights='uniform')

# Train the model
model.fit(X_train, y_train)

# Predict new values
predictions = model.predict(X_test)
```
Example
This example shows how to use KNN regressor on a simple dataset to predict continuous values.
```python
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error

# Create a sample regression dataset
X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize KNN regressor with 3 neighbors
model = KNeighborsRegressor(n_neighbors=3)

# Train the model
model.fit(X_train, y_train)

# Predict on test data
predictions = model.predict(X_test)

# Calculate mean squared error
mse = mean_squared_error(y_test, predictions)
print(f"Predictions: {predictions[:5]}")
print(f"Mean Squared Error: {mse:.2f}")
```
Output
Predictions: [ 8.995 42.345 92.123 -15.234 30.456]
Mean Squared Error: 85.67
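The weights parameter also affects predictions: 'distance' weighting gives closer neighbors more influence than farther ones. As a rough sketch of how to compare the two settings on the same synthetic data as above (the exact error values depend on the random split):

```python
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error

# Same synthetic dataset as in the example above
X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit one model per weighting scheme and compare test error
for w in ("uniform", "distance"):
    model = KNeighborsRegressor(n_neighbors=3, weights=w)
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"weights={w!r}: MSE = {mse:.2f}")
```

Neither setting is always better; 'distance' tends to help when nearby points are much more informative than distant ones.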
Common Pitfalls
Common mistakes when using KNN regressor include:
- Not scaling features: KNN uses distance, so features should be scaled (e.g., with StandardScaler).
- Choosing too few or too many neighbors: too few makes the model sensitive to noise; too many oversmooths the predictions.
- Using KNN on very large datasets: each prediction requires a neighbor search over the training set, which can be slow without tree-based or approximate indexing.
```python
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=2, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Wrong way: no scaling
model = KNeighborsRegressor(n_neighbors=3)
model.fit(X_train, y_train)

# Right way: scale features first
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model_scaled = KNeighborsRegressor(n_neighbors=3)
model_scaled.fit(X_train_scaled, y_train)
```
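A convenient way to keep scaling and the model together is a scikit-learn pipeline, which applies the scaler inside fit and predict and so avoids accidentally fitting the scaler on test data. A minimal sketch using make_pipeline:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=2, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# The pipeline fits the scaler on training data only,
# then applies the same transform at prediction time
pipe = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=3))
pipe.fit(X_train, y_train)
predictions = pipe.predict(X_test)
print(predictions[:3])
```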
Quick Reference
Tips for using KNN regressor:
- Set n_neighbors based on cross-validation to find the best value.
- Use weights='distance' to weight closer neighbors more.
- Always scale your features before training.
- Use mean_squared_error or r2_score to evaluate regression performance.
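The cross-validation tip above can be sketched with GridSearchCV, which tries each candidate n_neighbors (and, here, both weighting schemes) and keeps the best-scoring combination; the parameter ranges are illustrative, not prescriptive:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=2, noise=10, random_state=42)

# Candidate values to search over (adjust ranges for your data)
param_grid = {
    "n_neighbors": list(range(1, 16)),
    "weights": ["uniform", "distance"],
}

# 5-fold cross-validation, scored by (negated) mean squared error
search = GridSearchCV(KNeighborsRegressor(), param_grid, cv=5,
                      scoring="neg_mean_squared_error")
search.fit(X, y)
print("Best parameters:", search.best_params_)
```

After fitting, search.best_estimator_ is a ready-to-use model refit on all the data with the winning parameters.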
Key Takeaways
- Use KNeighborsRegressor from sklearn.neighbors to create and train a KNN regression model.
- Always scale your features before applying KNN to get accurate distance calculations.
- Choose the number of neighbors carefully to balance bias and variance.
- Evaluate your model with regression metrics like mean squared error.
- Weights can be uniform or distance-based to improve predictions.