How to Evaluate a Regression Model in Python with sklearn
To evaluate a regression model in Python, use
sklearn.metrics functions like mean_squared_error, mean_absolute_error, and r2_score. These metrics measure how close your model's predictions are to the actual values, helping you understand its accuracy.
Syntax
Here are the main functions to evaluate regression models in sklearn:
- mean_squared_error(y_true, y_pred): Calculates the average squared difference between actual and predicted values.
- mean_absolute_error(y_true, y_pred): Calculates the average absolute difference between actual and predicted values.
- r2_score(y_true, y_pred): Measures how well the model explains the variance in the data (1 is perfect, 0 means no explanation).
```python
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# y_true: actual values
# y_pred: predicted values
mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
```
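As a quick sanity check of the syntax above, here is a minimal runnable sketch; the small lists of values are hypothetical, chosen only so the arithmetic is easy to follow:

```python
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Hypothetical actual and predicted values for illustration
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]  # errors: -1, 0, +2

mse = mean_squared_error(y_true, y_pred)   # (1 + 0 + 4) / 3
mae = mean_absolute_error(y_true, y_pred)  # (1 + 0 + 2) / 3
r2 = r2_score(y_true, y_pred)              # 1 - SS_res/SS_tot = 1 - 5/8

print(f"MSE: {mse:.3f}")  # 1.667
print(f"MAE: {mae:.3f}")  # 1.000
print(f"R2:  {r2:.3f}")   # 0.375
```

Working the numbers by hand like this is a good way to confirm you understand what each metric computes before applying it to real predictions.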
Example
This example shows how to train a simple linear regression model and evaluate it using MSE, MAE, and R2 score.
```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Create sample data
X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on test data
y_pred = model.predict(X_test)

# Evaluate predictions
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.2f}")
print(f"Mean Absolute Error: {mae:.2f}")
print(f"R2 Score: {r2:.2f}")
```
Output
```
Mean Squared Error: 92.69
Mean Absolute Error: 7.68
R2 Score: 0.89
```
Common Pitfalls
Common mistakes when evaluating regression models include:
- Using classification metrics like accuracy instead of regression metrics.
- Not splitting data into train and test sets, leading to overly optimistic results.
- Ignoring the scale of errors; for example, MSE squares errors so large errors impact it more.
- Misinterpreting R2 score: a negative R2 means the model is worse than predicting the mean.
```python
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Wrong: using a classification metric for regression
# accuracy = accuracy_score(y_test, y_pred)  # raises an error for continuous targets

# Right: use regression metrics
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
```
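The last pitfall, a negative R2 score, is easy to demonstrate directly. In this sketch (with hypothetical values chosen for clarity), predicting the mean of the targets yields an R2 of exactly 0, while a model that does worse than that baseline goes negative:

```python
from sklearn.metrics import r2_score

# Hypothetical actual values for illustration
y_true = [1.0, 2.0, 3.0]

# Baseline: always predict the mean of y_true -> R2 is exactly 0
baseline_r2 = r2_score(y_true, [2.0, 2.0, 2.0])
print(baseline_r2)  # 0.0

# Predictions worse than the mean baseline -> negative R2
bad_r2 = r2_score(y_true, [3.0, 3.0, 3.0])
print(bad_r2)  # -1.5
```

So a negative R2 is not a bug in the metric: it signals that the model's predictions are less useful than simply predicting the average of the actual values.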
Quick Reference
| Metric | Description | Ideal Value |
|---|---|---|
| Mean Squared Error (MSE) | Average of squared differences between actual and predicted values | 0 (lower is better) |
| Mean Absolute Error (MAE) | Average of absolute differences between actual and predicted values | 0 (lower is better) |
| R2 Score | Proportion of variance explained by the model | 1 (higher is better) |
Key Takeaways
- Use sklearn.metrics functions like mean_squared_error, mean_absolute_error, and r2_score to evaluate regression models.
- Always split your data into training and testing sets to get realistic evaluation results.
- Mean Squared Error penalizes larger errors more than Mean Absolute Error.
- R2 score shows how well your model explains the data variance; closer to 1 is better.
- Avoid using classification metrics like accuracy for regression problems.
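The point that MSE penalizes larger errors more than MAE can be verified with a short sketch. The two hypothetical prediction sets below have the same total absolute error, but one concentrates it in a single outlier:

```python
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Hypothetical targets and two prediction sets with equal total absolute error
y_true = [10.0, 10.0, 10.0, 10.0]
even_errors = [12.0, 12.0, 12.0, 12.0]   # four errors of 2
one_outlier = [10.0, 10.0, 10.0, 18.0]   # one error of 8

# MAE is identical: (2+2+2+2)/4 == (0+0+0+8)/4 == 2.0
mae_even = mean_absolute_error(y_true, even_errors)
mae_out = mean_absolute_error(y_true, one_outlier)
print(mae_even, mae_out)  # 2.0 2.0

# MSE jumps for the outlier case: (4+4+4+4)/4 = 4.0 vs (0+0+0+64)/4 = 16.0
mse_even = mean_squared_error(y_true, even_errors)
mse_out = mean_squared_error(y_true, one_outlier)
print(mse_even, mse_out)  # 4.0 16.0
```

If your application cares most about avoiding occasional large misses, this sensitivity makes MSE the more informative metric; if all errors matter equally, MAE is easier to interpret.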