R2 Score in Machine Learning with Python: Definition and Example
The R2 score measures how well a regression model's predictions match the target values, compared with simply predicting the average. Its maximum is 1, which means perfect prediction, and it has no lower bound: any value below 0 means the model does worse than always predicting the mean of the true values. In Python, you can calculate it with sklearn.metrics.r2_score.
How It Works
The R2 score, also called the coefficient of determination, measures the proportion of the variation in the target data that the model explains. Imagine you want to predict house prices. If your model predicts every price exactly, the R2 score is 1. If it predicts the average price for every house, the R2 score is 0.
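A minimal sketch of the house-price cases, using made-up prices (in thousands) rather than real data. The third "model" is a deliberately bad one that predicts the prices in reverse order, to show that R2 can drop below 0.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical house prices (in thousands), invented for illustration
y_true = np.array([200.0, 250.0, 300.0, 350.0, 400.0])

# Case 1: a model that predicts every price exactly
perfect = y_true.copy()

# Case 2: a "model" that always predicts the average price
average = np.full_like(y_true, y_true.mean())

# Case 3: a bad model whose predictions run opposite to the real trend
reversed_preds = y_true[::-1]

print(r2_score(y_true, perfect))         # 1.0
print(r2_score(y_true, average))         # 0.0
print(r2_score(y_true, reversed_preds))  # -3.0
```

The reversed predictions have four times the squared error of the average-price guess, so the score is 1 - 4 = -3.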
Think of it like this: if you throw darts blindfolded, your guesses are random and the R2 score will be low or even negative. If you aim well and land close to the target, the R2 score will be closer to 1. Formally, R2 compares the sum of squared errors of your model with the sum of squared errors you would get by always predicting the average, and equals 1 minus the ratio of the two.
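Written as a formula, R2 = 1 - SS_res / SS_tot, where SS_res = Σ(y_i - ŷ_i)² is the model's sum of squared errors and SS_tot = Σ(y_i - ȳ)² is the sum of squared errors of always predicting the mean ȳ. The sketch below computes this by hand and checks it against sklearn's r2_score. The helper name r2_by_hand is our own, and it skips the edge case where all true values are identical (SS_tot = 0).

```python
import numpy as np
from sklearn.metrics import r2_score

def r2_by_hand(y_true, y_pred):
    """Hypothetical helper: R2 = 1 - SS_res / SS_tot (no handling of constant y_true)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Sum of squared errors of the model
    ss_res = np.sum((y_true - y_pred) ** 2)
    # Sum of squared errors of always predicting the mean of y_true
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

print(r2_by_hand(y_true, y_pred))  # about 0.9486
print(r2_score(y_true, y_pred))    # same value
```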
Example
This example shows how to calculate the R2 score in Python using sklearn. We create some sample data, fit a simple linear model, and then measure how well it predicts.
```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
import numpy as np

# Sample data: features and target values
X = np.array([[1], [2], [3], [4], [5]])
y_true = np.array([3, 5, 7, 9, 11])  # Target values (exactly y = 2x + 1)

# Create and train the model
model = LinearRegression()
model.fit(X, y_true)

# Predict using the model (on the same data it was trained on)
y_pred = model.predict(X)

# Calculate R2 score
score = r2_score(y_true, y_pred)
print(f"R2 score: {score:.2f}")  # Prints 1.00 because the data is perfectly linear
```
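The example above scores the model on the same points it was trained on, and those points lie exactly on y = 2x + 1, so the score is 1. The sketch below is a variation under stated assumptions: synthetic noisy data (the noise level and sample size are arbitrary choices) and a held-out test split made with sklearn's train_test_split. It also shows that a fitted regressor's score method returns the same R2 value.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y = 2x + 1 plus Gaussian noise
rng = np.random.default_rng(seed=0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2 * X[:, 0] + 1 + rng.normal(scale=2.0, size=100)

# Hold out 25% of the rows so the score reflects data the model has not seen
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

test_score = r2_score(y_test, model.predict(X_test))
print(f"Test R2 score: {test_score:.2f}")

# For regressors, model.score(X, y) computes the same R2
print(f"model.score:   {model.score(X_test, y_test):.2f}")
```

With noise in the data, the test score lands below 1, which is a more realistic picture of how well the model predicts.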
When to Use
Use the R2 score when you want to measure how well a regression model fits your data. It helps you understand if your model is good at predicting continuous values like prices, temperatures, or sales.
For example, if you build a model to predict house prices, the R2 score tells you how much better your model is than simply predicting the average price. A higher R2 means your model explains more of the variation in prices.
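To see the "compared to the average" idea directly, you can score a baseline that always predicts the mean. This sketch uses sklearn's DummyRegressor with strategy="mean" as that baseline; the house sizes and prices are hypothetical values invented for illustration.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Hypothetical data: house size (square metres) and price (thousands)
X = np.array([[50], [65], [80], [95], [110], [130], [150]])
y = np.array([155.0, 190.0, 230.0, 260.0, 300.0, 345.0, 400.0])

# Baseline: always predict the mean price seen during fit
baseline = DummyRegressor(strategy="mean").fit(X, y)
# Candidate model: a straight line relating size to price
model = LinearRegression().fit(X, y)

print(f"Baseline R2: {r2_score(y, baseline.predict(X)):.2f}")
print(f"Model R2:    {r2_score(y, model.predict(X)):.2f}")
```

Because the baseline predicts the mean of the same prices it is scored on, its R2 is 0, and the model's R2 is the fraction of the baseline's squared error that the model removes.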
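However, R2 is a regression metric, and it should be used alongside other metrics, such as mean absolute error (MAE) or root mean squared error (RMSE), to get a full picture of model performance. On its own, R2 does not tell you how large the errors are in the target's units.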
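One reason to pair R2 with error metrics: R2 does not change when you rescale the target, while MAE and RMSE are reported in the target's own units. The sketch below uses hypothetical prices and a helper named report (our own, not part of sklearn) to compute all three, then converts the prices from thousands to plain units. RMSE is taken as the square root of mean_squared_error.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def report(y_true, y_pred):
    """Hypothetical helper for this sketch: R2 plus errors in the target's units."""
    return {
        "r2": r2_score(y_true, y_pred),
        "mae": mean_absolute_error(y_true, y_pred),
        "rmse": np.sqrt(mean_squared_error(y_true, y_pred)),
    }

# Hypothetical true and predicted prices, in thousands
y_true_k = np.array([200.0, 250.0, 300.0, 350.0, 400.0])
y_pred_k = np.array([210.0, 240.0, 310.0, 340.0, 395.0])

small = report(y_true_k, y_pred_k)                # prices in thousands
large = report(y_true_k * 1000, y_pred_k * 1000)  # same prices in plain units

# R2 is identical in both reports; MAE and RMSE grow by a factor of 1000
print(small)
print(large)
```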
Key Points
- The R2 score ranges from negative infinity to 1, where 1 is perfect prediction.
- It compares your model's squared errors to those of always predicting the average.
- It is used only for regression tasks, not classification.
- A higher R2 score means a better fit to the data it is computed on; a high score on training data does not guarantee good predictions on new data.