
R2 Score in Machine Learning with Python: Definition and Example

The R2 score measures how well a regression model predicts the target values compared with always predicting their average. It can be anything from negative infinity up to 1: 1 means perfect prediction, 0 means no better than predicting the average, and values below 0 mean the model is worse than predicting the average. In Python, you can calculate it with sklearn.metrics.r2_score.

How It Works

The R2 score, also called the coefficient of determination, tells us how much of the variation in the target data our model can explain. Imagine you want to predict house prices. If your model guesses prices perfectly, the R2 score is 1. If it just guesses the average price for every house, the R2 score is 0.
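The two reference points above (1 for perfect predictions, 0 for always predicting the average) can be checked directly. The sketch below uses made-up house prices, so the numbers are purely illustrative, and adds a third, deliberately bad set of predictions to show a negative score.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical house prices (in thousands), invented for illustration
y_true = np.array([200.0, 250.0, 300.0, 350.0, 400.0])

perfect = y_true.copy()                         # every prediction is exact
average = np.full_like(y_true, y_true.mean())   # always predict the mean (300)
backwards = y_true[::-1]                        # cheap houses priced high, and vice versa

r2_perfect = r2_score(y_true, perfect)
r2_average = r2_score(y_true, average)
r2_backwards = r2_score(y_true, backwards)

print(f"Perfect predictions:   {r2_perfect:.2f}")    # 1.00
print(f"Always the average:    {r2_average:.2f}")    # 0.00
print(f"Backwards predictions: {r2_backwards:.2f}")  # -3.00
```

The last case scores -3.00 because its squared errors are four times larger than those of the average-only baseline.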

Think of it like this: if you throw darts blindfolded, your guesses are random and the R2 score will be low or negative. But if you can aim well and hit close to the target, the R2 score will be closer to 1. It compares the squared errors of your model to the squared errors you'd get by always guessing the average.
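Written out, that comparison is R2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared errors of the model and SS_tot is the sum of squared errors of the average-only guess. Here is a minimal sketch of that formula in NumPy. The function name r2_manual and the sample predictions are assumptions made for illustration, and the sketch does not handle a constant target, where SS_tot would be zero.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """Illustrative R2: 1 - SS_res / SS_tot (hypothetical helper, not a library function)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # squared errors of the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # squared errors of the average guess
    return 1.0 - ss_res / ss_tot

# Made-up predictions that are close to the targets but not exact
y_true = [3, 5, 7, 9, 11]
y_pred = [2.5, 5.5, 7.0, 9.5, 10.5]

score = r2_manual(y_true, y_pred)
print(f"Manual R2: {score:.3f}")  # Manual R2: 0.975
```

For a single target like this one, the result should agree with sklearn.metrics.r2_score on the same inputs.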


Example

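This example shows how to calculate the R2 score in Python using scikit-learn. We create some sample data, fit a simple linear model, and then measure how well it predicts. The sample targets follow y = 2x + 1 exactly and the model is scored on the same data it was trained on, so the fit is perfect and the score is 1.00. Real data rarely gives such a clean result.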

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
import numpy as np

# Sample data: features and target values
X = np.array([[1], [2], [3], [4], [5]])
y_true = np.array([3, 5, 7, 9, 11])  # Target values

# Create and train the model
model = LinearRegression()
model.fit(X, y_true)

# Predict using the model
y_pred = model.predict(X)

# Calculate R2 score
score = r2_score(y_true, y_pred)
print(f"R2 score: {score:.2f}")
```

Output

```
R2 score: 1.00
```
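A score computed on the training data can look better than the model really is, so in practice R2 is usually measured on data the model has not seen. The following is a rough sketch of that workflow, assuming synthetic data generated from y = 2x + 1 plus random noise. The data, the split size, and the seeds are arbitrary choices for illustration. It also shows that a scikit-learn regressor's score method returns the same R2 value as r2_score.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for this sketch): y = 2x + 1 plus Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2 * X[:, 0] + 1 + rng.normal(0, 1.0, size=100)

# Hold out 25% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# Two equivalent ways to get R2 on the held-out data
test_r2 = r2_score(y_test, model.predict(X_test))
same_r2 = model.score(X_test, y_test)

print(f"Held-out R2 via r2_score:    {test_r2:.3f}")
print(f"Held-out R2 via model.score: {same_r2:.3f}")
```

Because the data is noisy, the held-out score will be high but below 1.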

When to Use

Use the R2 score when you want to measure how well a regression model fits your data. It helps you understand if your model is good at predicting continuous values like prices, temperatures, or sales.

For example, if you build a model to predict house prices, the R2 score tells you how much better your model is compared to just guessing the average price. A higher R2 means your model captures more of the real patterns in the data.
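One way to make the "better than guessing the average" comparison concrete is to score an explicit average-only baseline next to the real model. The sketch below uses scikit-learn's DummyRegressor with strategy="mean" for that baseline. The house sizes and prices are invented for illustration.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression

# Hypothetical data: house size in square meters vs. price in thousands
X = np.array([[50], [70], [90], [110], [130], [150]])
y = np.array([150, 200, 260, 300, 370, 410])

# Baseline that ignores X and always predicts the mean training price
baseline = DummyRegressor(strategy="mean").fit(X, y)
model = LinearRegression().fit(X, y)

# For regressors, .score() returns the R2 score
baseline_r2 = baseline.score(X, y)
model_r2 = model.score(X, y)

print(f"Baseline (always the mean) R2: {baseline_r2:.2f}")
print(f"Linear regression R2:          {model_r2:.2f}")
```

The baseline lands at 0 on the data it was fitted on, which is the reference point every other R2 value is measured against.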

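R2 applies to regression problems only, and it does not tell you how large the errors are in the target's own units. Use it alongside other metrics, such as mean absolute error or root mean squared error, to get a full picture of model performance.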
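As a small illustration of reporting R2 together with other metrics, the sketch below also computes mean absolute error (MAE) and root mean squared error (RMSE) for the same predictions. The choice of MAE and RMSE as companions and the sample values are assumptions made for this example.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up targets and predictions for illustration
y_true = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_pred = np.array([2.5, 5.5, 7.0, 9.5, 10.5])

r2 = r2_score(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)            # average absolute error, in target units
rmse = np.sqrt(mean_squared_error(y_true, y_pred))   # penalizes large errors more heavily

print(f"R2:   {r2:.3f}")    # R2:   0.975
print(f"MAE:  {mae:.3f}")   # MAE:  0.400
print(f"RMSE: {rmse:.3f}")  # RMSE: 0.447
```

R2 is unitless, while MAE and RMSE are expressed in the same units as the target, so together they show both the relative fit and the typical size of an error.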

Key Points

  • The R2 score ranges from negative infinity to 1, where 1 is perfect prediction.
  • It compares your model's errors to the errors of a simple average prediction.
  • It is used only for regression tasks, not classification.
  • A higher R2 score means a better fit to the data it is measured on; check it on held-out data to see how well the model generalizes.

Key Takeaways

  • R2 score measures how well a regression model predicts compared to the average.
  • An R2 score of 1 means perfect prediction; 0 means no improvement over average guessing.
  • Use sklearn.metrics.r2_score in Python to calculate it easily.
  • R2 score is useful for evaluating regression models like predicting prices or temperatures.
  • Always consider other metrics alongside R2 for a complete model evaluation.