
Residual analysis in ML Python

Introduction

Residual analysis helps us check how well a model fits data by looking at the differences between actual and predicted values.

Residual analysis is useful:
After training a regression model, to check whether its predictions are accurate.
To find patterns in the errors that suggest the model is missing something.
When deciding whether a model is good enough to base decisions on.
To check whether assumptions about the data (such as constant error size) hold.
When comparing different models to pick the best one.
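One rough way to check the constant-error-size assumption is to compare how spread out the residuals are for small predictions versus large ones. The sketch below is a heuristic, not a formal test. The data are synthetic, with noise that deliberately grows with x. Splitting at the median prediction and using a spread ratio are illustrative choices, not a standard recipe.

```python
import numpy as np

# Synthetic data (assumption for illustration): the noise standard deviation
# grows with x, so the "constant error size" assumption is violated on purpose.
rng = np.random.default_rng(0)
x = np.linspace(1, 10, 200)
y = 2 * x + 1 + rng.normal(0, 0.3 * x)  # noise std proportional to x

# Fit a straight line by ordinary least squares
slope, intercept = np.polyfit(x, y, deg=1)
predicted = slope * x + intercept
residuals = y - predicted

# Split points at the median prediction and compare residual spread
cut = np.median(predicted)
low = residuals[predicted <= cut]
high = residuals[predicted > cut]
ratio = high.std() / low.std()

print(f"Residual std (lower half): {low.std():.2f}")
print(f"Residual std (upper half): {high.std():.2f}")
print(f"Spread ratio: {ratio:.2f}")
```

A ratio well above 1 suggests that the error size grows with the prediction. For a more rigorous check, formal tests exist, such as `het_breuschpagan` in statsmodels.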
Syntax
ML Python
residuals = actual_values - predicted_values

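Residuals are simply the difference between what really happened and what the model predicted. A positive residual means the model guessed too low, and a negative residual means it guessed too high.

They help us find out whether the model is making consistent (systematic) mistakes.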
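Here is what a consistent mistake can look like. The sketch below uses made-up predictions that are all about 2 units too low. Residuals are actual minus predicted, so every residual comes out positive and their mean lands near 2 instead of near 0.

```python
import numpy as np

# Hypothetical values: the "model" underpredicts every point by roughly 2
actual = np.array([10.0, 12.0, 15.0, 18.0, 20.0])
predicted = np.array([8.1, 9.9, 13.2, 15.8, 18.0])

residuals = actual - predicted

print("Residuals:", residuals)
print("Mean residual:", residuals.mean())
print("Share of positive residuals:", np.mean(residuals > 0))
```

For an unbiased model, you would expect a mix of positive and negative residuals with a mean close to zero.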

Examples
This example calculates residuals for three points and prints them.
ML Python
actual = [3, 5, 7]
predicted = [2.5, 5.5, 6.8]
residuals = [a - p for a, p in zip(actual, predicted)]
print(residuals)
This example uses NumPy arrays, which subtract element by element and are much faster than Python lists on bigger datasets.
ML Python
import numpy as np
actual = np.array([10, 15, 20])
predicted = np.array([9, 14, 22])
residuals = actual - predicted
print(residuals)
Sample Program

This program trains a simple linear regression model, predicts values, calculates residuals, and shows the mean squared error to measure overall error size.

ML Python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([3, 4, 2, 5, 6])

# Train a simple linear regression model
model = LinearRegression()
model.fit(X, y)

# Predict values
predictions = model.predict(X)

# Calculate residuals
residuals = y - predictions

# Calculate mean squared error
mse = mean_squared_error(y, predictions)

print(f"Predictions: {predictions}")
print(f"Residuals: {residuals}")
print(f"Mean Squared Error: {mse:.3f}")
Important Notes

Residuals close to zero mean the model predicts well for those points.

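Look for patterns in residuals. Residuals scattered randomly around zero suggest a good fit. Trends, curves, or a spread that grows with the predicted value suggest the model is missing structure or that the constant-error-size assumption does not hold.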
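The sketch below shows such a pattern using hypothetical, noise-free data where y = x². A straight line fits this data poorly: its residuals are positive at both ends and negative in the middle, a U-shape. Adding a squared term (here via `np.polyfit` with `deg=2`) removes the pattern and drives the MSE to essentially zero. This also shows how residuals help you compare two candidate models.

```python
import numpy as np

# Hypothetical curved data: y = x^2 with no noise, to make the pattern obvious
x = np.arange(-5, 6, dtype=float)  # -5, -4, ..., 5
y = x ** 2

# Model 1: straight line -> residuals follow a U-shape
line = np.polyfit(x, y, deg=1)
res_line = y - np.polyval(line, x)

# Model 2: quadratic -> residuals essentially vanish
quad = np.polyfit(x, y, deg=2)
res_quad = y - np.polyval(quad, x)

print("Linear fit residual signs:", np.sign(res_line).astype(int))
print(f"Linear MSE:    {np.mean(res_line ** 2):.3f}")
print(f"Quadratic MSE: {np.mean(res_quad ** 2):.3f}")
```

With real, noisy data even a good model's residuals will not vanish. The goal is for them to look like random scatter around zero.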

Residual analysis is mostly used for regression, not classification, because subtracting one class label from another does not give a meaningful error size.

Summary

Residuals show the difference between actual and predicted values.

They help check if a model fits data well or misses patterns.

Mean squared error summarizes the residuals in one number: the average of their squares.