
Bias-variance tradeoff in ML Python

Introduction
The bias-variance tradeoff describes how a model's complexity affects its ability to learn the true pattern in the data while still generalizing to examples it has never seen.
It comes up in situations such as:
- Choosing how complex your model should be for predicting house prices.
- Deciding whether your model is too simple or too complicated for recognizing handwriting.
- Tuning a model to avoid mistakes on new data, like predicting weather.
- Improving your model's accuracy without overfitting or underfitting.
Syntax
No specific code syntax; it's a concept about balancing two types of errors:
- Bias: error from overly simple assumptions; the model systematically misses the true pattern.
- Variance: error from excessive sensitivity to the training data; the model's predictions change a lot when the training set changes.
Bias means the model misses important patterns (underfitting).
Variance means the model learns noise as if it were important (overfitting).
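One way to see the two error components concretely is to estimate them empirically: train many models on freshly resampled training sets drawn from a known function, then measure how far the average prediction is from the truth (bias) and how much the predictions scatter across runs (variance). This is an illustrative sketch only; the sine target, sample sizes, and depth values are arbitrary choices, not part of any standard recipe:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# A known target function, so bias and variance can be measured exactly
rng = np.random.default_rng(0)
X_test = np.linspace(0, 1, 50).reshape(-1, 1)
true_y = np.sin(2 * np.pi * X_test).ravel()

def predictions(depth, n_trials=50):
    """Train one tree per fresh noisy training set; collect its test predictions."""
    preds = []
    for _ in range(n_trials):
        X = rng.uniform(0, 1, (40, 1))
        y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.3, 40)
        model = DecisionTreeRegressor(max_depth=depth).fit(X, y)
        preds.append(model.predict(X_test))
    return np.array(preds)  # shape (n_trials, n_test_points)

for depth in (1, 10):
    p = predictions(depth)
    bias_sq = np.mean((p.mean(axis=0) - true_y) ** 2)  # avg squared bias
    variance = np.mean(p.var(axis=0))                  # avg prediction variance
    print(f"depth={depth:2d}  bias^2={bias_sq:.3f}  variance={variance:.3f}")
```

Run this and the shallow stump (depth 1) shows large squared bias with small variance, while the deep tree (depth 10) flips that: small bias, large variance.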
Examples
Low bias, high variance: a very deep decision tree that fits the training data perfectly but fails on new data. The model is too complex and memorizes the training data, causing poor generalization.
High bias, low variance: a simple linear model that cannot capture curves in the data. The model is too simple and misses important patterns, leading to consistent errors.
Balanced bias and variance: a model with moderate complexity that fits the training data well and predicts new data accurately. This balance helps the model generalize well to unseen data.
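The three cases above can be sketched side by side on noisy nonlinear data, where a stump underfits and an overgrown tree overfits. The depth values here (1, 4, 20) are illustrative choices for this particular dataset, not general prescriptions:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Noisy sine data: nonlinear, so a stump underfits and a huge tree overfits
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, (200, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.3, 200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

for label, depth in [("too simple", 1), ("balanced", 4), ("too complex", 20)]:
    model = DecisionTreeRegressor(max_depth=depth, random_state=42)
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"max_depth={depth:2d} ({label}): test MSE = {mse:.3f}")
```

The moderate-depth tree typically achieves the lowest test MSE of the three, because it captures the curve without chasing the noise.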
Sample Program
This code trains two decision tree models, one very simple and one very complex. The simple model shows high bias (it errs because it cannot capture the pattern), while the complex model shows high variance (it errs because it overfits the training data).
ML Python
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Create sample data
X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train simple model (high bias)
simple_model = DecisionTreeRegressor(max_depth=1, random_state=42)
simple_model.fit(X_train, y_train)
y_pred_simple = simple_model.predict(X_test)
simple_mse = mean_squared_error(y_test, y_pred_simple)

# Train complex model (high variance)
complex_model = DecisionTreeRegressor(max_depth=20, random_state=42)
complex_model.fit(X_train, y_train)
y_pred_complex = complex_model.predict(X_test)
complex_mse = mean_squared_error(y_test, y_pred_complex)

print(f"Simple model MSE (high bias): {simple_mse:.2f}")
print(f"Complex model MSE (high variance): {complex_mse:.2f}")
Important Notes
Try different model complexities to find the best balance between bias and variance.
Cross-validation helps check if your model generalizes well to new data.
Reducing bias usually increases variance and vice versa; the goal is to find a good middle ground.
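The cross-validation advice above can be sketched with scikit-learn's `cross_val_score`, scanning candidate depths and picking the one with the best average held-out score. The candidate depths and dataset here are arbitrary choices for illustration:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=1, noise=10, random_state=42)

# Average 5-fold CV score (negative MSE, so higher is better) per candidate depth
scores = {}
for depth in (1, 2, 3, 5, 10, 20):
    model = DecisionTreeRegressor(max_depth=depth, random_state=42)
    scores[depth] = cross_val_score(
        model, X, y, cv=5, scoring="neg_mean_squared_error").mean()

best_depth = max(scores, key=scores.get)
print(f"Best max_depth by cross-validation: {best_depth}")
```

Because every depth is scored on data the model did not train on, the fully grown tree's memorization of noise counts against it, and a moderate depth tends to win.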
Summary
Bias is error from models that are too simple and miss patterns.
Variance is error from models that are too complex and fit noise.
The bias-variance tradeoff is about balancing these errors for best predictions.