Overfitting and underfitting in machine learning (Python)

Overfitting and underfitting explain why a model may perform poorly on new data: they tell us whether the model learned too much or too little from the training data.
There is no specific code syntax for these ideas; they are concepts to check during model training and evaluation. Overfitting means the model memorizes details and noise in the training data.
Underfitting means the model is too simple to capture the data patterns.
# Overfitting example: an unconstrained tree can grow until it memorizes the training set
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(max_depth=None)  # very deep tree
model.fit(X_train, y_train)
train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)
# Underfitting example: a depth-1 tree (a "stump") is too simple for most datasets
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(max_depth=1)  # very shallow tree
model.fit(X_train, y_train)
train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)
The code below trains two decision tree classifiers on the iris dataset: one very deep (likely to overfit) and one very shallow (likely to underfit). It prints training and test accuracy for each to show the difference.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Overfitting model: very deep tree
model_overfit = DecisionTreeClassifier(max_depth=None, random_state=42)
model_overfit.fit(X_train, y_train)
train_score_overfit = model_overfit.score(X_train, y_train)
test_score_overfit = model_overfit.score(X_test, y_test)

# Underfitting model: very shallow tree
model_underfit = DecisionTreeClassifier(max_depth=1, random_state=42)
model_underfit.fit(X_train, y_train)
train_score_underfit = model_underfit.score(X_train, y_train)
test_score_underfit = model_underfit.score(X_test, y_test)

print(f"Overfitting model - Train accuracy: {train_score_overfit:.2f}")
print(f"Overfitting model - Test accuracy: {test_score_overfit:.2f}")
print(f"Underfitting model - Train accuracy: {train_score_underfit:.2f}")
print(f"Underfitting model - Test accuracy: {test_score_underfit:.2f}")
Overfitting usually shows very high training accuracy but lower test accuracy.
Underfitting shows low accuracy on both training and test data.
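As a rough illustration of these two patterns, here is a hypothetical helper that labels a (train, test) accuracy pair. The function name and the thresholds are illustrative assumptions, not standard values:

```python
# Hypothetical helper: a rule of thumb for reading train/test scores.
# The thresholds (0.10 gap, 0.70 floor) are illustrative, not universal.
def diagnose(train_score, test_score, gap_threshold=0.10, low_threshold=0.70):
    """Label a model as likely overfitting, likely underfitting, or reasonable."""
    if train_score - test_score > gap_threshold:
        return "likely overfitting"  # large train/test gap
    if train_score < low_threshold and test_score < low_threshold:
        return "likely underfitting"  # poor on both sets
    return "reasonable fit"

print(diagnose(1.00, 0.85))  # large gap -> likely overfitting
print(diagnose(0.60, 0.58))  # low everywhere -> likely underfitting
print(diagnose(0.95, 0.93))  # reasonable fit
```

In practice you would also consider the dataset size and the cost of errors before trusting any fixed cutoff.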
Use techniques such as cross-validation, pruning (for example, limiting max_depth), or simpler models to avoid overfitting.
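One common way to apply cross-validation here is to score several candidate tree depths and keep the best. A minimal sketch using scikit-learn's cross_val_score on the iris data; the depth grid is an illustrative choice:

```python
# Sketch: choosing max_depth by 5-fold cross-validation on the iris data.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

best_depth, best_score = None, 0.0
for depth in [1, 2, 3, 5, 10, None]:  # None = unconstrained tree
    model = DecisionTreeClassifier(max_depth=depth, random_state=42)
    # Mean accuracy over 5 folds estimates performance on unseen data
    score = cross_val_score(model, X, y, cv=5).mean()
    if score > best_score:
        best_depth, best_score = depth, score

print(f"Best max_depth: {best_depth}, CV accuracy: {best_score:.2f}")
```

Cross-validated accuracy typically peaks at an intermediate depth: deeper trees stop improving on held-out folds even as they fit the training folds perfectly.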
Overfitting means the model learns too much noise and details from training data.
Underfitting means the model is too simple and misses important patterns.
A good model balances the two: it learns enough to predict well on new data without memorizing the training set.