How to Fix Underfitting in an ML Model in Python with sklearn
To fix underfitting in a scikit-learn model, increase model complexity by using a more powerful model, add more features, or train longer by increasing iterations or reducing regularization.

Why This Happens
Underfitting occurs when the model is too simple to capture the patterns in the data. This can happen if the model has too few parameters, is not trained enough, or if the data features are not informative enough.
```python
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Generate simple data
X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Use a very simple model (Linear Regression) that may underfit
model = LinearRegression()
model.fit(X_train, y_train)

predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print(f"Mean Squared Error: {mse:.2f}")
```
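A practical way to confirm the diagnosis is to compare training and test scores: an underfitting (high-bias) model scores poorly on both, whereas an overfitting model shows a large gap between them. The sketch below uses nonlinear synthetic data (an assumption for illustration, since a linear model fits the linear data above quite well):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Nonlinear toy data that a straight line cannot capture (illustrative)
rng = np.random.RandomState(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() * 10 + rng.normal(scale=1.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LinearRegression().fit(X_train, y_train)

# Low R^2 on BOTH sets indicates underfitting (high bias);
# a big train/test gap would instead indicate overfitting.
train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)
print(f"Train R^2: {train_score:.2f}, Test R^2: {test_score:.2f}")
```

Here both scores come out well below what the data supports, which is the signature of underfitting rather than overfitting.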
The Fix
To fix underfitting, switch to a more complex model such as RandomForestRegressor, increase training time, or add more informative features. A more expressive model can capture patterns that a simple one misses.
```python
from sklearn.ensemble import RandomForestRegressor

# Use a more complex model
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print(f"Mean Squared Error after fix: {mse:.2f}")
```
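Adding more features, the other fix mentioned above, can be sketched with a polynomial feature expansion in a Pipeline. The quadratic toy data here is an assumption chosen so the effect is visible:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Quadratic toy data (illustrative): y depends on x^2, not x
rng = np.random.RandomState(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# A plain linear model underfits the quadratic relationship...
linear = LinearRegression().fit(X_train, y_train)

# ...while polynomial features let the same linear model capture it.
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly.fit(X_train, y_train)

print(f"Linear R^2:     {linear.score(X_test, y_test):.2f}")
print(f"Polynomial R^2: {poly.score(X_test, y_test):.2f}")
```

The model class stays the same; only the feature representation becomes richer, which is often enough to eliminate underfitting.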
Prevention
To avoid underfitting in the future, always check if your model is too simple for the data. Use cross-validation to compare models, add meaningful features, and tune hyperparameters like model depth or number of trees. Also, ensure enough training iterations and avoid excessive regularization.
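The cross-validation check described above can be sketched as follows; the dataset and the two candidate models are illustrative choices, not a fixed recipe:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data for the comparison (illustrative)
X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=42)

# Compare candidate models with 5-fold cross-validation; a model that
# scores consistently low across all folds is likely underfitting.
candidates = [
    ("linear", LinearRegression()),
    ("forest", RandomForestRegressor(n_estimators=100, random_state=42)),
]
for name, model in candidates:
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean R^2 = {scores.mean():.2f} (+/- {scores.std():.2f})")
```

Note that more complex is not always better: on this linear synthetic data the plain linear model wins, which is exactly the kind of insight cross-validation surfaces.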
Related Errors
Overfitting is the opposite problem where the model learns too much noise. It can be fixed by simplifying the model or adding regularization. Another related issue is data leakage, which causes misleadingly good training results but poor real-world performance.
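For completeness, the regularization fix for the opposite problem can be sketched with Ridge (L2-penalized) regression. The high-degree polynomial and the toy data are assumptions chosen to provoke overfitting:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Small noisy dataset with a simple linear trend (illustrative)
rng = np.random.RandomState(42)
X = rng.uniform(-1, 1, size=(40, 1))
y = X.ravel() + rng.normal(scale=0.3, size=40)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# A degree-15 polynomial with no regularization can chase the noise...
plain = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
plain.fit(X_train, y_train)

# ...while an L2 penalty (Ridge) shrinks the coefficients and tames it.
ridge = make_pipeline(PolynomialFeatures(degree=15), Ridge(alpha=1.0))
ridge.fit(X_train, y_train)

print(f"Train R^2: plain={plain.score(X_train, y_train):.2f}, "
      f"ridge={ridge.score(X_train, y_train):.2f}")
print(f"Test  R^2: plain={plain.score(X_test, y_test):.2f}, "
      f"ridge={ridge.score(X_test, y_test):.2f}")
```

The unregularized model fits the training set more closely by construction; the regularized one trades some training fit for better generalization.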