How to Fix Underfitting in an ML Model in Python with sklearn
To fix underfitting in a scikit-learn model, increase model complexity by using a more powerful model, add more features, or train longer by increasing iterations or reducing regularization.

Why This Happens
Underfitting occurs when the model is too simple to capture the patterns in the data. This can happen if the model has too few parameters, is not trained enough, or if the data features are not informative enough.
```python
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Generate simple data
X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Use a very simple model (Linear Regression) that may underfit
model = LinearRegression()
model.fit(X_train, y_train)

predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print(f"Mean Squared Error: {mse:.2f}")
```
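A practical way to confirm the diagnosis is to compare training and test scores: an underfitting (high-bias) model scores poorly on both, whereas an overfitting model shows a large gap between them. The sketch below uses nonlinear synthetic data (an assumption for illustration, since a linear model fits the linear data above quite well):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Nonlinear toy data that a straight line cannot capture (illustrative)
rng = np.random.RandomState(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() * 10 + rng.normal(scale=1.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LinearRegression().fit(X_train, y_train)

# Low R^2 on BOTH sets indicates underfitting (high bias);
# a big train/test gap would instead indicate overfitting.
train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)
print(f"Train R^2: {train_score:.2f}, Test R^2: {test_score:.2f}")
```

Here both scores come out well below what the data supports, which is the signature of underfitting rather than overfitting.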
The Fix
To fix underfitting, switch to a more complex model such as RandomForestRegressor, increase training time, or add more informative features. A more expressive model can capture patterns that a simple one misses.
```python
from sklearn.ensemble import RandomForestRegressor

# Use a more complex model
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print(f"Mean Squared Error after fix: {mse:.2f}")
```
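Adding more features, the other fix mentioned above, can be sketched with a polynomial feature expansion in a Pipeline. The quadratic toy data here is an assumption chosen so the effect is visible:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Quadratic toy data (illustrative): y depends on x^2, not x
rng = np.random.RandomState(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# A plain linear model underfits the quadratic relationship...
linear = LinearRegression().fit(X_train, y_train)

# ...while polynomial features let the same linear model capture it.
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly.fit(X_train, y_train)

print(f"Linear R^2:     {linear.score(X_test, y_test):.2f}")
print(f"Polynomial R^2: {poly.score(X_test, y_test):.2f}")
```

The model class stays the same; only the feature representation becomes richer, which is often enough to eliminate underfitting.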
Prevention
To avoid underfitting in the future, always check if your model is too simple for the data. Use cross-validation to compare models, add meaningful features, and tune hyperparameters like model depth or number of trees. Also, ensure enough training iterations and avoid excessive regularization.
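The cross-validation check described above can be sketched as follows; the dataset and the two candidate models are illustrative choices, not a fixed recipe:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data for the comparison (illustrative)
X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=42)

# Compare candidate models with 5-fold cross-validation; a model that
# scores consistently low across all folds is likely underfitting.
candidates = [
    ("linear", LinearRegression()),
    ("forest", RandomForestRegressor(n_estimators=100, random_state=42)),
]
for name, model in candidates:
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean R^2 = {scores.mean():.2f} (+/- {scores.std():.2f})")
```

Note that more complex is not always better: on this linear synthetic data the plain linear model wins, which is exactly the kind of insight cross-validation surfaces.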
Related Errors
Overfitting is the opposite problem where the model learns too much noise. It can be fixed by simplifying the model or adding regularization. Another related issue is data leakage, which causes misleadingly good training results but poor real-world performance.
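For completeness, the regularization fix for the opposite problem can be sketched with Ridge (L2-penalized) regression. The high-degree polynomial and the toy data are assumptions chosen to provoke overfitting:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Small noisy dataset with a simple linear trend (illustrative)
rng = np.random.RandomState(42)
X = rng.uniform(-1, 1, size=(40, 1))
y = X.ravel() + rng.normal(scale=0.3, size=40)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# A degree-15 polynomial with no regularization can chase the noise...
plain = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
plain.fit(X_train, y_train)

# ...while an L2 penalty (Ridge) shrinks the coefficients and tames it.
ridge = make_pipeline(PolynomialFeatures(degree=15), Ridge(alpha=1.0))
ridge.fit(X_train, y_train)

print(f"Train R^2: plain={plain.score(X_train, y_train):.2f}, "
      f"ridge={ridge.score(X_train, y_train):.2f}")
print(f"Test  R^2: plain={plain.score(X_test, y_test):.2f}, "
      f"ridge={ridge.score(X_test, y_test):.2f}")
```

The unregularized model fits the training set more closely by construction; the regularized one trades some training fit for better generalization.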