
What is Underfitting in Machine Learning in Python | Sklearn Guide

In machine learning, underfitting happens when a model is too simple to learn the patterns in the training data, so it performs poorly on both the training data and new data. The model fails to capture the underlying trend, often because it has too little capacity or has not been trained enough.

How It Works

Imagine trying to fit a straight line through points that actually follow a curve. If you only use a simple line, it won't match the points well. This is like underfitting in machine learning, where the model is too simple to understand the data's true pattern.

Underfitting happens when the model does not learn enough from the data. This is usually because the model is too basic for the problem, such as a very shallow decision tree or a linear model applied to data with a non-linear pattern. The result is high error on both the data it was trained on and new data it has not seen.
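To make the "high error on both sides" point concrete, here is a minimal sketch that compares a one-split decision tree with a deeper one. The synthetic sine data, the 70/30 split, the depths 1 and 5, and the helper name `train_test_mse` are all assumptions chosen for illustration, not part of the original example.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Assumed synthetic data for illustration: a sine curve plus a little noise
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

def train_test_mse(max_depth):
    """Fit a tree of the given depth and return (train MSE, test MSE)."""
    tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
    tree.fit(X_train, y_train)
    return (
        mean_squared_error(y_train, tree.predict(X_train)),
        mean_squared_error(y_test, tree.predict(X_test)),
    )

# Depth 1 allows a single split, so the tree can only predict two constants
stump_train, stump_test = train_test_mse(max_depth=1)
deeper_train, deeper_test = train_test_mse(max_depth=5)

print(f"depth=1  train MSE={stump_train:.3f}  test MSE={stump_test:.3f}")
print(f"depth=5  train MSE={deeper_train:.3f}  test MSE={deeper_test:.3f}")
```

The signature of underfitting is that the depth-1 tree's error is high on the training split as well as the test split, whereas an overfit model would show a low training error and a much higher test error.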


Example

This example shows underfitting by using a very simple linear model on data that follows a curve. The model cannot capture the curve well, so it performs poorly.

python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Create curved data
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, 100)

# Fit a simple linear model
model = LinearRegression()
model.fit(X, y)

# Predict and calculate error
predictions = model.predict(X)
mse = mean_squared_error(y, predictions)

# Plot data and model prediction
plt.scatter(X, y, label='Data')
plt.plot(X, predictions, color='red', label='Linear Model')
plt.legend()
plt.title(f'Underfitting Example: MSE={mse:.3f}')
plt.show()

print(f'Mean Squared Error: {mse:.3f}')
Output (the exact value varies between runs because the random noise is not seeded)
Mean Squared Error: 0.520

When to Use

Understanding underfitting helps you recognize when your model is too simple and needs improvement. The main warning sign is poor performance on the training data itself: if the model cannot fit the examples it has already seen, it has not learned the patterns and will not do better on new data.
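One way to check training performance is to compare the model's training score with a trivial baseline. The sketch below assumes the same kind of sine-shaped data as the example above and uses scikit-learn's `DummyRegressor`, which always predicts the mean of the targets; the variable names are illustrative.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression

# Assumed synthetic data for illustration: a sine curve plus a little noise
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 100)

# Baseline that ignores X and always predicts the mean of y
baseline = DummyRegressor(strategy="mean").fit(X, y)
linear = LinearRegression().fit(X, y)

# .score() returns R^2; here it is measured on the training data itself
baseline_r2 = baseline.score(X, y)
linear_r2 = linear.score(X, y)

print(f"Baseline training R^2: {baseline_r2:.3f}")
print(f"Linear   training R^2: {linear_r2:.3f}")
```

If the training R² of the real model is barely above the baseline, as it should be for a straight line on this data, the model is explaining almost none of the variation it was trained on, which points to underfitting rather than overfitting.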

In practice, if you try to predict house prices with a very simple model that ignores important features, the model will underfit. To fix it, use a more flexible model or add informative features. Underfitting is common when you start with a basic model or when the relationship in the data is complex.
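As a sketch of the "add features" fix, the snippet below gives a linear regression polynomial terms of the input so it can bend with the data. The degree of 5 and the synthetic sine data are assumptions chosen for illustration; a different dataset would need its own choice.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Assumed synthetic data for illustration: a sine curve plus a little noise
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 100)

# Plain straight-line fit (the underfitting model)
linear = LinearRegression().fit(X, y)

# Same linear regression, but fed x, x^2, ..., x^5 as input features
poly = make_pipeline(PolynomialFeatures(degree=5), LinearRegression()).fit(X, y)

linear_mse = mean_squared_error(y, linear.predict(X))
poly_mse = mean_squared_error(y, poly.predict(X))

print(f"Linear model training MSE:     {linear_mse:.3f}")
print(f"Polynomial model training MSE: {poly_mse:.3f}")
```

The learning algorithm is unchanged; only the features are richer. A drop in training error shows the model can now represent the curve, though a held-out test set is still needed to confirm that the extra flexibility has not tipped into overfitting.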

Key Points

  • Underfitting means the model is too simple to learn data patterns.
  • It causes poor performance on both training and new data.
  • Common causes: low model complexity, too few features, or insufficient training.
  • Fix by increasing model complexity, adding more informative features, or training longer; more rows of the same kind of data rarely help on their own.

Key Takeaways

  • Underfitting occurs when a model is too simple to capture data patterns.
  • It results in high errors on both training and test data.
  • Use more complex models or add features to reduce underfitting.
  • Check training performance to detect underfitting early.
  • Balancing model complexity helps avoid both underfitting and overfitting.
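A simple way to look for that balance is to sweep a complexity setting and watch training and test error together. The sketch below assumes polynomial degree as the complexity knob, synthetic sine data, and a 70/30 split; the degrees tried are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Assumed synthetic data for illustration: a sine curve plus a little noise
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Map each degree to its (train MSE, test MSE) pair
errors = {}
for degree in (1, 3, 5, 7):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X_train, y_train)
    errors[degree] = (
        mean_squared_error(y_train, model.predict(X_train)),
        mean_squared_error(y_test, model.predict(X_test)),
    )
    train_mse, test_mse = errors[degree]
    print(f"degree={degree}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

Settings where both errors are high indicate underfitting. If the training error keeps falling while the test error starts to rise, the model has moved into overfitting, so a reasonable choice is the simplest setting whose test error is close to the lowest observed.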