What is Underfitting in Machine Learning in Python | Sklearn Guide
Underfitting happens when a model is too simple to learn the patterns in the training data, resulting in poor performance on both the training data and new data. The model cannot capture the underlying trend, often because of low complexity or insufficient training.
How It Works
Imagine trying to fit a straight line through points that actually follow a curve. If you only use a simple line, it won't match the points well. This is like underfitting in machine learning, where the model is too simple to understand the data's true pattern.
Underfitting happens when the model does not learn enough from the data. It might be because the model is too basic, like using a small decision tree or a linear model for complex data. This leads to high errors on both the data it trained on and new data it sees.
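The claim that an underfit model has high error on both seen and unseen data can be checked with a held-out split. The following is a minimal sketch, not a definitive recipe: the noisy sine data, the 75/25 split, and the depth-1 tree are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic curved data (assumed for illustration): a noisy sine wave
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)

# Hold out 25% of the points as "new" data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A depth-1 tree can only predict two constant values,
# which is far too simple for a sine wave
stump = DecisionTreeRegressor(max_depth=1, random_state=0)
stump.fit(X_train, y_train)

train_mse = mean_squared_error(y_train, stump.predict(X_train))
test_mse = mean_squared_error(y_test, stump.predict(X_test))

print(f"Train MSE: {train_mse:.3f}")
print(f"Test MSE:  {test_mse:.3f}")
```

If both numbers come out well above the noise level (variance of about 0.01 here), the model is missing structure even in the data it was trained on, which is the signature of underfitting rather than overfitting.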
Example
This example shows underfitting by using a very simple linear model on data that follows a curve. The model cannot capture the curve well, so it performs poorly.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Create curved data
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, 100)

# Fit a simple linear model
model = LinearRegression()
model.fit(X, y)

# Predict and calculate error
predictions = model.predict(X)
mse = mean_squared_error(y, predictions)

# Plot data and model prediction
plt.scatter(X, y, label='Data')
plt.plot(X, predictions, color='red', label='Linear Model')
plt.legend()
plt.title(f'Underfitting Example: MSE={mse:.3f}')
plt.show()

print(f'Mean Squared Error: {mse:.3f}')
```
When to Use
Understanding underfitting helps you recognize when your model is too simple and needs improvement. Suspect underfitting when the model performs poorly even on its training data, which shows it cannot learn the patterns well.
In real life, a very simple house-price model that ignores important features will underfit. Use a more complex model or add features to fix it. Underfitting is common when starting with basic models or when the data is complex.
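One way to add features without collecting new data is to expand the existing input into polynomial terms. The sketch below compares a plain linear model against the same model fed degree-5 polynomial features. The noisy sine data and the choice of degree 5 are assumptions for illustration, not a recommendation for real data sets.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic curved data (assumed for illustration)
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    # Baseline: a straight line, too simple for this data
    "linear": LinearRegression(),
    # Same estimator, but with x, x^2, ..., x^5 as input features
    "poly": make_pipeline(PolynomialFeatures(degree=5), LinearRegression()),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = {
        "train": mean_squared_error(y_train, model.predict(X_train)),
        "test": mean_squared_error(y_test, model.predict(X_test)),
    }
    print(f"{name:>6}: train MSE={results[name]['train']:.3f}, "
          f"test MSE={results[name]['test']:.3f}")
```

The polynomial pipeline should show lower error on both splits. Pushing the degree much higher can tip the model toward overfitting, so the test error is worth watching alongside the training error.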
Key Points
- Underfitting means the model is too simple to learn data patterns.
- It causes poor performance on both training and new data.
- Common causes: low model complexity, too few features, or insufficient training.
- Fix by increasing model complexity or adding more relevant data/features.
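To see how error changes as complexity grows, scikit-learn's `validation_curve` can sweep one hyperparameter and report cross-validated scores. The following is a rough sketch under assumed settings: randomly sampled noisy sine data, a decision tree, and an arbitrary list of depths.

```python
import numpy as np
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeRegressor

# Synthetic curved data (assumed for illustration); X is sampled at random
# so that the unshuffled cross-validation folds each cover the full range
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)

depths = [1, 2, 4, 6]
train_scores, val_scores = validation_curve(
    DecisionTreeRegressor(random_state=0),
    X,
    y,
    param_name="max_depth",
    param_range=depths,
    cv=5,
    scoring="neg_mean_squared_error",
)

# Scores are negated MSE, so flip the sign and average over the folds
train_mse = -train_scores.mean(axis=1)
val_mse = -val_scores.mean(axis=1)

for depth, tr, va in zip(depths, train_mse, val_mse):
    print(f"max_depth={depth}: train MSE={tr:.3f}, validation MSE={va:.3f}")
```

At the shallowest depths both errors are high, which matches the underfitting pattern described in the key points. As depth increases, both errors should fall until the tree has enough capacity to follow the curve.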