What is Regularization in Machine Learning with Python
Regularization is a technique that reduces overfitting by adding a penalty on a model's complexity. In Python, libraries like sklearn provide regularization methods such as L1 (Lasso) and L2 (Ridge) that help models generalize better to new data.
How It Works
Regularization works like a gentle guide that keeps a machine learning model from becoming too complex. Imagine you are trying to fit a curve to some points on a graph. Without regularization, the curve might twist and turn too much to perfectly hit every point, which can cause it to perform poorly on new points.
Regularization adds a small cost for complexity, like a penalty for making the curve too wiggly. This encourages the model to find a simpler, smoother curve that still fits the data well but also works better on new, unseen data. In Python's sklearn, this is done by adding a penalty term to the loss function that shrinks the model's coefficients.
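To see this shrinkage directly, here is a minimal sketch that fits the same synthetic data three ways: without regularization, with Ridge (L2), and with Lasso (L1). The alpha values are illustrative choices, not tuned settings; L2 shrinks all coefficients toward zero, while L1 can push some exactly to zero.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Data with 10 features, only 3 of which carry real signal
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=10, random_state=0)

# Fit an unregularized, an L2-penalized, and an L1-penalized linear model
plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # L2: shrinks all coefficients toward zero
lasso = Lasso(alpha=5.0).fit(X, y)   # L1: can set coefficients exactly to zero

print("Plain coefficients:", np.round(plain.coef_, 1))
print("Ridge coefficients:", np.round(ridge.coef_, 1))
print("Lasso coefficients:", np.round(lasso.coef_, 1))
print("Coefficients Lasso set to zero:", int(np.sum(lasso.coef_ == 0)))
```

Comparing the three printed coefficient vectors shows the penalty at work: the Ridge coefficients are uniformly smaller than the plain ones, and Lasso typically zeroes out the uninformative features entirely.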
Example
This example shows how to use L2 regularization (Ridge) with sklearn to train a linear regression model that avoids overfitting.
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error

# Create sample data with noise
X, y = make_regression(n_samples=100, n_features=1, noise=20, random_state=42)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Ridge regression model with regularization strength alpha=1.0
model = Ridge(alpha=1.0)
model.fit(X_train, y_train)

# Predict on test data
predictions = model.predict(X_test)

# Calculate mean squared error
mse = mean_squared_error(y_test, predictions)
print(f"Mean Squared Error with Ridge regularization: {mse:.2f}")
When to Use
Use regularization when your model fits the training data too well but performs poorly on new data, a problem called overfitting. This often happens when you have many features or a small dataset.
Regularization is helpful in real-world cases like predicting house prices, where you want your model to generalize well to new houses, or in medical diagnosis, where avoiding false patterns is critical.
Key Points
- Regularization adds a penalty to model complexity to prevent overfitting.
- L1 (Lasso) and L2 (Ridge) are common regularization types in sklearn.
- It helps models generalize better on new, unseen data.
- Choosing the right regularization strength (alpha) is important for best results.
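On the last point, a common way to pick alpha is cross-validation rather than guessing. This sketch uses sklearn's RidgeCV, which evaluates each candidate alpha by cross-validation and keeps the best one; the candidate grid here is an illustrative choice.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

# Sample regression data
X, y = make_regression(n_samples=100, n_features=5, noise=15, random_state=42)

# RidgeCV scores each candidate alpha by cross-validation and picks the best
alphas = np.logspace(-3, 3, 13)  # candidates from 0.001 to 1000
model = RidgeCV(alphas=alphas).fit(X, y)

print(f"Best alpha found by cross-validation: {model.alpha_}")
```

sklearn also provides LassoCV for tuning the L1 penalty in the same spirit.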