Ridge vs Lasso Regression in Python: Key Differences and Code Examples
Ridge and Lasso regression both add penalties to linear regression to reduce overfitting. Ridge uses L2 regularization, which shrinks all coefficients toward zero but never exactly to zero, while Lasso uses L1 regularization, which can shrink some coefficients exactly to zero, effectively performing feature selection.
Quick Comparison
Here is a quick side-by-side comparison of Ridge and Lasso regression.
| Factor | Ridge Regression | Lasso Regression |
|---|---|---|
| Regularization Type | L2 (squares of coefficients) | L1 (absolute values of coefficients) |
| Effect on Coefficients | Shrinks coefficients but keeps all | Can shrink some coefficients to zero (feature selection) |
| Use Case | When all features matter but need shrinkage | When you want automatic feature selection |
| Solution Type | Closed-form, always unique | No closed-form; sparse, may not be unique |
| Computational Cost | Generally faster (closed-form solution) | Slower; requires iterative optimization (e.g. coordinate descent) |
| Interpretability | Coefficients are small but non-zero | Simpler model with fewer features |
Key Differences
Ridge regression adds a penalty proportional to the sum of the squared coefficients (L2 regularization). This penalty shrinks all coefficients toward zero but never sets any exactly to zero, so every feature remains in the model. It is useful when you believe all features contribute but want to reduce overfitting.
Lasso regression adds a penalty proportional to the sum of the absolute values of the coefficients (L1 regularization). This can shrink some coefficients exactly to zero, effectively removing those features from the model, which makes Lasso useful for feature selection and for producing simpler, more interpretable models.
Because of these differences, Ridge tends to perform better when many features have small/medium effects, while Lasso is better when only a few features are important. The choice depends on your data and goals.
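The sparsity difference is easy to observe directly. The sketch below (an illustrative setup, not from the original article) fits both models on synthetic data where only 2 of 10 features are informative, and counts non-zero coefficients; a relatively strong `alpha` is used to make the Lasso's sparsity obvious.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: only 2 of 10 features actually drive the target
X, y = make_regression(n_samples=200, n_features=10, n_informative=2,
                       noise=10, random_state=0)

# A fairly strong penalty, chosen to highlight Lasso's sparsity
ridge = Ridge(alpha=5.0).fit(X, y)
lasso = Lasso(alpha=5.0).fit(X, y)

print("Ridge non-zero coefficients:", int(np.sum(ridge.coef_ != 0)))
print("Lasso non-zero coefficients:", int(np.sum(lasso.coef_ != 0)))
```

Ridge keeps all ten coefficients non-zero (just smaller), while Lasso drops most of the uninformative ones entirely.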
Code Comparison
Here is how to use Ridge regression in Python with sklearn on the same dataset.
```python
from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Create sample data
X, y = make_regression(n_samples=100, n_features=5, noise=10, random_state=42)

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Create and train Ridge model
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)

# Predict and evaluate
preds = ridge.predict(X_test)
mse = mean_squared_error(y_test, preds)
print(f"Ridge Regression MSE: {mse:.2f}")
print(f"Coefficients: {ridge.coef_}")
```
Lasso Equivalent
Here is the equivalent code using Lasso regression on the same data.
```python
from sklearn.linear_model import Lasso

# Create and train Lasso model (uses X_train/y_train from the Ridge example)
lasso = Lasso(alpha=1.0)
lasso.fit(X_train, y_train)

# Predict and evaluate
preds_lasso = lasso.predict(X_test)
mse_lasso = mean_squared_error(y_test, preds_lasso)
print(f"Lasso Regression MSE: {mse_lasso:.2f}")
print(f"Coefficients: {lasso.coef_}")
```
When to Use Which
Choose Ridge regression when you believe all features have some effect and you want to reduce overfitting without removing any features. It works well when many features contribute moderately.
Choose Lasso regression when you want to simplify your model by selecting only the most important features, especially if you suspect many features are irrelevant.
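Whichever you choose, the strength of the penalty (`alpha`) matters as much as the penalty type. A minimal sketch of tuning it with scikit-learn's built-in cross-validated estimators, `RidgeCV` and `LassoCV` (the alpha grid here is an arbitrary choice for illustration):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV

X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=42)

# Search a log-spaced grid of penalty strengths via cross-validation
alphas = np.logspace(-3, 3, 13)
ridge_cv = RidgeCV(alphas=alphas).fit(X, y)
lasso_cv = LassoCV(alphas=alphas, cv=5).fit(X, y)

print(f"Best Ridge alpha: {ridge_cv.alpha_}")
print(f"Best Lasso alpha: {lasso_cv.alpha_}")
```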
In practice, you can also try Elastic Net, which combines both penalties, but Ridge and Lasso cover most needs for regularized linear regression.
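For completeness, Elastic Net is available in scikit-learn as `ElasticNet`, where `l1_ratio` controls the mix of the two penalties. A brief sketch (the parameter values are illustrative, not tuned):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

X, y = make_regression(n_samples=100, n_features=5, noise=10, random_state=42)

# l1_ratio=0.5 gives an even blend; 1.0 is pure Lasso, 0.0 is pure Ridge
enet = ElasticNet(alpha=1.0, l1_ratio=0.5)
enet.fit(X, y)
print(f"Elastic Net coefficients: {enet.coef_}")
```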