MLOps · Comparison · Beginner · 4 min read

Ridge vs Lasso Regression in Python: Key Differences and Code Examples

Both Ridge and Lasso regression add penalties to linear regression to reduce overfitting, but Ridge uses L2 regularization, which shrinks all coefficients toward zero without eliminating any, while Lasso uses L1 regularization, which can shrink some coefficients exactly to zero, effectively selecting features.

Quick Comparison

Here is a quick side-by-side comparison of Ridge and Lasso regression.

| Factor | Ridge Regression | Lasso Regression |
| --- | --- | --- |
| Regularization Type | L2 (squares of coefficients) | L1 (absolute values of coefficients) |
| Effect on Coefficients | Shrinks coefficients but keeps all | Can shrink some coefficients to zero (feature selection) |
| Use Case | When all features matter but need shrinkage | When you want automatic feature selection |
| Solution Type | Closed-form, always unique | Can produce sparse solutions; may not be unique |
| Computational Cost | Generally faster (closed-form solution) | Slower; requires iterative solvers such as coordinate descent |
| Interpretability | Coefficients are small but non-zero | Simpler model with fewer features |

Key Differences

Ridge regression adds a penalty equal to the sum of the squared coefficients (L2 regularization). This penalty shrinks all coefficients toward zero but does not set any exactly to zero, so every feature remains in the model. It is useful when you believe all features contribute but want to reduce overfitting.

Lasso regression adds a penalty equal to the absolute value of coefficients (L1 regularization). This can shrink some coefficients exactly to zero, effectively removing those features from the model. This makes Lasso useful for feature selection and producing simpler, more interpretable models.

Because of these differences, Ridge tends to perform better when many features have small/medium effects, while Lasso is better when only a few features are important. The choice depends on your data and goals.
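To make the two penalties concrete, here is a minimal numpy sketch that computes them by hand, using the Ridge coefficients printed later in this article as example values. One caveat worth knowing: scikit-learn's Lasso objective also scales the data-fit term by 1/(2n), so the same alpha does not mean the same penalty strength in both models.

```python
import numpy as np

# Example coefficient vector (taken from the Ridge output below)
coef = np.array([43.19, -11.54, 69.04, 10.44, 10.94])
alpha = 1.0

# Ridge penalty: alpha times the sum of squared coefficients (L2)
l2_penalty = alpha * np.sum(coef ** 2)

# Lasso penalty: alpha times the sum of absolute coefficients (L1)
l1_penalty = alpha * np.sum(np.abs(coef))

print(f"L2 penalty: {l2_penalty:.4f}")
print(f"L1 penalty: {l1_penalty:.4f}")
```

Because the L1 penalty grows linearly near zero while the L2 penalty flattens out, L1 keeps "pushing" small coefficients all the way to zero, which is where Lasso's feature selection comes from.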


Code Comparison

Here is how to use Ridge regression in Python with sklearn; the Lasso example below reuses the same data and split.

python
from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Create sample data
X, y = make_regression(n_samples=100, n_features=5, noise=10, random_state=42)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train Ridge model
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)

# Predict and evaluate
preds = ridge.predict(X_test)
mse = mean_squared_error(y_test, preds)

print(f"Ridge Regression MSE: {mse:.2f}")
print(f"Coefficients: {ridge.coef_}")
Output
Ridge Regression MSE: 85.41
Coefficients: [ 43.19 -11.54  69.04  10.44  10.94]

Lasso Equivalent

Here is the equivalent code using Lasso regression on the same data.

python
from sklearn.linear_model import Lasso

# Create and train Lasso model
lasso = Lasso(alpha=1.0)
lasso.fit(X_train, y_train)

# Predict and evaluate
preds_lasso = lasso.predict(X_test)
mse_lasso = mean_squared_error(y_test, preds_lasso)

print(f"Lasso Regression MSE: {mse_lasso:.2f}")
print(f"Coefficients: {lasso.coef_}")
Output
Lasso Regression MSE: 89.32
Coefficients: [ 42.85  -0.    67.96   0.    10.23]

When to Use Which

Choose Ridge regression when you believe all features have some effect and you want to reduce overfitting without removing any features. It works well when many features contribute moderately.

Choose Lasso regression when you want to simplify your model by selecting only the most important features, especially if you suspect many features are irrelevant.
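Since the right penalty strength depends on your data, you can also let cross-validation choose alpha for you. Here is a sketch using scikit-learn's LassoCV on the same synthetic data; the alpha grid is an arbitrary example, and RidgeCV works the same way for Ridge.

```python
from sklearn.linear_model import LassoCV
from sklearn.datasets import make_regression

# Same synthetic data as the examples above
X, y = make_regression(n_samples=100, n_features=5, noise=10, random_state=42)

# Try a small grid of alphas with 5-fold cross-validation
lasso_cv = LassoCV(alphas=[0.01, 0.1, 1.0, 10.0], cv=5, random_state=42)
lasso_cv.fit(X, y)

print(f"Best alpha: {lasso_cv.alpha_}")
print(f"Coefficients: {lasso_cv.coef_}")
```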

In practice, you can also try Elastic Net, which combines both penalties, but Ridge and Lasso cover most needs for regularized linear regression.
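As a sketch of that option, here is ElasticNet on the same synthetic data. The l1_ratio parameter blends the two penalties (0 is pure L2, 1 is pure L1); the 0.5 used here is just an example value, not a recommendation.

```python
from sklearn.linear_model import ElasticNet
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Same synthetic data and split as the examples above
X, y = make_regression(n_samples=100, n_features=5, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# l1_ratio=0.5 mixes the L1 and L2 penalties equally
enet = ElasticNet(alpha=1.0, l1_ratio=0.5)
enet.fit(X_train, y_train)

preds_enet = enet.predict(X_test)
mse_enet = mean_squared_error(y_test, preds_enet)

print(f"Elastic Net MSE: {mse_enet:.2f}")
print(f"Coefficients: {enet.coef_}")
```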

Key Takeaways

Ridge uses L2 penalty to shrink coefficients but keeps all features.
Lasso uses L1 penalty and can set some coefficients to zero, performing feature selection.
Use Ridge when all features matter; use Lasso to simplify the model by selecting features.
Both methods help reduce overfitting by adding regularization.
Sklearn makes it easy to apply both with similar code patterns.