Ridge vs Lasso Regression in Python: Key Differences and Code Examples
Ridge and Lasso regression both add penalties to linear regression to reduce overfitting. Ridge uses L2 regularization, which shrinks all coefficients toward zero but never exactly to zero, while Lasso uses L1 regularization, which can shrink some coefficients exactly to zero, effectively performing feature selection.
Quick Comparison
Here is a quick side-by-side comparison of Ridge and Lasso regression.
| Factor | Ridge Regression | Lasso Regression |
|---|---|---|
| Regularization Type | L2 (squares of coefficients) | L1 (absolute values of coefficients) |
| Effect on Coefficients | Shrinks coefficients but keeps all | Can shrink some coefficients to zero (feature selection) |
| Use Case | When all features matter but need shrinkage | When you want automatic feature selection |
| Solution Type | Closed-form, always unique | No closed-form; sparse, may not be unique |
| Computational Cost | Generally faster (closed-form solution) | Slower; requires iterative optimization (e.g. coordinate descent) |
| Interpretability | Coefficients are small but non-zero | Simpler model with fewer features |
Key Differences
Ridge regression adds a penalty proportional to the sum of the squared coefficients (L2 regularization). This penalty shrinks all coefficients toward zero but never sets any exactly to zero, so every feature remains in the model. It is useful when you believe all features contribute but want to reduce overfitting.
Lasso regression adds a penalty proportional to the sum of the absolute values of the coefficients (L1 regularization). This can shrink some coefficients exactly to zero, effectively removing those features from the model, which makes Lasso useful for feature selection and for producing simpler, more interpretable models.
Because of these differences, Ridge tends to perform better when many features have small/medium effects, while Lasso is better when only a few features are important. The choice depends on your data and goals.
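The sparsity difference is easy to observe directly. The sketch below (an illustrative setup, not from the original article) fits both models on synthetic data where only 2 of 10 features are informative, and counts non-zero coefficients; a relatively strong `alpha` is used to make the Lasso's sparsity obvious.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: only 2 of 10 features actually drive the target
X, y = make_regression(n_samples=200, n_features=10, n_informative=2,
                       noise=10, random_state=0)

# A fairly strong penalty, chosen to highlight Lasso's sparsity
ridge = Ridge(alpha=5.0).fit(X, y)
lasso = Lasso(alpha=5.0).fit(X, y)

print("Ridge non-zero coefficients:", int(np.sum(ridge.coef_ != 0)))
print("Lasso non-zero coefficients:", int(np.sum(lasso.coef_ != 0)))
```

Ridge keeps all ten coefficients non-zero (just smaller), while Lasso drops most of the uninformative ones entirely.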
Code Comparison
Here is how to use Ridge regression in Python with sklearn on the same dataset.
```python
from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Create sample data
X, y = make_regression(n_samples=100, n_features=5, noise=10, random_state=42)

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Create and train Ridge model
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)

# Predict and evaluate
preds = ridge.predict(X_test)
mse = mean_squared_error(y_test, preds)
print(f"Ridge Regression MSE: {mse:.2f}")
print(f"Coefficients: {ridge.coef_}")
```
Lasso Equivalent
Here is the equivalent code using Lasso regression on the same data.
```python
from sklearn.linear_model import Lasso

# Create and train Lasso model (uses X_train/y_train from the Ridge example)
lasso = Lasso(alpha=1.0)
lasso.fit(X_train, y_train)

# Predict and evaluate
preds_lasso = lasso.predict(X_test)
mse_lasso = mean_squared_error(y_test, preds_lasso)
print(f"Lasso Regression MSE: {mse_lasso:.2f}")
print(f"Coefficients: {lasso.coef_}")
```
When to Use Which
Choose Ridge regression when you believe all features have some effect and you want to reduce overfitting without removing any features. It works well when many features contribute moderately.
Choose Lasso regression when you want to simplify your model by selecting only the most important features, especially if you suspect many features are irrelevant.
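Whichever you choose, the strength of the penalty (`alpha`) matters as much as the penalty type. A minimal sketch of tuning it with scikit-learn's built-in cross-validated estimators, `RidgeCV` and `LassoCV` (the alpha grid here is an arbitrary choice for illustration):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV

X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=42)

# Search a log-spaced grid of penalty strengths via cross-validation
alphas = np.logspace(-3, 3, 13)
ridge_cv = RidgeCV(alphas=alphas).fit(X, y)
lasso_cv = LassoCV(alphas=alphas, cv=5).fit(X, y)

print(f"Best Ridge alpha: {ridge_cv.alpha_}")
print(f"Best Lasso alpha: {lasso_cv.alpha_}")
```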
In practice, you can also try Elastic Net, which combines both penalties, but Ridge and Lasso cover most needs for regularized linear regression.
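For completeness, Elastic Net is available in scikit-learn as `ElasticNet`, where `l1_ratio` controls the mix of the two penalties. A brief sketch (the parameter values are illustrative, not tuned):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

X, y = make_regression(n_samples=100, n_features=5, noise=10, random_state=42)

# l1_ratio=0.5 gives an even blend; 1.0 is pure Lasso, 0.0 is pure Ridge
enet = ElasticNet(alpha=1.0, l1_ratio=0.5)
enet.fit(X, y)
print(f"Elastic Net coefficients: {enet.coef_}")
```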