MLOps · Concept · Beginner · 3 min read

Ridge Regression in Python: What It Is and How to Use It

Ridge regression is a type of linear regression that adds a penalty to reduce model complexity and prevent overfitting; in Python it is available as Ridge from sklearn.linear_model. It works by shrinking the coefficients toward zero (but never exactly to zero), which helps when there are many features or when features are correlated.
⚙️

How It Works

Ridge regression is like trying to fit a line to data but with a rule that says: "Don't make the line too wild or complicated." Imagine you are drawing a path through points on a map, but you want the path to be smooth and not jump around too much. Ridge regression adds a small cost for making the path too curvy.

Technically, it adds a penalty on the size of the coefficients (the numbers that multiply each feature) in the model. This penalty is called L2 regularization because it is based on the squared (L2) norm of the coefficients. It helps when features are correlated or when there are many features, so the model doesn't just memorize the training data but learns a simpler pattern that generalizes better to new data.
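The penalty has a simple closed-form effect: ridge adds alpha to the diagonal of the normal equations, which pulls the solution toward zero. The sketch below (a minimal illustration on a made-up tiny dataset, not part of sklearn) compares the ordinary least-squares solution with the ridge solution computed by hand with NumPy.

```python
import numpy as np

# Hypothetical tiny dataset: 5 samples, 2 features, known true weights
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
y = X @ np.array([3.0, -2.0]) + rng.normal(scale=0.1, size=5)

alpha = 1.0  # penalty strength

# Ordinary least squares solves (X^T X) w = X^T y
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge solves (X^T X + alpha * I) w = X^T y:
# the L2 penalty adds alpha to the diagonal, shrinking w toward zero
w_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(2), X.T @ y)

print("OLS coefficients:  ", w_ols)
print("Ridge coefficients:", w_ridge)
```

The ridge coefficients always have a smaller (or equal) norm than the least-squares coefficients, and they shrink further as alpha grows.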

💻

Example

This example shows how to use ridge regression with sklearn to fit a model and check its coefficients and score.

python
from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Create sample data with noise
X, y = make_regression(n_samples=100, n_features=2, noise=10, random_state=42)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Create Ridge regression model with alpha=1.0 (penalty strength)
model = Ridge(alpha=1.0)

# Train the model
model.fit(X_train, y_train)

# Predict on test data
predictions = model.predict(X_test)

# Print coefficients and model score
print(f"Coefficients: {model.coef_}")
print(f"Model R^2 score on test data: {model.score(X_test, y_test):.2f}")
Output
Coefficients: [39.75697391 72.09192752]
Model R^2 score on test data: 0.96
🎯

When to Use

Use ridge regression when you have many features or when features are correlated, which can confuse simple linear regression. It helps by keeping the model simpler and more stable.

For example, in predicting house prices, if many features like size, number of rooms, and location are related, ridge regression can prevent the model from relying too much on any one feature. It is also useful when you want to reduce overfitting but still keep all features in the model.
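The correlated-features case is easy to see in code. Below is a small sketch (with made-up data, loosely following the house-price idea: a "rooms" feature that is almost a copy of a "size" feature) comparing plain linear regression with ridge. The specific numbers are illustrative assumptions, not from the article.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Hypothetical data: two nearly identical (highly correlated) features
rng = np.random.default_rng(42)
size = rng.normal(size=200)
rooms = size + rng.normal(scale=0.01, size=200)  # almost a copy of `size`
X = np.column_stack([size, rooms])
y = 2 * size + rng.normal(scale=0.5, size=200)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# With correlated features, OLS coefficients can blow up in opposite
# directions; ridge spreads the weight evenly and keeps it small
print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```

The ridge coefficients stay small and stable, while the unpenalized model is free to put a large positive weight on one feature and a large negative weight on its near-duplicate.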

Key Points

  • Ridge regression adds L2 penalty to shrink coefficients.
  • It reduces overfitting by keeping the model simpler.
  • Useful when features are many or correlated.
  • Implemented in Python with sklearn.linear_model.Ridge.
  • Alpha controls the strength of the penalty; higher alpha means more shrinkage.
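The last point is easy to verify directly: refitting the model from the example above at several alpha values shows the coefficients shrinking as the penalty grows.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Same synthetic data as in the example above
X, y = make_regression(n_samples=100, n_features=2, noise=10, random_state=42)

# Larger alpha -> stronger penalty -> smaller coefficients
for alpha in [0.1, 1.0, 10.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    print(f"alpha={alpha:>6}: coefficient norm = {np.linalg.norm(model.coef_):.2f}")
```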

Key Takeaways

  • Ridge regression adds a penalty to reduce model complexity and prevent overfitting.
  • It is best used when features are many or correlated, to stabilize the model.
  • In Python, use sklearn's Ridge class to apply ridge regression easily.
  • The alpha parameter controls how much the coefficients are shrunk.
  • Ridge regression keeps all features but reduces their impact to improve generalization.