
How to Use Ridge Regression with sklearn in Python

Use Ridge from sklearn.linear_model to create a ridge regression model, setting the regularization strength with alpha. Fit the model with fit(X, y) and predict new values with predict(X_new).
📐

Syntax

The basic syntax to use Ridge Regression in sklearn is:

  • Ridge(alpha=1.0, fit_intercept=True, solver='auto'): Creates the ridge regression model.
  • alpha: Controls regularization strength; higher values mean more regularization.
  • fit_intercept: Whether to calculate the intercept for this model.
  • solver: Algorithm to use for optimization.
  • fit(X, y): Fits the model to training data.
  • predict(X_new): Predicts target values for new data.
python
from sklearn.linear_model import Ridge

# Create Ridge regression model
model = Ridge(alpha=1.0, fit_intercept=True, solver='auto')

# Fit model to data
model.fit(X, y)

# Predict new values
predictions = model.predict(X_new)
💻

Example

This example shows how to create a Ridge regression model, fit it on sample data, and predict new values.

python
from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Generate sample data
X, y = make_regression(n_samples=100, n_features=2, noise=10, random_state=42)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create Ridge regression model with alpha=1.0
model = Ridge(alpha=1.0)

# Fit model on training data
model.fit(X_train, y_train)

# Predict on test data
y_pred = model.predict(X_test)

# Calculate metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.2f}")
print(f"R2 Score: {r2:.2f}")
Output
Mean Squared Error: 84.62
R2 Score: 0.87
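To see what alpha actually does, a short sketch (reusing the same make_regression data as above) fits the model at several regularization strengths and prints the learned coefficients. As alpha grows, ridge regression shrinks the coefficients toward zero:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression

# Same sample data as in the example above
X, y = make_regression(n_samples=100, n_features=2, noise=10, random_state=42)

# Fit the same data with increasing regularization strength
for alpha in [0.1, 1.0, 100.0]:
    model = Ridge(alpha=alpha)
    model.fit(X, y)
    # Coefficient magnitudes shrink as alpha increases
    print(f"alpha={alpha}: coef norm = {np.linalg.norm(model.coef_):.2f}")
```

This shrinkage is the underfitting/overfitting trade-off mentioned below: large alpha pulls coefficients toward zero, small alpha leaves them close to the ordinary least squares solution.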
⚠️

Common Pitfalls

  • Not scaling features before using Ridge can lead to poor results because regularization depends on feature scale.
  • Setting alpha too high can cause underfitting; too low can cause overfitting.
  • Forcing fit_intercept=False when data is not centered can bias the model.
  • Using Ridge for classification tasks is incorrect; sklearn provides RidgeClassifier for that.
python
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Wrong: No feature scaling
model_wrong = Ridge(alpha=10)
model_wrong.fit(X_train, y_train)

# Right: Scale features before fitting
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model_right = Ridge(alpha=10)
model_right.fit(X_train_scaled, y_train)

# The wrong model predicts on raw features; the right one must use scaled features
pred_wrong = model_wrong.predict(X_test)
pred_right = model_right.predict(X_test_scaled)
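A cleaner way to keep scaling and fitting together is sklearn's Pipeline, which learns the scaler's statistics from the training data only and reapplies them automatically at predict time. A minimal sketch, assuming the same X_train/X_test split as in the example above:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=2, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# The pipeline fits the scaler on training data only,
# then reuses those statistics when transforming test data
pipe = make_pipeline(StandardScaler(), Ridge(alpha=10))
pipe.fit(X_train, y_train)
pred = pipe.predict(X_test)
```

Bundling the steps this way also prevents accidentally calling fit_transform on the test set, a common source of data leakage.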
📊

Quick Reference

Ridge Regression Quick Tips:

  • Use alpha to control regularization strength.
  • Always scale features for best results.
  • Use fit_intercept=True unless data is already centered.
  • Check model performance with metrics like MSE and R².
  • Use Ridge for regression problems only.
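For tuning alpha in practice, sklearn ships RidgeCV, which selects the best value from a candidate list using built-in cross-validation. A short sketch, using the same sample data as above and an illustrative alpha grid:

```python
from sklearn.linear_model import RidgeCV
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=2, noise=10, random_state=42)

# RidgeCV evaluates each candidate alpha via cross-validation
model = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0, 100.0])
model.fit(X, y)

# The selected regularization strength is stored in alpha_
print(f"Best alpha: {model.alpha_}")
```

This replaces manual trial-and-error over alpha with a single fit call.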

Key Takeaways

Use sklearn.linear_model.Ridge with alpha to apply ridge regression in Python.
Always scale your features before fitting Ridge regression for better results.
Tune alpha to balance between underfitting and overfitting.
Use fit_intercept=True unless your data is already centered.
Evaluate model performance using metrics like mean squared error and R² score.