How to Use Polynomial Regression with sklearn in Python
Use PolynomialFeatures from sklearn.preprocessing to transform your input data into polynomial features, then fit a linear model such as LinearRegression on the transformed data. This approach lets you model nonlinear relationships easily with sklearn.

Syntax
Polynomial regression in sklearn involves two main steps: first, transform your input features to polynomial features using PolynomialFeatures, then fit a linear regression model on these features.
- PolynomialFeatures(degree=n): creates new features by raising the original features to powers up to n.
- fit_transform(X): applies the polynomial transformation to your data X.
- LinearRegression(): fits a linear model to the transformed features.
```python
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)  # X is your input data
model = LinearRegression()
model.fit(X_poly, y)  # y is your target variable
```
Example
This example shows how to fit a polynomial regression model of degree 3 to sample data and predict new values.
```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Sample data: X is input, y is output
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 8, 27, 64, 125])  # y = x^3

# Create polynomial features of degree 3
poly = PolynomialFeatures(degree=3)
X_poly = poly.fit_transform(X)

# Fit linear regression on the transformed features
model = LinearRegression()
model.fit(X_poly, y)

# Predict on new data
X_new = np.array([[6], [7]])
X_new_poly = poly.transform(X_new)
predictions = model.predict(X_new_poly)
print("Predictions for 6 and 7:", predictions)
```
Output
Predictions for 6 and 7: [ 216. 343.]
Common Pitfalls
- Not transforming input data with PolynomialFeatures before fitting the model causes errors or poor results.
- Using too high a degree can cause overfitting, where the model fits noise instead of the true pattern.
- For prediction, always transform new input data with the same PolynomialFeatures instance used during training.
```python
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Wrong: fitting a linear model directly on the original data
model = LinearRegression()
model.fit(X, y)  # this will not capture the polynomial pattern

# Right: transform the data first
poly = PolynomialFeatures(degree=3)
X_poly = poly.fit_transform(X)
model.fit(X_poly, y)
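A convenient way to avoid these pitfalls entirely is to bundle the transform and the model into one estimator with sklearn's Pipeline, so new data is transformed automatically on every predict call. A minimal sketch using the same sample data as the example:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 8, 27, 64, 125])  # y = x^3

# The pipeline applies PolynomialFeatures before LinearRegression
# on both fit and predict, so you can never forget the transform step.
pipe = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
pipe.fit(X, y)
print(pipe.predict(np.array([[6], [7]])))  # ~ [216. 343.]
```

Because the fitted PolynomialFeatures instance lives inside the pipeline, training and prediction always use the same transformation.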
Quick Reference
Remember these steps for polynomial regression with sklearn:
- Import
PolynomialFeaturesandLinearRegression. - Transform input data with
PolynomialFeatures(degree=n). - Fit
LinearRegressionon transformed data. - Transform new data before prediction.
Key Takeaways
- Use PolynomialFeatures to create polynomial input features before fitting a linear model.
- Always transform new input data with the same PolynomialFeatures instance before prediction.
- Avoid very high polynomial degrees to prevent overfitting.
- Polynomial regression with sklearn is just linear regression on transformed features.
- Check your model predictions to ensure the polynomial fit matches your data pattern.
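One quick way to check the fit is to compare R² scores across degrees. A sketch on the sample data from the example; note that training R² always improves as the degree grows, so on real data you would compare scores on held-out data instead:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 8, 27, 64, 125])  # y = x^3

# Fit one model per degree and report how well each explains the data
for degree in (1, 2, 3):
    poly = PolynomialFeatures(degree=degree)
    X_poly = poly.fit_transform(X)
    model = LinearRegression().fit(X_poly, y)
    r2 = r2_score(y, model.predict(X_poly))
    print(f"degree={degree}: R^2 = {r2:.4f}")
```

Degree 3 reaches R² = 1.0 here because the data is exactly cubic; lower degrees leave part of the pattern unexplained.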