How to Use Polynomial Regression with sklearn in Python
Use PolynomialFeatures from sklearn.preprocessing to transform your input data into polynomial features, then fit a linear model such as LinearRegression on the transformed data. This approach lets you model nonlinear relationships easily with sklearn.

Syntax
Polynomial regression in sklearn involves two main steps: first, transform your input features to polynomial features using PolynomialFeatures, then fit a linear regression model on these features.
- PolynomialFeatures(degree=n): creates new features by raising the original features to powers up to n.
- fit_transform(X): applies the polynomial transformation to your data X.
- LinearRegression(): fits a linear model to the transformed features.
```python
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)  # X is your input data
model = LinearRegression()
model.fit(X_poly, y)  # y is your target variable
```
Example
This example shows how to fit a polynomial regression model of degree 3 to sample data and predict new values.
```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Sample data: X is input, y is output
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 8, 27, 64, 125])  # y = x^3

# Create polynomial features of degree 3
poly = PolynomialFeatures(degree=3)
X_poly = poly.fit_transform(X)

# Fit linear regression on the transformed features
model = LinearRegression()
model.fit(X_poly, y)

# Predict on new data
X_new = np.array([[6], [7]])
X_new_poly = poly.transform(X_new)
predictions = model.predict(X_new_poly)
print("Predictions for 6 and 7:", predictions)
```
Output
Predictions for 6 and 7: [ 216. 343.]
Common Pitfalls
- Not transforming input data with PolynomialFeatures before fitting the model causes errors or poor results.
- Using too high a degree can cause overfitting, where the model fits noise instead of the true pattern.
- For prediction, always transform new input data with the same PolynomialFeatures instance used during training.
```python
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Wrong: fitting a linear model directly on the original data
model = LinearRegression()
model.fit(X, y)  # this will not capture the polynomial pattern

# Right: transform the data first
poly = PolynomialFeatures(degree=3)
X_poly = poly.fit_transform(X)
model.fit(X_poly, y)
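A convenient way to avoid these pitfalls entirely is to bundle the transform and the model into one estimator with sklearn's Pipeline, so new data is transformed automatically on every predict call. A minimal sketch using the same sample data as the example:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 8, 27, 64, 125])  # y = x^3

# The pipeline applies PolynomialFeatures before LinearRegression
# on both fit and predict, so you can never forget the transform step.
pipe = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
pipe.fit(X, y)
print(pipe.predict(np.array([[6], [7]])))  # ~ [216. 343.]
```

Because the fitted PolynomialFeatures instance lives inside the pipeline, training and prediction always use the same transformation.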
Quick Reference
Remember these steps for polynomial regression with sklearn:
- Import
PolynomialFeaturesandLinearRegression. - Transform input data with
PolynomialFeatures(degree=n). - Fit
LinearRegressionon transformed data. - Transform new data before prediction.
Key Takeaways
- Use PolynomialFeatures to create polynomial input features before fitting a linear model.
- Always transform new input data with the same PolynomialFeatures instance before prediction.
- Avoid very high polynomial degrees to prevent overfitting.
- Polynomial regression with sklearn is just linear regression on transformed features.
- Check your model predictions to ensure the polynomial fit matches your data pattern.
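One quick way to check the fit is to compare R² scores across degrees. A sketch on the sample data from the example; note that training R² always improves as the degree grows, so on real data you would compare scores on held-out data instead:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 8, 27, 64, 125])  # y = x^3

# Fit one model per degree and report how well each explains the data
for degree in (1, 2, 3):
    poly = PolynomialFeatures(degree=degree)
    X_poly = poly.fit_transform(X)
    model = LinearRegression().fit(X_poly, y)
    r2 = r2_score(y, model.predict(X_poly))
    print(f"degree={degree}: R^2 = {r2:.4f}")
```

Degree 3 reaches R² = 1.0 here because the data is exactly cubic; lower degrees leave part of the pattern unexplained.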