MLOps · How-To · Beginner · 3 min read

How to Use Polynomial Features in sklearn with Python

Use PolynomialFeatures from sklearn.preprocessing to transform your input data by adding polynomial terms. Initialize it with the desired degree, then call fit_transform() on your data to get new features including powers and interaction terms.

Syntax

The main class is PolynomialFeatures from sklearn.preprocessing. You create an instance with parameters like degree to set the highest polynomial power. Then use fit_transform(X) to convert your input data X into polynomial features.

  • degree: The maximum power for polynomial features (default is 2).
  • include_bias: Whether to include a bias (constant 1) column (default is True).
  • interaction_only: If True, only products of distinct features are included; no single feature is raised to a power (default is False).
python
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=2, include_bias=True, interaction_only=False)
X_poly = poly.fit_transform(X)
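To see what interaction_only actually changes, here is a minimal sketch (the input array X below is illustrative): with degree=2 and interaction_only=True, the squared terms are dropped, leaving only the bias, the original features, and their pairwise product.

```python
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

X = np.array([[2, 3], [4, 5]])

# interaction_only=True keeps [1, x1, x2, x1*x2] and drops x1^2, x2^2
poly = PolynomialFeatures(degree=2, interaction_only=True)
print(poly.fit_transform(X))
# [[ 1.  2.  3.  6.]
#  [ 1.  4.  5. 20.]]
```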

Example

This example shows how to create polynomial features of degree 2 from a simple 2D input array. It prints the original data and the transformed data with polynomial terms.

python
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

# Sample input data with 2 features
X = np.array([[2, 3], [4, 5], [6, 7]])

# Create polynomial features of degree 2
poly = PolynomialFeatures(degree=2, include_bias=True)
X_poly = poly.fit_transform(X)

print("Original data:\n", X)
print("\nPolynomial features:\n", X_poly)
Output
Original data:
 [[2 3]
 [4 5]
 [6 7]]

Polynomial features:
 [[ 1.  2.  3.  4.  6.  9.]
 [ 1.  4.  5. 16. 20. 25.]
 [ 1.  6.  7. 36. 42. 49.]]

Common Pitfalls

  • Forgetting to set include_bias=False when your model already fits an intercept: the constant 1 column is redundant with the intercept and can destabilize regularized models.
  • Using too high a degree can create many features and cause overfitting or slow training.
  • Not scaling features before polynomial expansion can lead to large value ranges and unstable models.
python
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

X = np.array([[1, 2], [3, 4]])

# Wrong: forgetting to set include_bias=False when not needed
poly_wrong = PolynomialFeatures(degree=2)
X_wrong = poly_wrong.fit_transform(X)
print("With bias column (default):\n", X_wrong)

# Right: exclude bias if model adds intercept
poly_right = PolynomialFeatures(degree=2, include_bias=False)
X_right = poly_right.fit_transform(X)
print("\nWithout bias column:\n", X_right)
Output
With bias column (default):
 [[ 1.  1.  2.  1.  2.  4.]
 [ 1.  3.  4.  9. 12. 16.]]

Without bias column:
 [[ 1.  2.  1.  2.  4.]
 [ 3.  4.  9. 12. 16.]]

Quick Reference

| Parameter | Description | Default |
| --- | --- | --- |
| degree | Maximum polynomial degree | 2 |
| include_bias | Add constant 1 column | True |
| interaction_only | Only interaction terms, no powers | False |
| fit_transform(X) | Fit and transform input data X | N/A |
| get_feature_names_out() | Get names of generated features | N/A |

Key Takeaways

  • Use PolynomialFeatures to add polynomial and interaction terms to your data for better model fitting.
  • Set degree carefully to avoid too many features and overfitting.
  • Remember to set include_bias=False if your model already adds an intercept.
  • Scaling features before polynomial expansion can improve model stability.
  • Use fit_transform() to generate the new feature matrix from your input data.