MLOps · How-To · Beginner · 3 min read

How to Use Polynomial Features in sklearn with Python

Use PolynomialFeatures from sklearn.preprocessing to transform your input data by adding polynomial terms. Initialize it with the desired degree, then call fit_transform() on your data to get new features including powers and interaction terms.

Syntax

The main class is PolynomialFeatures from sklearn.preprocessing. You create an instance with parameters like degree to set the highest polynomial power. Then use fit_transform(X) to convert your input data X into polynomial features.

  • degree: The maximum power for polynomial features (default is 2).
  • include_bias: Whether to include a bias (constant 1) column (default is True).
  • interaction_only: If True, only products of distinct features are included; no single feature is raised to a power (default is False).
python
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=2, include_bias=True, interaction_only=False)
X_poly = poly.fit_transform(X)
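To see what interaction_only actually changes, here is a minimal sketch (the input array X below is illustrative): with degree=2 and interaction_only=True, the squared terms are dropped, leaving only the bias, the original features, and their pairwise product.

```python
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

X = np.array([[2, 3], [4, 5]])

# interaction_only=True keeps [1, x1, x2, x1*x2] and drops x1^2, x2^2
poly = PolynomialFeatures(degree=2, interaction_only=True)
print(poly.fit_transform(X))
# [[ 1.  2.  3.  6.]
#  [ 1.  4.  5. 20.]]
```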

Example

This example shows how to create polynomial features of degree 2 from a simple 2D input array. It prints the original data and the transformed data with polynomial terms.

python
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

# Sample input data with 2 features
X = np.array([[2, 3], [4, 5], [6, 7]])

# Create polynomial features of degree 2
poly = PolynomialFeatures(degree=2, include_bias=True)
X_poly = poly.fit_transform(X)

print("Original data:\n", X)
print("\nPolynomial features:\n", X_poly)
Output
Original data:
 [[2 3]
 [4 5]
 [6 7]]

Polynomial features:
 [[ 1.  2.  3.  4.  6.  9.]
 [ 1.  4.  5. 16. 20. 25.]
 [ 1.  6.  7. 36. 42. 49.]]

Common Pitfalls

  • Forgetting to set include_bias=False when your model already fits an intercept: the constant 1 column is redundant with the intercept and can destabilize regularized models.
  • Using too high a degree can create many features and cause overfitting or slow training.
  • Not scaling features before polynomial expansion can lead to large value ranges and unstable models.
python
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

X = np.array([[1, 2], [3, 4]])

# Wrong: forgetting to set include_bias=False when not needed
poly_wrong = PolynomialFeatures(degree=2)
X_wrong = poly_wrong.fit_transform(X)
print("With bias column (default):\n", X_wrong)

# Right: exclude bias if model adds intercept
poly_right = PolynomialFeatures(degree=2, include_bias=False)
X_right = poly_right.fit_transform(X)
print("\nWithout bias column:\n", X_right)
Output
With bias column (default):
 [[ 1.  1.  2.  1.  2.  4.]
 [ 1.  3.  4.  9. 12. 16.]]

Without bias column:
 [[ 1.  2.  1.  2.  4.]
 [ 3.  4.  9. 12. 16.]]

Quick Reference

| Parameter | Description | Default |
| --- | --- | --- |
| degree | Maximum polynomial degree | 2 |
| include_bias | Add constant 1 column | True |
| interaction_only | Only interaction terms, no powers | False |
| fit_transform(X) | Fit and transform input data X | N/A |
| get_feature_names_out() | Get names of generated features | N/A |

Key Takeaways

  • Use PolynomialFeatures to add polynomial and interaction terms to your data for better model fitting.
  • Set degree carefully to avoid too many features and overfitting.
  • Remember to set include_bias=False if your model already adds an intercept.
  • Scaling features before polynomial expansion can improve model stability.
  • Use fit_transform() to generate the new feature matrix from your input data.