How to Use Polynomial Features in sklearn with Python
Use PolynomialFeatures from sklearn.preprocessing to transform your input data by adding polynomial terms. Initialize it with the desired degree, then call fit_transform() on your data to get new features including powers and interaction terms.
Syntax
The main class is PolynomialFeatures from sklearn.preprocessing. You create an instance with parameters like degree to set the highest polynomial power. Then use fit_transform(X) to convert your input data X into polynomial features.
- degree: The maximum power for polynomial features (default is 2).
- include_bias: Whether to include a bias (constant 1) column (default is True).
- interaction_only: If True, only interaction terms without powers are included.
```python
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=2, include_bias=True, interaction_only=False)
X_poly = poly.fit_transform(X)
```
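To see what interaction_only changes in practice, here is a minimal sketch: with interaction_only=True, the transformer keeps cross terms like x0*x1 but drops pure powers like x0^2 and x1^2.

```python
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

X = np.array([[2, 3]])

# interaction_only=True keeps cross terms (x0*x1) but drops pure powers (x0^2, x1^2)
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_int = poly.fit_transform(X)
print(X_int)  # columns: x0, x1, x0*x1 -> [[2. 3. 6.]]
```

Compare this with the default, which would also produce the 4 (= 2^2) and 9 (= 3^2) columns.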
Example
This example shows how to create polynomial features of degree 2 from a simple 2D input array. It prints the original data and the transformed data with polynomial terms.
```python
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

# Sample input data with 2 features
X = np.array([[2, 3], [4, 5], [6, 7]])

# Create polynomial features of degree 2
poly = PolynomialFeatures(degree=2, include_bias=True)
X_poly = poly.fit_transform(X)

print("Original data:\n", X)
print("\nPolynomial features:\n", X_poly)
```
Output
```
Original data:
 [[2 3]
 [4 5]
 [6 7]]

Polynomial features:
 [[ 1.  2.  3.  4.  6.  9.]
 [ 1.  4.  5. 16. 20. 25.]
 [ 1.  6.  7. 36. 42. 49.]]
```
Common Pitfalls
- Forgetting to set include_bias=False when you don't want the constant 1 column, which can affect some models.
- Using too high a degree can create many features and cause overfitting or slow training.
- Not scaling features before polynomial expansion can lead to large value ranges and unstable models.
```python
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

X = np.array([[1, 2], [3, 4]])

# Wrong: forgetting to set include_bias=False when not needed
poly_wrong = PolynomialFeatures(degree=2)
X_wrong = poly_wrong.fit_transform(X)
print("With bias column (default):\n", X_wrong)

# Right: exclude bias if model adds intercept
poly_right = PolynomialFeatures(degree=2, include_bias=False)
X_right = poly_right.fit_transform(X)
print("\nWithout bias column:\n", X_right)
```
Output
```
With bias column (default):
 [[ 1.  1.  2.  1.  2.  4.]
 [ 1.  3.  4.  9. 12. 16.]]

Without bias column:
 [[ 1.  2.  1.  2.  4.]
 [ 3.  4.  9. 12. 16.]]
```
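The scaling pitfall is easiest to avoid by putting StandardScaler before PolynomialFeatures in a pipeline. The sketch below is illustrative: the data and target are made up for the example, and the point is only that scaling first keeps the squared terms in a comparable range to the linear ones.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(50, 2))   # wide value range before scaling
y = X[:, 0] + 0.5 * X[:, 1] ** 2        # target with a quadratic term

# Scale first, then expand: squared terms stay well-conditioned
model = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
model.fit(X, y)
print(round(model.score(X, y), 4))  # near-perfect fit on this synthetic data
```

Because the target is itself a degree-2 polynomial of the inputs, the pipeline recovers it almost exactly; on real data the gain from scaling shows up as more stable coefficients.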
Quick Reference
| Parameter / Method | Description | Default |
|---|---|---|
| degree | Maximum polynomial degree | 2 |
| include_bias | Add constant 1 column | True |
| interaction_only | Only interaction terms, no powers | False |
| fit_transform(X) | Fit and transform input data X | N/A |
| get_feature_names_out() | Get names of generated features | N/A |
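To put the "too high a degree" warning in concrete terms, the number of output columns for n input features at degree d is C(n + d, d), which grows quickly. A quick sketch counts the columns for 10 input features:

```python
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

X = np.zeros((2, 10))  # 10 input features, values don't matter for counting
counts = {}
for degree in (2, 3, 5):
    counts[degree] = PolynomialFeatures(degree=degree).fit_transform(X).shape[1]
print(counts)  # {2: 66, 3: 286, 5: 3003}
```

Going from degree 2 to degree 5 turns 10 features into over 3,000, which is why degree should be chosen deliberately.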
Key Takeaways
- Use PolynomialFeatures to add polynomial and interaction terms to your data for better model fitting.
- Set degree carefully to avoid too many features and overfitting.
- Remember to set include_bias=False if your model already adds an intercept.
- Scaling features before polynomial expansion can improve model stability.
- Use fit_transform() to generate the new feature matrix from your input data.