Polynomial Features in Python Data Analysis - Time & Space Complexity
We want to understand how the time to create polynomial features grows as the input data size and feature count increase.
How does the work change when we add more data or more features?
Analyze the time complexity of the following code snippet.
```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.random.rand(100, 10)  # illustrative input: n=100 samples, m=10 features

poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)  # shape (100, 65): 10 linear + 55 quadratic terms
```
This code creates new features by combining original features up to degree 2 for each data point.
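To see where the feature count comes from, here is a minimal sketch (independent of scikit-learn) that enumerates the degree-2 combinations directly; the function name `n_poly_features` is illustrative, not part of any library:

```python
from itertools import combinations_with_replacement

def n_poly_features(m):
    """Count the outputs of a degree-2 expansion without a bias term."""
    # degree-1 terms: each of the m original features passes through unchanged
    linear = m
    # degree-2 terms: every unordered pair with repetition, e.g. x0*x0, x0*x1, ...
    quadratic = sum(1 for _ in combinations_with_replacement(range(m), 2))
    return linear + quadratic

print(n_poly_features(10))  # 10 + 55 = 65
```

For m = 10 this gives 10 linear terms plus 55 quadratic terms, matching the 65 columns produced per row.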
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Generating all combinations of features up to degree 2 for each data point.
- How many times: For each of the n data points, the code creates m + m(m+1)/2 new features, where m is the number of original features.
As the number of data points (n) grows, the work grows linearly because each point is processed once. But as the number of features (m) grows, the number of new features grows roughly with the square of m.
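The quadratic growth in m can be checked with a quick sketch (the helper name `n_new_features` is illustrative):

```python
def n_new_features(m):
    # m linear terms plus m*(m+1)/2 degree-2 terms
    return m + m * (m + 1) // 2

for m in (10, 20, 40):
    print(m, n_new_features(m))  # 10 -> 65, 20 -> 230, 40 -> 860
```

Doubling m roughly quadruples the dominant m(m+1)/2 term, which is the quadratic behavior described above.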
| Input Size (n) | Approx. Operations (for fixed m=10) |
|---|---|
| 10 | ~10 * (10 + 55) = 650 |
| 100 | ~100 * 65 = 6,500 |
| 1000 | ~1000 * 65 = 65,000 |
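The table values can be reproduced programmatically; `approx_ops` is a hypothetical helper for this back-of-the-envelope count, not a measured benchmark:

```python
def approx_ops(n, m=10):
    # n data points, each producing m + m*(m+1)/2 new features
    return n * (m + m * (m + 1) // 2)

for n in (10, 100, 1000):
    print(n, approx_ops(n))  # 650, 6500, 65000
```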
Pattern observation: Operations grow linearly with data points but quadratically with features.
Time Complexity: O(n * m^2)
This means the time grows linearly with the number of data points and roughly with the square of the number of features.
[X] Wrong: "The time only depends on the number of data points, not the number of features."
[OK] Correct: Because polynomial features combine original features, the number of new features grows quickly with more original features, increasing the work.
Understanding how feature transformations affect time helps you explain trade-offs clearly and shows you can think about scaling data processing steps.
"What if we increased the polynomial degree from 2 to 3? How would the time complexity change?"