Creating interaction features in Data Analysis Python - Performance & Efficiency
When we create interaction features, we combine columns to capture relationships.
We want to know how the time to create these features grows as data size grows.
Analyze the time complexity of the following code snippet.
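import pandas as pd

def create_interactions(df):
    # Snapshot the original column names so newly added columns are not paired again.
    cols = list(df.columns)
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            # Element-wise product of the two columns, stored under a combined name.
            df[f'{cols[i]}_x_{cols[j]}'] = df[cols[i]] * df[cols[j]]
    return df  # note: df is modified in place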
This code creates new features by multiplying every pair of columns in the DataFrame.
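The same pairwise products can be built without modifying the input. The sketch below is one possible variant, not the original author's method: the name `create_interactions_concat` is hypothetical, and it assumes all column names are unique and all columns are numeric. It collects the products first and attaches them with a single `pd.concat`, which avoids the repeated column inserts that can make pandas warn about a fragmented DataFrame. The number of multiplications is unchanged.

```python
from itertools import combinations

import pandas as pd


def create_interactions_concat(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with one product column per pair of original columns.

    Hypothetical helper; assumes unique column names and numeric columns.
    """
    # Build every pairwise product first, keyed by the new column name.
    products = {
        f"{a}_x_{b}": df[a] * df[b]
        for a, b in combinations(df.columns, 2)
    }
    if not products:
        # Fewer than two columns: there is nothing to pair.
        return df.copy()
    # Attach all new columns in one concat instead of one insert per pair.
    return pd.concat([df, pd.DataFrame(products, index=df.index)], axis=1)


example = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
result = create_interactions_concat(example)
print(list(result.columns))
# ['a', 'b', 'c', 'a_x_b', 'a_x_c', 'b_x_c']
```

Three columns give three pairs, so the result has six columns and `example` itself keeps its original three.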
- Primary operation: Nested loops over the columns that compute one element-wise product for each pair of columns.
- How many times: Once per pair of columns, and each product performs one multiplication per row.
As the number of columns grows, the number of pairs grows roughly with the square of the number of columns: m columns produce m(m-1)/2 pairs.
| Input Size (columns) | Approx. Operations (multiplications) |
|---|---|
| 10 | 45 x rows |
| 100 | 4,950 x rows |
| 1,000 | 499,500 x rows |
Pattern observation: The number of pairs grows quickly as columns increase, so work grows roughly with the square of columns times rows.
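The pair counts in the table can be checked with a short script. This is a minimal sketch using only the standard library. The helper name `count_pairs` is an assumption introduced for illustration. It enumerates the same (i, j) pairs with i < j that the nested loops visit and compares the count with the closed form m(m-1)/2.

```python
from itertools import combinations
from math import comb


def count_pairs(m: int) -> int:
    """Count the column pairs (i, j) with i < j for m columns."""
    return sum(1 for _ in combinations(range(m), 2))


for m in (10, 100, 1000):
    # The enumerated count matches both math.comb and the closed form.
    assert count_pairs(m) == comb(m, 2) == m * (m - 1) // 2
    print(m, count_pairs(m))
# 10 45
# 100 4950
# 1000 499500
```

Multiplying each count by the number of rows gives the approximate number of multiplications.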
Time Complexity: O(n x m²), where n is the number of rows and m is the number of columns.
This means the time grows linearly with the number of rows and quadratically with the number of columns.
[X] Wrong: "Creating interaction features only depends on the number of rows."
[OK] Correct: Because the number of column pairs grows with the square of the number of columns, the column count strongly affects the running time.
Understanding how feature creation scales helps you explain your data preparation choices clearly and confidently.
"What if we only created interaction features for a selected subset of columns? How would the time complexity change?"
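One way to explore this question is to restrict the pairing to a chosen list of columns. The sketch below is an illustration under stated assumptions: the function name `create_interactions_subset` and its `subset` parameter are hypothetical, and the selected columns are assumed to be numeric. With k selected columns it creates k(k-1)/2 product columns, so the cost depends on k rather than on the total number of columns.

```python
from itertools import combinations

import pandas as pd


def create_interactions_subset(df: pd.DataFrame, subset: list) -> pd.DataFrame:
    """Add pairwise products only for the columns named in `subset`.

    Hypothetical helper; assumes the selected columns are numeric.
    """
    out = df.copy()  # leave the caller's DataFrame untouched
    for a, b in combinations(subset, 2):
        # One element-wise product per selected pair.
        out[f"{a}_x_{b}"] = df[a] * df[b]
    return out


data = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})
subset_result = create_interactions_subset(data, ["a", "c"])
print(list(subset_result.columns))
# ['a', 'b', 'c', 'd', 'a_x_c']
```

Here two of the four columns are selected, so only one new column is created instead of six.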