
Creating interaction features in Data Analysis Python - Performance & Efficiency

Time Complexity: Creating interaction features
O(n x m²)
Understanding Time Complexity

When we create interaction features, we combine pairs of columns (here, by multiplying them) to capture relationships between variables.

We want to know how the time needed to create these features grows as the data grows, in both rows and columns.

Scenario Under Consideration

Analyze the time complexity of the following code snippet.


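import pandas as pd

def create_interactions(df):
    # Snapshot the original column names so new product columns are not paired again
    cols = list(df.columns)
    # Visit each unordered pair (i, j) with i < j exactly once
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            # Element-wise product: one multiplication per row
            df[f'{cols[i]}_x_{cols[j]}'] = df[cols[i]] * df[cols[j]]
    # Note: df is modified in place and also returned
    return df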

This code creates one new feature for every pair of the original columns by multiplying the two columns element-wise.
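As a quick usage check, the function is repeated below so the example runs on its own, then applied to a small, hypothetical three-column DataFrame (the names `a`, `b`, `c` and the values are made up for illustration). Three columns give three pairs, so three product columns are appended:

```python
import pandas as pd

def create_interactions(df):
    cols = list(df.columns)  # snapshot of the original columns
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            df[f'{cols[i]}_x_{cols[j]}'] = df[cols[i]] * df[cols[j]]
    return df

# Hypothetical toy data: 2 rows, 3 numeric columns
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
result = create_interactions(df)
print(list(result.columns))
# ['a', 'b', 'c', 'a_x_b', 'a_x_c', 'b_x_c']
print(result['b_x_c'].tolist())
# [15, 24]
```

Because the function writes into `df` directly, `result` and `df` are the same object after the call.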

Identify Repeating Operations
  • Primary operation: Nested loops over the columns, computing one element-wise product per column pair.
  • How many times: m(m-1)/2 column pairs for m columns, each needing n multiplications (one per row).
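To make the count concrete, the sketch below replaces the vectorized column product with an explicit per-row loop and tallies every scalar multiplication. `count_multiplications` is a hypothetical helper written only for this illustration; it mirrors the loop structure of the snippet rather than calling pandas.

```python
def count_multiplications(n_rows, n_cols):
    """Count the scalar multiplications the pairwise-product loops perform."""
    count = 0
    for i in range(n_cols):
        for j in range(i + 1, n_cols):   # each unordered pair (i, j) once
            for _ in range(n_rows):      # one multiplication per row
                count += 1
    return count

print(count_multiplications(5, 10))  # 5 rows x 45 pairs = 225
```

The three nested loops make the n × (pairs) structure explicit; pandas performs the innermost loop in compiled code, but the number of multiplications is the same.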
How Execution Grows With Input

With m columns there are m(m-1)/2 pairs, so the number of pairs grows roughly with the square of the number of columns.
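The pair count follows from summing the inner-loop lengths. Writing m for the number of columns and n for the number of rows, and indexing the outer loop from i = 0, the inner loop runs m - 1 - i times:

```latex
\sum_{i=0}^{m-1} (m-1-i) = (m-1) + (m-2) + \dots + 0 = \frac{m(m-1)}{2} = \binom{m}{2}

T(n, m) = n \cdot \frac{m(m-1)}{2} \in O(n \, m^2)
```

Each pair costs n multiplications, which gives the total T(n, m).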

Input size (columns, m)    Approx. operations (multiplications)
10                         45 x rows
100                        4,950 x rows
1000                       499,500 x rows
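The figures in the table are m(m-1)/2 for each column count. A short sketch reproduces them, with `pair_count` as a hypothetical helper name and `math.comb` (Python 3.8+) as a cross-check:

```python
import math

def pair_count(m):
    # Unordered column pairs (i, j) with i < j: m choose 2
    return m * (m - 1) // 2

for m in (10, 100, 1000):
    # Cross-check against the standard-library binomial coefficient
    assert pair_count(m) == math.comb(m, 2)
    print(f"{m} columns: {pair_count(m):,} multiplications per row")
# 10 columns: 45 multiplications per row
# 100 columns: 4,950 multiplications per row
# 1000 columns: 499,500 multiplications per row
```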

Pattern observation: Each tenfold increase in columns multiplies the number of pairs by roughly 100, so total work grows with the number of rows times the square of the number of columns.

Final Time Complexity

Time Complexity: O(n x m²)

Here n is the number of rows and m is the number of columns: the time grows linearly with n and quadratically with m.
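Vectorizing does not remove the m² term; it only moves the per-row work into compiled code. As an alternative sketch (not the original snippet's approach), the same pairwise products can be computed in one NumPy pass using `np.triu_indices` and attached with a single `pd.concat`, which also avoids inserting columns one at a time. It assumes all columns are numeric, and `create_interactions_vectorized` is a hypothetical name.

```python
import numpy as np
import pandas as pd

def create_interactions_vectorized(df):
    """Return df with all pairwise products of its (numeric) columns appended."""
    cols = list(df.columns)
    values = df.to_numpy()
    # Index pairs (i, j) with i < j, in the same order as the nested loops
    i_idx, j_idx = np.triu_indices(len(cols), k=1)
    # Shape (n_rows, n_pairs): still n * m(m-1)/2 multiplications in total
    products = values[:, i_idx] * values[:, j_idx]
    names = [f'{cols[i]}_x_{cols[j]}' for i, j in zip(i_idx, j_idx)]
    interactions = pd.DataFrame(products, columns=names, index=df.index)
    # Build a new frame instead of mutating the input
    return pd.concat([df, interactions], axis=1)

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
result = create_interactions_vectorized(df)
print(result)
```

The trade-off is memory: the whole n × m(m-1)/2 product array is materialized at once, which can be large when m is in the hundreds or thousands.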

Common Mistake

[X] Wrong: "Creating interaction features only depends on the number of rows."

[OK] Correct: The number of column pairs grows with the square of the column count, so the number of columns strongly affects running time.
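A back-of-the-envelope comparison using the multiplication count n × m(m-1)/2 (with `multiplications` as a hypothetical helper) shows the asymmetry: doubling the rows doubles the work, while doubling the columns roughly quadruples it.

```python
def multiplications(n_rows, n_cols):
    # n rows times m(m-1)/2 column pairs
    return n_rows * n_cols * (n_cols - 1) // 2

base = multiplications(1_000, 10)          # 45,000
print(multiplications(2_000, 10) / base)   # doubling rows: 2.0
print(multiplications(1_000, 20) / base)   # doubling columns: about 4.2
```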

Interview Connect

Understanding how feature creation scales helps you explain your data preparation choices clearly and confidently.

Self-Check

"What if we only created interaction features for a selected subset of columns? How would the time complexity change?"