
Feature engineering basics in Pandas - Time & Space Complexity

Time Complexity: Feature engineering basics
O(n)
Understanding Time Complexity

When creating new features from data, it is important to know how the time to do this grows as the data gets bigger.

We want to understand how the work needed changes when we add more rows or columns.

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

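import pandas as pd

n = 10  # Example number of rows

# Build a DataFrame with n rows: A counts up from 1 to n, B counts down from n to 1
df = pd.DataFrame({
    'A': range(1, n+1),
    'B': range(n, 0, -1)
})

# Each new column touches every row exactly once
df['C'] = df['A'] + df['B']  # element-wise sum of A and B
df['D'] = df['A'] * 2        # element-wise doubling of A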

This code creates two new columns from existing ones: C is the row-by-row sum of A and B, and D is each value of A multiplied by 2.

Identify Repeating Operations

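Identify the loops, recursion, and array traversals that repeat. The snippet has no explicit loop, but each column operation still traverses every row internally.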

  • Primary operation: Adding and multiplying values for each row in the DataFrame.
  • How many times: Once per row, so n times where n is the number of rows.
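To make the "once per row" count concrete, here is a minimal sketch that spells out the addition as an explicit Python loop and compares it with the vectorized version. The loop and the names `loop_c` and `additions` are illustrative only; pandas does this work in compiled code rather than in a Python loop like this.

```python
import pandas as pd

n = 10
df = pd.DataFrame({"A": range(1, n + 1), "B": range(n, 0, -1)})

# Vectorized version, as in the snippet under consideration.
vectorized_c = df["A"] + df["B"]

# Illustrative explicit-loop equivalent: one addition per row.
loop_c = []
additions = 0
for a, b in zip(df["A"], df["B"]):
    loop_c.append(a + b)
    additions += 1

print(additions)                        # 10, one addition per row
print(loop_c == vectorized_c.tolist())  # True, both give the same values
```

Both versions do n additions; the vectorized one is simply much faster per row.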
How Execution Grows With Input

As the number of rows grows, the number of operations grows in direct proportion.

Input Size (n)    Approx. Operations
10                About 20 (2 operations per row)
100               About 200
1000              About 2000

Pattern observation: The work grows directly with the number of rows.
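The operation counts can be reproduced with a few lines of arithmetic. This is a rough sketch: `approx_operations` is a hypothetical helper, and it assumes exactly two operations per row (one addition and one multiplication), as in the snippet.

```python
def approx_operations(n_rows, ops_per_row=2):
    """Estimate the total operations: one addition and one multiplication per row."""
    return n_rows * ops_per_row

for n in (10, 100, 1000):
    print(n, approx_operations(n))
# 10 20
# 100 200
# 1000 2000
```

Multiplying the row count by 10 multiplies the operation count by 10, which is the signature of linear growth.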

Final Time Complexity

Time Complexity: O(n)

This means the time to create new features grows in a straight line as the data size grows: doubling the number of rows roughly doubles the work.
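One way to check this empirically is to time the two column operations at several sizes. The sketch below is only a rough measurement: `time_feature_creation` is a hypothetical helper, the sizes are arbitrary, and actual timings depend on the machine. For small n, fixed overhead tends to dominate, so the linear trend usually shows up only at larger sizes.

```python
import time

import pandas as pd

def time_feature_creation(n):
    """Return the seconds spent creating columns C and D for n rows."""
    df = pd.DataFrame({"A": range(1, n + 1), "B": range(n, 0, -1)})
    start = time.perf_counter()
    df["C"] = df["A"] + df["B"]
    df["D"] = df["A"] * 2
    return time.perf_counter() - start

# Each size is 10 times the previous one.
for n in (10_000, 100_000, 1_000_000):
    print(f"n={n:>9,}: {time_feature_creation(n):.5f} s")
```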

Common Mistake

[X] Wrong: "Creating new features takes the same time no matter how big the data is."

[OK] Correct: Each row needs to be processed, so more rows mean more work and more time.

Interview Connect

Understanding how feature creation scales helps you explain your data preparation steps clearly and shows you know how to handle bigger datasets.

Self-Check

"What if we created new features using pairs of rows instead of single rows? How would the time complexity change?"
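As a starting point for exploring this question, the sketch below counts how many distinct pairs of rows exist for a given n. It assumes "pairs of rows" means every unordered pair of different rows; `count_row_pairs` is a hypothetical helper, not part of pandas.

```python
from itertools import combinations

def count_row_pairs(n_rows):
    """Count the unordered pairs of distinct rows by enumerating them."""
    return sum(1 for _ in combinations(range(n_rows), 2))

for n in (10, 100, 1000):
    print(n, count_row_pairs(n))
# 10 45
# 100 4950
# 1000 499500
```

Compare how these counts grow with the per-row counts in the table above.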