Split-apply-combine mental model in Pandas - Time & Space Complexity

Understanding Time Complexity

When using pandas to group data and then apply calculations, it is important to understand how the time needed grows as the data gets bigger.

We want to know how the time changes when we split data into groups, do work on each group, and then combine the results.

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

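import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'C', 'A'],
    'Value': [10, 20, 30, 40, 50, 60]
})

# Split: group the rows by their 'Category' label.
grouped = df.groupby('Category')
# Apply and combine: sum 'Value' per group into one Series indexed by category.
result = grouped['Value'].sum()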

This code splits the data by 'Category', sums the 'Value' in each group, and combines the sums into a result.
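The three phases can be spelled out by hand to make the mental model concrete. The following is a minimal sketch in plain Python, not a description of how pandas implements groupby internally; the names groups, sums, and manual_result are illustrative only.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'C', 'A'],
    'Value': [10, 20, 30, 40, 50, 60]
})

# Split: one pass over the rows, appending each value to its group's list.
groups = defaultdict(list)
for category, value in zip(df['Category'], df['Value']):
    groups[category].append(value)

# Apply: reduce each group's list to a single number.
sums = {category: sum(values) for category, values in groups.items()}

# Combine: assemble the per-group results into one Series, sorted by key
# to mirror the default key ordering of groupby output.
manual_result = pd.Series(sums, name='Value').sort_index()
manual_result.index.name = 'Category'

# The hand-written version should agree with the pandas one-liner.
pandas_result = df.groupby('Category')['Value'].sum()
print(manual_result.to_dict() == pandas_result.to_dict())
```

Writing the phases separately shows that the split step touches every row once and the apply step touches every value once, which is the basis for the analysis that follows.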

Identify Repeating Operations

Identify the loops, recursion, array traversals that repeat.

  • Primary operation: Going through each row to assign it to a group.
  • How many times: Once for each row in the data (n times).
  • Secondary operation: Summing the values inside each group. Each group's cost depends on its size, but the group sizes add up to n, so every value is added exactly once.

How Execution Grows With Input

As the number of rows grows, the time to split and sum grows roughly in direct proportion.

Input Size (n)    Approx. Operations
10                About 10 steps to assign groups + sum within groups
100               About 100 steps to assign groups + sum within groups
1000              About 1000 steps to assign groups + sum within groups

Pattern observation: The total work grows roughly in a straight line as the data size increases.
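The straight-line pattern can be checked by counting steps directly. This is a rough sketch that models group-and-sum in plain Python and counts one step per row visit. The function grouped_sum_with_step_count is a hypothetical helper written for this illustration, and the counts describe this model, not pandas' optimized internals.

```python
def grouped_sum_with_step_count(categories, values):
    """Group-and-sum in plain Python, counting one step per row visit."""
    steps = 0
    groups = {}
    # Split: visit every row once to place it in a group.
    for category, value in zip(categories, values):
        groups.setdefault(category, []).append(value)
        steps += 1
    # Apply: visit every value once more while summing inside its group.
    sums = {}
    for category, members in groups.items():
        total = 0
        for value in members:
            total += value
            steps += 1
        sums[category] = total
    return sums, steps


for n in (10, 100, 1000):
    categories = ['ABC'[i % 3] for i in range(n)]  # three groups, cycled
    values = list(range(n))
    _, steps = grouped_sum_with_step_count(categories, values)
    print(n, steps)  # steps is 2 * n: n for the split, n for the sums
```

In this model the step count is exactly twice the number of rows, so a tenfold increase in rows gives a tenfold increase in steps, regardless of how the rows are distributed across groups.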

Final Time Complexity

Time Complexity: O(n)

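This means the time needed grows roughly in direct proportion to the number of rows in the data. Strictly speaking, groupby also sorts the group keys by default (sort=True), which adds work that depends on the number of distinct groups rather than on the number of rows. When there are far fewer groups than rows, the O(n) pass over the rows dominates.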

Common Mistake

[X] Wrong: "Grouping data is instant and does not depend on data size."

[OK] Correct: Grouping requires looking at every row to decide its group, so it takes more time as data grows.

Interview Connect

Understanding how grouping and applying functions scale helps you explain your code choices clearly and shows you know how data size affects performance.

Self-Check

"What if we applied a more complex function instead of sum, like sorting each group? How would the time complexity change?"
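One way to explore this question is to sort each group separately and reason about the cost. Sorting a group of m values takes O(m log m) with a comparison sort, and the group sizes add up to n, so the total is at most O(n log n), with the worst case being a single group that holds every row. The sketch below is only an illustration; the name sorted_within_groups and the reordered 'Value' column are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'C', 'A'],
    'Value': [60, 20, 10, 40, 50, 30]  # deliberately unsorted within groups
})

# Apply a per-group sort instead of a per-group sum. Each group is sorted
# independently, so its cost depends only on that group's size.
sorted_within_groups = {
    category: sorted(group['Value'])
    for category, group in df.groupby('Category')
}
print(sorted_within_groups)
```

With many small groups the per-group sorts stay cheap and the total cost remains close to linear; with a few large groups it approaches the cost of sorting the whole column.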