Pivot with aggregation functions in Pandas - Time & Space Complexity
We want to understand how the time needed to create a pivot table with aggregation changes as the data grows.
How does the work increase when we have more rows or categories?
Analyze the time complexity of the following code snippet.
import pandas as pd
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'C'],
    'Value': [10, 20, 30, 40, 50]
})
pivot = df.pivot_table(index='Category', values='Value', aggfunc='sum')
This code creates a pivot table that sums values for each category.
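To make the result concrete, the sketch below rebuilds the same DataFrame, prints the pivot, and compares it with a plain `groupby` sum. The `grouped` variable is introduced here only for illustration; it is not part of the original snippet.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'C'],
    'Value': [10, 20, 30, 40, 50],
})

# One output row per distinct category, holding the sum of its values.
pivot = df.pivot_table(index='Category', values='Value', aggfunc='sum')

# For this simple case, a group-by sum produces the same numbers.
grouped = df.groupby('Category')['Value'].sum()

print(pivot)
print(grouped.tolist())  # [40, 60, 50]
```

Category A sums 10 + 30, B sums 20 + 40, and C has the single value 50.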
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Grouping rows by category and summing values.
- How many times: Each row is visited once to assign it to a group, then each group is aggregated.
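The two steps above can be pictured as a single pass over the rows with a dictionary of running totals. This is a conceptual model in plain Python, not the way pandas implements grouping internally; the names `sums` and `row_visits` are illustrative only.

```python
from collections import defaultdict

categories = ['A', 'B', 'A', 'B', 'C']
values = [10, 20, 30, 40, 50]

sums = defaultdict(int)  # running total per category
row_visits = 0           # counts how many rows we touch

# Each row is visited exactly once: look up its group, add its value.
for category, value in zip(categories, values):
    sums[category] += value
    row_visits += 1

# Order the group labels for the final result.
result = dict(sorted(sums.items()))

print(result)      # {'A': 40, 'B': 60, 'C': 50}
print(row_visits)  # 5
```

The loop body does a constant amount of work on average, so the total work is proportional to the number of rows.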
As the number of rows grows, the time to group and sum grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 row visits and additions |
| 100 | About 100 row visits and additions |
| 1000 | About 1000 row visits and additions |
Pattern observation: The work grows roughly in a straight line as data size increases.
Time Complexity: O(n)
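This means the time to create the pivot grows linearly with the number of rows n. The number of distinct categories matters much less: the result has one row per category, and ordering those g group labels costs at most O(g log g), which is negligible when g is much smaller than n.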
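One way to check the linear trend empirically is to time the pivot on inputs of increasing size. The sketch below is a rough benchmark under assumptions: the helper `time_pivot`, the random data, and the choice of 100 categories are all hypothetical, and the measured times will vary by machine and pandas version.

```python
import time

import numpy as np
import pandas as pd


def time_pivot(n_rows, n_categories=100, seed=0):
    """Build a random DataFrame and time one pivot_table call on it."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        'Category': rng.integers(0, n_categories, size=n_rows),
        'Value': rng.integers(0, 1000, size=n_rows),
    })
    start = time.perf_counter()
    pivot = df.pivot_table(index='Category', values='Value', aggfunc='sum')
    elapsed = time.perf_counter() - start
    return elapsed, pivot


timings = {}
for n in (10_000, 100_000, 1_000_000):
    elapsed, _ = time_pivot(n)
    timings[n] = elapsed
    print(f"{n:>9,} rows: {elapsed:.4f} s")
```

If the cost is linear, each tenfold increase in rows should raise the time by roughly a factor of ten once n is large enough for fixed overhead to stop dominating.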
[X] Wrong: "Pivot tables take the same time no matter how many rows there are."
[OK] Correct: The pivot must look at each row to group and sum, so more rows mean more work.
Understanding how grouping and aggregation scale helps you explain data processing steps clearly and confidently.
"What if we added multiple aggregation functions instead of just one? How would the time complexity change?"
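As a starting point for that question, here is a small sketch that passes a list of functions to `aggfunc`. The variable name `multi` is illustrative. A reasonable expectation, though not something this example measures, is that k aggregations cost about k passes over the data, so the time is O(k · n), which is still linear in n for a fixed k.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'C'],
    'Value': [10, 20, 30, 40, 50],
})

# A list of functions yields one output column per (function, value) pair.
multi = df.pivot_table(index='Category', values='Value',
                       aggfunc=['sum', 'mean'])

print(multi)
print(multi[('sum', 'Value')].tolist())   # [40, 60, 50]
print(multi[('mean', 'Value')].tolist())  # [20.0, 30.0, 50.0]
```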