
Multiple aggregation functions in Pandas - Time & Space Complexity

Time Complexity: Multiple aggregation functions
O(n)
Understanding Time Complexity

We want to understand how the time needed to run multiple aggregation functions grows as the dataset gets bigger.

In other words: how does adding more rows affect the work pandas does when summarizing them?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'B', 'A', 'B', 'A', 'B'],
    'value': [10, 20, 30, 40, 50, 60]
})

result = df.groupby('group').agg({'value': ['sum', 'mean', 'max']})

This code groups data by a column and calculates sum, mean, and max for each group.
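For concreteness, here is the same snippet with the computed values spelled out. The result's columns form a MultiIndex of ('value', function name):

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'B', 'A', 'B', 'A', 'B'],
    'value': [10, 20, 30, 40, 50, 60]
})

result = df.groupby('group').agg({'value': ['sum', 'mean', 'max']})

# Group A holds 10, 30, 50 -> sum 90, mean 30.0, max 50
# Group B holds 20, 40, 60 -> sum 120, mean 40.0, max 60
print(result)
```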

Identify Repeating Operations
  • Primary operation: pandas loops internally over each group to compute each aggregation.
  • How many times: For each group, it processes all rows once per aggregation function.
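The counting above can be sketched in plain Python. This is a conceptual model with hypothetical helper names, not pandas' actual internals (which run in optimized Cython), but it makes the operation count explicit:

```python
def grouped_agg(groups, values, funcs):
    """Conceptual sketch of a grouped aggregation, counting per-value operations."""
    ops = 0
    buckets = {}
    # One pass to sort rows into groups: O(n)
    for g, v in zip(groups, values):
        buckets.setdefault(g, []).append(v)
    results = {}
    for g, vals in buckets.items():
        results[g] = {}
        # Each aggregation function scans the group's values once
        for name, fn in funcs.items():
            results[g][name] = fn(vals)
            ops += len(vals)
    return results, ops

funcs = {'sum': sum, 'mean': lambda v: sum(v) / len(v), 'max': max}
groups = ['A', 'B', 'A', 'B', 'A', 'B']
values = [10, 20, 30, 40, 50, 60]
results, ops = grouped_agg(groups, values, funcs)
# 6 rows x 3 functions = 18 per-value operations in the aggregation phase
print(ops)
```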
How Execution Grows With Input

As the number of rows grows, pandas must scan more data for each aggregation function.

Input Size (n)    Approx. Operations
10                About 30 (10 rows x 3 functions)
100               About 300 (100 rows x 3 functions)
1000              About 3000 (1000 rows x 3 functions)

Pattern observation: The work grows roughly in direct proportion to the number of rows and the number of aggregation functions.
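You can observe this proportional growth with a rough timing experiment. Absolute times vary by machine and pandas version; only the roughly 10x step between sizes matters:

```python
import time
import numpy as np
import pandas as pd

# Time the same three-function aggregation at increasing row counts.
for n in (1_000, 10_000, 100_000):
    df = pd.DataFrame({
        'group': np.random.choice(['A', 'B'], size=n),
        'value': np.random.rand(n),
    })
    start = time.perf_counter()
    df.groupby('group').agg({'value': ['sum', 'mean', 'max']})
    elapsed = time.perf_counter() - start
    print(f"n={n:>7,}: {elapsed:.5f}s")
```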

Final Time Complexity

Time Complexity: O(n)

Strictly speaking, with k aggregation functions the work is O(n x k). Since k is a small fixed constant here (3 functions), the time grows linearly with the number of rows n.

Common Mistake

[X] Wrong: "Adding more aggregation functions multiplies the time by the square of the data size."

[OK] Correct: Each aggregation looks at the data once, so time grows with data size times number of functions, not squared.

Interview Connect

Knowing how aggregation time grows helps you explain performance when working with grouped data, a common task in data analysis.

Self-Check

"What if we added more grouping columns? How would the time complexity change?"
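As a hint toward the answer (a sketch, not a full analysis): grouping by two columns still requires only one hashing pass over the rows, so the per-row cost stays linear, although the number of output groups can grow:

```python
import pandas as pd

df = pd.DataFrame({
    'region': ['N', 'N', 'S', 'S'],
    'group':  ['A', 'B', 'A', 'B'],
    'value':  [1, 2, 3, 4],
})

# Each row is hashed into its (region, group) bucket exactly once: still O(n)
out = df.groupby(['region', 'group'])['value'].sum()
print(out)
```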