agg() for multiple aggregations in Data Analysis Python - Time & Space Complexity

Time Complexity: agg() for multiple aggregations
O(n * k)
Understanding Time Complexity

When using agg() to perform multiple calculations on data, it is important to understand how the time needed grows as the data gets bigger.

We want to know how the work changes when we ask for many summaries at once.

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

import pandas as pd

data = pd.DataFrame({
    'group': ['A', 'B', 'A', 'B', 'A'],
    'value': [10, 20, 30, 40, 50]
})

result = data.groupby('group').agg({
    'value': ['sum', 'mean', 'max']
})

This code groups data by a column and calculates sum, mean, and max for each group.
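To make the shape of the result concrete, the snippet can be run and inspected as below. This is only an illustrative check on the same five-row example; the `print` calls are additions for inspection and are not part of the original scenario.

```python
import pandas as pd

data = pd.DataFrame({
    'group': ['A', 'B', 'A', 'B', 'A'],
    'value': [10, 20, 30, 40, 50]
})

result = data.groupby('group').agg({
    'value': ['sum', 'mean', 'max']
})

# The columns form a two-level index: (column name, aggregation name).
print(result.columns.tolist())
# [('value', 'sum'), ('value', 'mean'), ('value', 'max')]

# Group A holds 10, 30, 50; group B holds 20, 40.
print(result.loc['A', ('value', 'sum')])   # 90
print(result.loc['B', ('value', 'max')])   # 40
```

There is one output row per group and one output column per aggregation function, so the result has 2 rows and 3 columns here.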

Identify Repeating Operations

Identify the loops, recursion, and array traversals that repeat.

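  • Primary operation: Conceptually, every row is assigned to its group, and then each aggregation function scans the values in every group.
  • How many times: Each row belongs to exactly one group, so a single aggregation touches each row about once; with k aggregation functions, each row is touched about k times.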
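The two repeating steps can be sketched in plain Python. This is a toy model for counting work, not how pandas is implemented internally; the function `grouped_agg` and the `visits` counter are hypothetical names introduced only for this illustration.

```python
from collections import defaultdict

def grouped_agg(groups, values, funcs):
    """Toy model of groupby + agg; returns (result, element_visits)."""
    # Step 1: one pass over the rows to bucket values by group key.
    buckets = defaultdict(list)
    for key, value in zip(groups, values):
        buckets[key].append(value)

    # Step 2: each aggregation makes its own pass over every bucket.
    visits = 0
    result = {}
    for key, bucket in buckets.items():
        result[key] = {}
        for name, func in funcs.items():
            result[key][name] = func(bucket)
            visits += len(bucket)  # func reads every value in the bucket
    return result, visits

funcs = {'sum': sum, 'mean': lambda xs: sum(xs) / len(xs), 'max': max}
result, visits = grouped_agg(['A', 'B', 'A', 'B', 'A'], [10, 20, 30, 40, 50], funcs)
print(result['A'])  # {'sum': 90, 'mean': 30.0, 'max': 50}
print(visits)       # 15, i.e. n * k = 5 * 3
```

Group A contributes 3 values times 3 functions and group B contributes 2 values times 3 functions, which adds up to the 15 reads counted above.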
How Execution Grows With Input

As the number of rows grows, the code must look at more data for each group. Also, more aggregation functions mean more work per group.

Input Size (n)    Approx. Operations
10                About 10 times the number of aggregations
100               About 100 times the number of aggregations
1000              About 1000 times the number of aggregations

Pattern observation: The work grows roughly in direct proportion to the number of rows and the number of aggregation functions.
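The pattern can be reproduced with a small counting model. This is a sketch under a simplifying assumption: each aggregation reads every value in every group exactly once. The helper `total_visits` and its `n_groups` parameter are hypothetical and exist only for this illustration.

```python
def total_visits(n, k, n_groups):
    """Values read when k aggregations each scan every group once."""
    # Spread n rows over the groups round-robin, as evenly as possible.
    group_sizes = [len(range(g, n, n_groups)) for g in range(n_groups)]
    return sum(size * k for size in group_sizes)

k = 3  # sum, mean, max
for n in (10, 100, 1000):
    print(n, total_visits(n, k, n_groups=2), total_visits(n, k, n_groups=5))
# 10 30 30
# 100 300 300
# 1000 3000 3000
```

In this model the number of groups does not change the total, because the group sizes always add up to n. Real pandas runs can still show some per-group overhead, especially with custom Python functions, so treat the model as an approximation.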

Final Time Complexity

Time Complexity: O(n * k)

This means the time grows linearly with the number of rows n and the number of aggregation functions k.

Common Mistake

[X] Wrong: "Adding more aggregation functions does not affect the time much because they run together."

[OK] Correct: The rows are grouped only once, but each aggregation function still makes its own pass over the grouped values, so more functions mean more work and more time.
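One way to check this on your own machine is to time the same grouping with one and with three aggregation functions. The sketch below is a rough measurement, not a benchmark: the data size, the number of groups, and the helper `time_agg` are assumptions chosen for illustration, and the timings will vary between machines and pandas versions.

```python
import time
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200_000
data = pd.DataFrame({
    'group': rng.integers(0, 100, size=n),
    'value': rng.random(n),
})

def time_agg(funcs, repeats=5):
    """Best-of-`repeats` wall time for one groupby().agg() call."""
    best = float('inf')
    for _ in range(repeats):
        start = time.perf_counter()
        data.groupby('group').agg({'value': funcs})
        best = min(best, time.perf_counter() - start)
    return best

one = time_agg(['sum'])
three = time_agg(['sum', 'mean', 'max'])
print(f"1 aggregation:  {one:.4f} s")
print(f"3 aggregations: {three:.4f} s")
```

Because the grouping step is shared, three aggregations will usually cost more than one but not necessarily three times as much. The O(n * k) bound describes how the work scales, not the exact ratio.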

Interview Connect

Understanding how multiple aggregations affect performance helps you explain your data processing choices clearly and shows you know how to handle bigger datasets efficiently.

Self-Check

"What if we used a single aggregation function instead of multiple? How would the time complexity change?"