Why aggregation matters in NumPy - Performance Analysis
When working with data, aggregation summarizes many values into a single one. Understanding how long aggregation takes is important for handling large datasets efficiently.
We want to know how the time to aggregate grows as the size of the data grows.
Analyze the time complexity of the following code snippet.
```python
import numpy as np

arr = np.random.rand(1000000)  # array of one million random floats
result = np.sum(arr)           # aggregate: reduce all elements to one value
print(result)
```
This code creates a large array and sums all its values into one number.
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Adding each number in the array one by one.
- How many times: Once for every element in the array.
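The primary operation described above can be made explicit with a pure-Python loop. This is a sketch for illustration only (variable names are my own); `np.sum` performs the same n additions, just in optimized compiled code:

```python
import numpy as np

arr = np.random.rand(1000000)

# One addition per element: the loop body runs n times.
total = 0.0
for x in arr:
    total += x

# np.sum does the same work in compiled code, so the results agree
# (up to floating-point rounding, since the summation order differs).
assert np.isclose(total, np.sum(arr))
```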
As the array gets bigger, the time to add all the numbers grows linearly.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 additions |
| 100 | 100 additions |
| 1000 | 1000 additions |
Pattern observation: Doubling the input doubles the work needed.
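The doubling pattern can be checked empirically with `timeit`. The sizes below are illustrative, and absolute timings will vary by machine; the point is that each doubling of n roughly doubles the measured time:

```python
import timeit

import numpy as np

# Illustrative sizes: each is double the previous one.
for n in (1000000, 2000000, 4000000):
    arr = np.random.rand(n)
    t = timeit.timeit(lambda: np.sum(arr), number=10)
    print(f"n={n:>8}: {t:.5f} s for 10 runs")
```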
Time Complexity: O(n)
This means the time to sum grows directly with the number of elements.
[X] Wrong: "Summing many numbers is very fast and does not depend on how many numbers there are."
[OK] Correct: Even though addition is simple, you must add each number once, so more numbers mean more work.
Knowing how aggregation time grows helps you predict how your code will behave on large datasets and explain why some operations take longer as the data grows.
"What if we used np.mean() instead of np.sum()? How would the time complexity change?"
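One way to reason about this question: a mean is a sum of all n elements followed by a single division, so `np.mean()` must still visit every element and its time complexity remains O(n). A small sketch (array size is illustrative):

```python
import numpy as np

arr = np.random.rand(1000000)

# mean = sum / n: the same n additions as np.sum, plus one O(1) division,
# so the overall growth rate is unchanged: still O(n).
assert np.isclose(np.mean(arr), np.sum(arr) / arr.size)
```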