Aggregation along specific axes in NumPy - Time & Space Complexity
When we use NumPy to sum or average values along the rows or columns of an array, the running time depends on the size of the data.
We want to know how that time grows as the array gets bigger.
Analyze the time complexity of the following code snippet.
import numpy as np
arr = np.random.rand(1000, 500)
sum_rows = np.sum(arr, axis=1)
sum_cols = np.sum(arr, axis=0)
mean_rows = np.mean(arr, axis=1)
mean_cols = np.mean(arr, axis=0)
This code creates a 2D array and calculates sums and averages along rows and columns.
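To make the axis convention concrete, here is a small sketch that checks the shape of each result. The seeded generator `rng` is an assumption added for reproducibility and is not part of the original snippet.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator (assumption, for repeatable runs)
arr = rng.random((1000, 500))    # 1000 rows, 500 columns

sum_rows = np.sum(arr, axis=1)   # collapses the 500 columns: one value per row
sum_cols = np.sum(arr, axis=0)   # collapses the 1000 rows: one value per column
mean_rows = np.mean(arr, axis=1)

print(sum_rows.shape)   # (1000,)
print(sum_cols.shape)   # (500,)
```

Note that `axis=1` gives one result per row and `axis=0` gives one result per column; the mean is the same traversal followed by a division by the length of the collapsed axis.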
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Adding or averaging all elements along one axis (rows or columns).
- How many times: Each aggregation visits every element of the array once, so a single call touches rows x cols elements.
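The traversal described above can be sketched with explicit Python loops. This is only an illustration of the counting argument, not how NumPy is implemented internally (NumPy runs the equivalent loops in compiled code); the helper name `sum_rows_explicit` is hypothetical.

```python
import numpy as np

def sum_rows_explicit(arr):
    """Pure-Python stand-in for np.sum(arr, axis=1) that counts element visits."""
    n_rows, n_cols = arr.shape
    out = np.zeros(n_rows)
    visits = 0
    for i in range(n_rows):        # one pass per row
        for j in range(n_cols):    # every element in that row
            out[i] += arr[i, j]
            visits += 1
    return out, visits

small = np.arange(12, dtype=float).reshape(3, 4)
row_sums, visits = sum_rows_explicit(small)

print(row_sums.tolist())  # [6.0, 22.0, 38.0]
print(visits)             # 12, i.e. 3 rows x 4 columns
```

The visit counter ends at rows x cols regardless of which axis is reduced, which is the quantity the complexity analysis tracks.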
When the array size grows, the number of elements to add or average grows too.
| Input Size (rows x cols) | Approx. Operations |
|---|---|
| 10 x 10 | 100 additions |
| 100 x 100 | 10,000 additions |
| 1000 x 1000 | 1,000,000 additions |
Pattern observation: The operations grow roughly with the total number of elements in the array.
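Time Complexity: O(n * m), where n is the number of rows and m is the number of columns.
This means the time grows in proportion to the total number of elements in the array. The snippet runs four such aggregations, which changes the constant factor but not the growth rate.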
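One way to check this growth empirically is to time the same aggregation on arrays of increasing size. The sketch below is a rough measurement under stated assumptions: the helper `time_axis_sum` and the chosen sizes are hypothetical, and actual timings vary with hardware, memory caches, and NumPy version, so small arrays in particular may not scale cleanly.

```python
import time
import numpy as np

def time_axis_sum(n_rows, n_cols, repeats=5):
    """Best-of-`repeats` wall-clock time for np.sum(arr, axis=1)."""
    arr = np.random.rand(n_rows, n_cols)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        np.sum(arr, axis=1)
        best = min(best, time.perf_counter() - start)
    return best

timings = {}
for n in (250, 500, 1000, 2000):
    timings[n] = time_axis_sum(n, n)
    print(f"{n} x {n}: {n * n:>9,} elements, {timings[n] * 1e3:.3f} ms")
```

Each step doubles both dimensions, so the element count grows by a factor of four; if the O(n * m) estimate holds, the measured time for the larger arrays should grow by roughly the same factor.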
[X] Wrong: "Summing along one axis only takes time proportional to the number of rows or columns, not the whole array."
[OK] Correct: Summing along one axis produces one result per row (or column), but each result requires reading every element in that row (or column), so NumPy still processes all elements in the array.
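A tiny example may help separate the size of the output from the amount of work done. The array values here are arbitrary and chosen only for illustration.

```python
import numpy as np

arr = np.arange(6, dtype=float).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
row_sums = np.sum(arr, axis=1)

print(row_sums.size)       # 2, one result per row
print(arr.size)            # 6, elements that had to be read
print(row_sums.tolist())   # [3.0, 12.0]

# Changing any single element changes some row sum, so no element can be skipped.
changed = arr.copy()
changed[1, 2] += 10.0
print(np.sum(changed, axis=1).tolist())  # [3.0, 22.0]
```

The output has only as many entries as there are rows, yet every one of the rows x cols inputs influences it, which is why the cost follows the input size rather than the output size.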
Understanding how NumPy processes data helps you explain performance in real tasks like data analysis or machine learning.
"What if we used numpy's sum on a 3D array along one axis? How would the time complexity change?"