Broadcasting with higher dimensions in NumPy - Time & Space Complexity
We want to understand how the time needed to run NumPy operations changes when broadcasting arrays of different sizes and dimensions.
Specifically, how does adding more dimensions affect the work NumPy does?
Analyze the time complexity of the following code snippet.
```python
import numpy as np

A = np.ones((100, 1, 50))
B = np.ones((1, 200, 50))
C = A + B  # broadcast to a common shape, then add element-wise
```
This code adds two arrays with different shapes by broadcasting them to a common shape before element-wise addition.
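To see the common shape concretely, we can inspect the result: NumPy compares the shapes axis by axis and stretches each size-1 axis to match the other array. A quick sketch:

```python
import numpy as np

# Shapes (100, 1, 50) and (1, 200, 50) are aligned axis by axis;
# every size-1 axis is stretched to the other array's size.
A = np.ones((100, 1, 50))
B = np.ones((1, 200, 50))
C = A + B

print(C.shape)  # (100, 200, 50): the broadcast shape
print(C.size)   # 1000000: total elements the addition must produce
```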
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Element-wise addition over the broadcasted array.
- How many times: Once for each element in the final broadcast shape (100 × 200 × 50 = 1,000,000 elements).
As the sizes of the arrays grow, the number of additions grows with the total number of elements after broadcasting.
| Side Length (n) of an n×n×n Array | Approx. Operations (n³) |
|---|---|
| 10 (10×10×10) | 1,000 |
| 100 (100×100×100) | 1,000,000 |
| 1,000 (1000×1000×1000) | 1,000,000,000 |
Pattern observation: The operations grow roughly with the product of the dimensions, so increasing dimensions or their sizes multiplies the work.
Time Complexity: O(n × m × p) for a broadcast result of shape (n, m, p)
This means the time grows proportionally to the total number of elements after broadcasting, which is the product of all dimension sizes. In the example above, that is 100 × 200 × 50 = 1,000,000 additions.
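We can sanity-check the proportionality with a rough timing sketch. Absolute numbers are machine-dependent, but the runtime should track the element count of the broadcast result:

```python
import time
import numpy as np

# Rough timing sketch: work should scale with the number of elements
# in the broadcast result. Timings are machine-dependent.
for n in (100, 200, 400):
    A = np.ones((n, 1, 50))
    B = np.ones((1, 2 * n, 50))
    start = time.perf_counter()
    C = A + B
    elapsed = time.perf_counter() - start
    print(f"{C.size:>10} elements: {elapsed:.5f} s")
```

Doubling n quadruples the element count (n × 2n × 50), so you should see roughly a 4× jump in time between rows.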
[X] Wrong: "Broadcasting makes the operation as fast as the smaller array size because it just repeats values."
[OK] Correct: Even though broadcasting avoids copying data, numpy still performs the operation on every element in the broadcasted shape, so the time depends on the full size after broadcasting.
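The "no copying" part can be verified directly: `np.broadcast_to` returns a read-only view whose stretched axis has stride 0, meaning the same underlying memory is reused rather than duplicated:

```python
import numpy as np

# np.broadcast_to returns a read-only view; the stretched axis gets
# stride 0, so no data is duplicated in memory.
B = np.ones((1, 200, 50))
Bb = np.broadcast_to(B, (100, 200, 50))

print(Bb.shape)       # (100, 200, 50)
print(Bb.strides[0])  # 0: the first axis reuses the same rows
```

So memory stays proportional to the original arrays, even though the arithmetic still touches every element of the broadcast shape.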
Understanding how broadcasting affects time helps you explain performance when working with multi-dimensional data, a common task in data science and machine learning.
"What if we changed one array to have a dimension of size 1 replaced by a larger size? How would the time complexity change?"
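One way to explore this question without allocating anything is `np.broadcast_shapes`, which reports the result shape from the input shapes alone. As a hypothetical variation, suppose A's size-1 axis were replaced by 200:

```python
import numpy as np

# Hypothetical variation: A's size-1 middle axis becomes 200.
# np.broadcast_shapes computes the result shape without allocating.
orig = np.broadcast_shapes((100, 1, 50), (1, 200, 50))
variant = np.broadcast_shapes((100, 200, 50), (1, 200, 50))
print(orig, variant)  # both (100, 200, 50): same element count
```

The broadcast shape, and therefore the number of additions, is unchanged; what changes is the memory held by A itself, since its middle axis is now materialized rather than stretched.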