Broadcasting for distance matrices in NumPy - Time & Space Complexity
We want to understand how the time needed to compute distance matrices grows as the number of points increases.
Specifically, how does using broadcasting in numpy affect the work done?
Analyze the time complexity of the following code snippet.
import numpy as np
points = np.random.rand(n, 2) # n points in 2D
diffs = points[:, np.newaxis, :] - points[np.newaxis, :, :]
dist_matrix = np.sqrt(np.sum(diffs ** 2, axis=-1))
This code calculates the distance matrix between n points using numpy broadcasting.
Identify the loops, recursion, array traversals that repeat.
- Primary operation: Computing differences for every pair of points (n x n pairs).
- How many times: The subtraction and sum happen for each pair, so about n² times.
As the number of points n grows, the number of pairs grows much faster.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 100 (10x10) |
| 100 | 10,000 (100x100) |
| 1000 | 1,000,000 (1000x1000) |
Pattern observation: The work grows by the square of n, so doubling n makes the work about four times bigger.
Time Complexity: O(n²)
This means the time needed grows roughly with the square of the number of points.
[X] Wrong: "Broadcasting makes this calculation run in linear time because it avoids loops."
[OK] Correct: Broadcasting helps write code without explicit loops, but the actual number of operations still depends on all pairs, so it grows with n squared.
Understanding how broadcasting affects time helps you explain performance clearly and shows you know how numpy handles big data.
"What if we only computed distances for pairs where the first point index is less than the second? How would the time complexity change?"