Distance matrix computation in SciPy - Time & Space Complexity
When we compute a distance matrix, we find distances between many pairs of points.
We want to know how the time needed grows as we add more points.
Analyze the time complexity of the following code snippet.
import numpy as np
from scipy.spatial import distance_matrix
n = 100 # example number of points
points = np.random.rand(n, 2) # n points in 2D
D = distance_matrix(points, points)
This code creates n random points and computes the full distance matrix between them.
Identify the loops, recursion, array traversals that repeat.
- Primary operation: Calculating distance between each pair of points.
- How many times: For n points, distances are computed for all n x n pairs.
As the number of points grows, the number of distance calculations grows quickly.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 100 |
| 100 | 10,000 |
| 1000 | 1,000,000 |
Pattern observation: The operations grow roughly by the square of n, so doubling points makes calculations about four times more.
Time Complexity: O(n²)
This means the time needed grows roughly with the square of the number of points.
[X] Wrong: "Computing distances grows linearly with the number of points."
[OK] Correct: Each point's distance is calculated to every other point, so the total pairs grow much faster than just the number of points.
Understanding how distance matrix computation scales helps you explain performance in data tasks involving many points.
"What if we only compute distances between points in two different sets instead of one set to itself? How would the time complexity change?"