Distance computation (distance.cdist) in SciPy - Time & Space Complexity
When we calculate distances between two groups of points, it takes time depending on how many points there are.
We want to know how the time needed grows as the number of points grows.
Analyze the time complexity of the following code snippet.
from scipy.spatial import distance
import numpy as np
pointsA = np.random.rand(100, 3) # 100 points in 3D
pointsB = np.random.rand(200, 3) # 200 points in 3D
result = distance.cdist(pointsA, pointsB, 'euclidean')
This code computes the Euclidean distance between every point in pointsA and every point in pointsB.
Identify the loops, recursion, array traversals that repeat.
- Primary operation: Calculating distance between each pair of points from two sets.
- How many times: For each of the 100 points in pointsA, distances to all 200 points in pointsB are computed, so 100 x 200 = 20,000 times.
As the number of points grows, the total distance calculations grow by multiplying the sizes of both sets.
| Input Size (pointsA x pointsB) | Approx. Operations |
|---|---|
| 10 x 10 | 100 |
| 100 x 100 | 10,000 |
| 1000 x 1000 | 1,000,000 |
Pattern observation: Doubling the number of points in both sets roughly squares the number of distance calculations.
Time Complexity: O(m × n)
This means the time grows proportionally to the product of the number of points in each set.
[X] Wrong: "The time depends only on the total number of points combined, like m + n."
[OK] Correct: Each point in one set is compared to every point in the other set, so the total work multiplies, not adds.
Understanding how distance calculations scale helps you explain performance in tasks like clustering or nearest neighbor search, showing you grasp practical data science challenges.
"What if we only compute distances between points in the same set (like pointsA to pointsA)? How would the time complexity change?"