Distance metrics (euclidean, cosine, manhattan) in SciPy - Time & Space Complexity
We want to know how the time to calculate distances grows as we compare more points.
How does the number of points affect the work done by distance calculations?
Analyze the time complexity of the following code snippet.
from scipy.spatial import distance
points = [[1, 2], [3, 4], [5, 6]]
# Calculate Euclidean distance between first two points
euclidean_dist = distance.euclidean(points[0], points[1])
# Calculate Cosine distance between first two points
cosine_dist = distance.cosine(points[0], points[1])
# Calculate Manhattan distance between first two points
manhattan_dist = distance.cityblock(points[0], points[1])
This code calculates three types of distances between two points in 2D space.
Identify the loops, recursion, array traversals that repeat.
- Primary operation: Summing differences across each dimension of the points.
- How many times: Once per dimension for each distance calculation.
As the number of dimensions grows, the work to compute distance grows linearly.
| Input Size (dimensions) | Approx. Operations |
|---|---|
| 10 | About 10 sums and calculations |
| 100 | About 100 sums and calculations |
| 1000 | About 1000 sums and calculations |
Pattern observation: Doubling dimensions roughly doubles the work needed.
Time Complexity: O(d)
This means the time to compute distance grows directly with the number of dimensions.
[X] Wrong: "Calculating distance between two points takes the same time no matter how many dimensions they have."
[OK] Correct: Each dimension adds more numbers to process, so more dimensions mean more work.
Understanding how distance calculations scale helps you explain efficiency when working with high-dimensional data.
"What if we calculate distances between many pairs of points instead of just one pair? How would the time complexity change?"