0
0
SciPydata~5 mins

Distance computation (distance.cdist) in SciPy - Time & Space Complexity

Choose your learning style9 modes available
Time Complexity: Distance computation (distance.cdist)
O(m x n)
Understanding Time Complexity

When we calculate distances between two groups of points, it takes time depending on how many points there are.

We want to know how the time needed grows as the number of points grows.

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

from scipy.spatial import distance
import numpy as np

pointsA = np.random.rand(100, 3)  # 100 points in 3D
pointsB = np.random.rand(200, 3)  # 200 points in 3D

result = distance.cdist(pointsA, pointsB, 'euclidean')

This code computes the Euclidean distance between every point in pointsA and every point in pointsB.

Identify Repeating Operations

Identify the loops, recursion, array traversals that repeat.

  • Primary operation: Calculating distance between each pair of points from two sets.
  • How many times: For each of the 100 points in pointsA, distances to all 200 points in pointsB are computed, so 100 x 200 = 20,000 times.
How Execution Grows With Input

As the number of points grows, the total distance calculations grow by multiplying the sizes of both sets.

Input Size (pointsA x pointsB)Approx. Operations
10 x 10100
100 x 10010,000
1000 x 10001,000,000

Pattern observation: Doubling the number of points in both sets roughly squares the number of distance calculations.

Final Time Complexity

Time Complexity: O(m × n)

This means the time grows proportionally to the product of the number of points in each set.

Common Mistake

[X] Wrong: "The time depends only on the total number of points combined, like m + n."

[OK] Correct: Each point in one set is compared to every point in the other set, so the total work multiplies, not adds.

Interview Connect

Understanding how distance calculations scale helps you explain performance in tasks like clustering or nearest neighbor search, showing you grasp practical data science challenges.

Self-Check

"What if we only compute distances between points in the same set (like pointsA to pointsA)? How would the time complexity change?"