SciPydata~5 mins

Distance metrics (euclidean, cosine, manhattan) in SciPy - Time & Space Complexity

Choose your learning style9 modes available

Learn Why Deep Visual Try Challenge Project Recall Time

Time Complexity: Distance metrics (euclidean, cosine, manhattan)

O(d)

Understanding Time Complexity

We want to know how the time to calculate distances grows as we compare more points.

How does the number of points affect the work done by distance calculations?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.


from scipy.spatial import distance

points = [[1, 2], [3, 4], [5, 6]]

# Calculate Euclidean distance between first two points
euclidean_dist = distance.euclidean(points[0], points[1])

# Calculate Cosine distance between first two points
cosine_dist = distance.cosine(points[0], points[1])

# Calculate Manhattan distance between first two points
manhattan_dist = distance.cityblock(points[0], points[1])

This code calculates three types of distances between two points in 2D space.

Identify Repeating Operations

Identify the loops, recursion, array traversals that repeat.

Primary operation: Summing differences across each dimension of the points.
How many times: Once per dimension for each distance calculation.

How Execution Grows With Input

As the number of dimensions grows, the work to compute distance grows linearly.

Input Size (dimensions)	Approx. Operations
10	About 10 sums and calculations
100	About 100 sums and calculations
1000	About 1000 sums and calculations

Pattern observation: Doubling dimensions roughly doubles the work needed.

Final Time Complexity

Time Complexity: O(d)

This means the time to compute distance grows directly with the number of dimensions.

Common Mistake

[X] Wrong: "Calculating distance between two points takes the same time no matter how many dimensions they have."

[OK] Correct: Each dimension adds more numbers to process, so more dimensions mean more work.

Interview Connect

Understanding how distance calculations scale helps you explain efficiency when working with high-dimensional data.

Self-Check

"What if we calculate distances between many pairs of points instead of just one pair? How would the time complexity change?"