Distance computation (distance.cdist) in SciPy - Step-by-Step Execution

Concept Flow - Distance computation (distance.cdist)
Input: Two sets of points A and B
Select distance metric (e.g., Euclidean)
Compute pairwise distances between each point in A and each point in B
Output: Distance matrix with shape (len(A), len(B))
We start with two groups of points and pick a distance metric. We then calculate the distance for every pair made of one point from A and one point from B, which produces a matrix of distances.
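The flow above can be sketched in plain Python as two nested loops. This is a minimal illustration of what is computed, not SciPy's implementation. The helper name `pairwise_loop` is hypothetical, and the metric is passed as a function to mirror the "select distance metric" step. The sketch uses `math.dist` (Python 3.8+) for Euclidean distance.

```python
import math

def pairwise_loop(A, B, metric=math.dist):
    """Hypothetical helper: return a len(A) x len(B) list of lists
    where entry [i][j] is metric(A[i], B[j])."""
    return [[metric(a, b) for b in B] for a in A]

A = [[0, 0], [1, 1]]
B = [[1, 0], [2, 2]]

D = pairwise_loop(A, B)  # Euclidean distance via math.dist
print(len(D), len(D[0]))  # 2 2: one row per point in A, one column per point in B
for row in D:
    print([round(x, 4) for x in row])
```

Swapping in a different function for `metric` changes the values but not the (len(A), len(B)) shape.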
Execution Sample
import numpy as np
from scipy.spatial import distance

# Two sets of 2-D points, one point per row
A = np.array([[0, 0], [1, 1]])
B = np.array([[1, 0], [2, 2]])

# D[i, j] is the Euclidean distance from A[i] to B[j]
D = distance.cdist(A, B, 'euclidean')
print(D)
This code calculates Euclidean distances between points in A and B, outputting a matrix of distances.
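Each entry D[i, j] is the distance from row i of A to row j of B. One way to check this is to compare every entry against `distance.euclidean`, which computes the distance for a single pair of 1-D vectors:

```python
import numpy as np
from scipy.spatial import distance

A = np.array([[0, 0], [1, 1]])
B = np.array([[1, 0], [2, 2]])
D = distance.cdist(A, B, 'euclidean')

# Rows follow A, columns follow B.
assert D.shape == (len(A), len(B))

# Every entry matches the single-pair Euclidean distance.
for i in range(len(A)):
    for j in range(len(B)):
        assert np.isclose(D[i, j], distance.euclidean(A[i], B[j]))

print(D.round(4))
```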
Execution Table
Step | Action | Points from A | Points from B | Distance Computed | Distance Matrix State
1 | Start with points A and B | [[0,0],[1,1]] | [[1,0],[2,2]] | - | Empty 2x2 matrix
2 | Compute distance between A[0] and B[0] | [0,0] | [1,0] | sqrt((1-0)^2 + (0-0)^2) = 1.0 | [[1.0, ?], [?, ?]]
3 | Compute distance between A[0] and B[1] | [0,0] | [2,2] | sqrt((2-0)^2 + (2-0)^2) = 2.8284 | [[1.0, 2.8284], [?, ?]]
4 | Compute distance between A[1] and B[0] | [1,1] | [1,0] | sqrt((1-1)^2 + (0-1)^2) = 1.0 | [[1.0, 2.8284], [1.0, ?]]
5 | Compute distance between A[1] and B[1] | [1,1] | [2,2] | sqrt((2-1)^2 + (2-1)^2) = 1.4142 | [[1.0, 2.8284], [1.0, 1.4142]]
6 | All pairs computed | - | - | - | Final distance matrix complete
💡 All pairs of points from A and B have been processed, and the distance matrix is fully computed.
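The four per-pair calculations in the Execution Table can also be reproduced at once with NumPy broadcasting. This is an alternative formulation shown for illustration, assuming inputs of shape (m, d) and (n, d); it is not a description of how cdist is implemented.

```python
import numpy as np
from scipy.spatial import distance

A = np.array([[0, 0], [1, 1]], dtype=float)
B = np.array([[1, 0], [2, 2]], dtype=float)

# diff[i, j] = B[j] - A[i]; shape (len(A), len(B), 2)
diff = B[np.newaxis, :, :] - A[:, np.newaxis, :]

# Square, sum over the coordinate axis, take the root: one value per (i, j) pair.
D_manual = np.sqrt((diff ** 2).sum(axis=-1))

D_scipy = distance.cdist(A, B, 'euclidean')
print(np.allclose(D_manual, D_scipy))  # True
```

The broadcast version builds an intermediate array of size len(A) x len(B) x d, so for large inputs cdist is usually the more memory-friendly choice.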
Variable Tracker
Variable | Start | After Step 2 | After Step 3 | After Step 4 | After Step 5 | Final
D | Empty 2x2 matrix | [[1.0, ?], [?, ?]] | [[1.0, 2.8284], [?, ?]] | [[1.0, 2.8284], [1.0, ?]] | [[1.0, 2.8284], [1.0, 1.4142]] | [[1.0, 2.8284], [1.0, 1.4142]]
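To mirror the Variable Tracker, the sketch below starts from a 2x2 matrix of NaN values (standing in for the ? placeholders) and fills one entry per step, in the same order as steps 2-5. The step-by-step fill is purely illustrative; cdist itself returns the finished matrix in a single call.

```python
import numpy as np
from scipy.spatial import distance

A = np.array([[0, 0], [1, 1]])
B = np.array([[1, 0], [2, 2]])

# NaN marks entries that have not been computed yet (the "?" in the tracker).
D = np.full((len(A), len(B)), np.nan)

step = 2  # steps 2-5 each fill one entry
for i in range(len(A)):
    for j in range(len(B)):
        D[i, j] = distance.euclidean(A[i], B[j])
        print(f"After step {step}:", D.round(4).tolist())
        step += 1

# After the last step the matrix matches cdist.
print(np.allclose(D, distance.cdist(A, B)))
```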
Key Moments - 3 Insights
Why does the output matrix have shape (2, 2) instead of (4, 4)?
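Because the rows of the distance matrix correspond to the points in A (2 points) and the columns to the points in B (2 points), the shape is (len(A), len(B)) = (2, 2), as steps 2-5 of the Execution Table show. A (4, 4) matrix would arise only if all four points were pooled into a single set and compared with each other.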
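Using point sets of different sizes makes the (len(A), len(B)) rule easier to see, because the two dimensions no longer coincide. The three-point A below is an illustrative choice, not part of the original example.

```python
import numpy as np
from scipy.spatial import distance

A = np.array([[0, 0], [1, 1], [3, 4]])  # 3 points
B = np.array([[1, 0], [2, 2]])          # 2 points

D = distance.cdist(A, B)  # metric defaults to 'euclidean'
print(D.shape)            # (3, 2): one row per point in A, one column per point in B

# For a symmetric metric such as Euclidean, swapping the arguments transposes the result.
print(np.allclose(distance.cdist(B, A), D.T))  # True
```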
How is the Euclidean distance calculated between two points?
It uses the formula sqrt((x2 - x1)^2 + (y2 - y1)^2), as shown in the 'Distance Computed' column for steps 2-5 of the Execution Table.
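The two-coordinate formula extends to any number of dimensions: square each coordinate difference, sum them, and take the square root. The 3-D points below are illustrative. Note that cdist expects 2-D inputs (one point per row), so each single point is wrapped as a one-row matrix.

```python
import math
import numpy as np
from scipy.spatial import distance

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 6.0, 3.0])

# sqrt((v1-u1)^2 + (v2-u2)^2 + ... + (vd-ud)^2)
manual = math.sqrt(sum((vi - ui) ** 2 for ui, vi in zip(u, v)))

# Wrap each point as a (1, 3) matrix; the result is a (1, 1) matrix.
via_cdist = distance.cdist(u[np.newaxis, :], v[np.newaxis, :])[0, 0]

print(manual, via_cdist)  # 5.0 5.0
```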
What happens if we change the metric from 'euclidean' to 'cityblock'?
The distances would be computed as the sum of absolute coordinate differences (Manhattan distance) rather than the straight-line Euclidean distance. The values in the matrix change, but its shape and the pairwise process stay the same.
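Running the same points through both metrics shows the difference directly:

```python
import numpy as np
from scipy.spatial import distance

A = np.array([[0, 0], [1, 1]])
B = np.array([[1, 0], [2, 2]])

D_euc = distance.cdist(A, B, 'euclidean')
D_city = distance.cdist(A, B, 'cityblock')  # sum of absolute differences

print(D_city)
# Same (2, 2) shape, different values: A[0] to B[1] is |2-0| + |2-0| = 4
# instead of sqrt(8), roughly 2.8284.
```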
Visual Quiz - 3 Questions
Test your understanding
Looking at the Execution Table, what is the distance between A[1] and B[1] at step 5?
A) 1.0
B) 1.4142
C) 2.8284
D) 0.0
💡 Hint
Check the 'Distance Computed' column at step 5 in the Execution Table.
At which step does the distance matrix become fully filled?
A) Step 4
B) Step 6
C) Step 5
D) Step 3
💡 Hint
Look at the 'Distance Matrix State' column; the matrix becomes fully filled (no ? left) at step 5.
If we add one more point to B, how would the distance matrix shape change?
A) It would have one more column
B) It would have one more row
C) It would stay the same size
D) It would become a 3D matrix
💡 Hint
The distance matrix shape is (len(A), len(B)), as explained in Key Moments and Concept Flow.
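The third question can be checked by appending a point to B. The extra point [0, 3] is a hypothetical choice for illustration.

```python
import numpy as np
from scipy.spatial import distance

A = np.array([[0, 0], [1, 1]])
B = np.array([[1, 0], [2, 2]])
B_extra = np.vstack([B, [[0, 3]]])  # hypothetical third point appended to B

D = distance.cdist(A, B)
D_extra = distance.cdist(A, B_extra)

print(D.shape, D_extra.shape)  # (2, 2) (2, 3)
# The original columns are unchanged; the new point only adds a column.
print(np.allclose(D_extra[:, :2], D))  # True
```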
Concept Snapshot
distance.cdist(A, B, metric)
- Computes pairwise distances between points in A and B
- Returns matrix shape (len(A), len(B))
- metric='euclidean' (the default) gives straight-line distance
- Supports many other metrics, such as 'cityblock' and 'cosine'
- Useful for comparing every point in one set with every point in another in a single call, without writing Python loops
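The snapshot's points about the default metric and alternative metrics can be checked directly. The points P and Q are new and deliberately avoid the zero vector, because cosine distance is undefined for a point like [0, 0]. cdist also accepts a callable that takes two 1-D arrays and returns a scalar; the lambda below, which computes the Chebyshev (maximum coordinate difference) distance, is a hypothetical example of that option.

```python
import numpy as np
from scipy.spatial import distance

P = np.array([[1.0, 0.0], [1.0, 1.0]])
Q = np.array([[0.0, 2.0], [3.0, 3.0]])

# Omitting the metric argument uses 'euclidean'.
assert np.allclose(distance.cdist(P, Q), distance.cdist(P, Q, 'euclidean'))

for metric in ['euclidean', 'cityblock', 'cosine']:
    print(metric, distance.cdist(P, Q, metric).round(4).tolist())

# A callable metric: receives one row of P and one row of Q, returns a float.
D_max = distance.cdist(P, Q, lambda u, v: np.max(np.abs(u - v)))
print(D_max.tolist())
```

A custom callable is convenient for experiments, but a built-in string metric (here 'chebyshev') is the simpler choice when one exists.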
Full Transcript
This visual execution shows how scipy.spatial.distance.cdist computes distances between two sets of points. We start with two arrays, A and B, each holding points in 2-D space. The function calculates the distance between every point in A and every point in B using the chosen metric, here Euclidean distance. The output is a matrix in which each row corresponds to a point in A, each column to a point in B, and each entry to the distance between that pair. Step by step, we see each pair's distance computed and placed in the matrix until all pairs are done. The walkthrough is a conceptual model: cdist returns the finished matrix in a single call, and the steps show what each entry of that output means.