
Distance computation (distance.cdist) in SciPy

Introduction

We use distance computation to find how far apart points are from each other. It helps us compare data points in many tasks like clustering or searching.

Comparing locations of stores to find the closest ones.
Grouping similar customers based on their shopping habits.
Finding nearest neighbors in recommendation systems.
Measuring similarity between images or text features.
Syntax
SciPy
scipy.spatial.distance.cdist(XA, XB, metric='euclidean', *, out=None, **kwargs)

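XA and XB are 2D arrays of points with shapes (mA, n) and (mB, n): each row is a point and each column is a feature, so both arrays must have the same number of columns. The result has shape (mA, mB).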

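metric sets how distance is measured. It can be a string such as 'cityblock' or a function; the default is 'euclidean' (straight-line distance). Extra keyword arguments, such as p for 'minkowski', are passed on to the metric.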
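A minimal sketch of the shape rule, assuming NumPy and SciPy are installed: with XA of shape (mA, n) and XB of shape (mB, n), cdist returns an (mA, mB) matrix. The array sizes and the random seed below are arbitrary choices for illustration.

```python
import numpy as np
from scipy.spatial import distance

rng = np.random.default_rng(0)

# 5 points in XA and 3 points in XB, each point with 4 features
XA = rng.random((5, 4))
XB = rng.random((3, 4))

# One row per point in XA, one column per point in XB
result = distance.cdist(XA, XB, metric='euclidean')
print(result.shape)  # (5, 3)
```

The number of features (4 here) only has to match between XA and XB; it does not appear in the output shape.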

Examples
Compute Euclidean distance between two sets of 2D points.
SciPy
from scipy.spatial import distance
import numpy as np

XA = np.array([[0, 0], [1, 1]])
XB = np.array([[1, 0], [2, 2]])

result = distance.cdist(XA, XB, metric='euclidean')
print(result)
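To see what the Euclidean call computes, here is a sketch that rebuilds the same matrix with NumPy broadcasting. The names diff and manual are illustrative only; this is a check of the math, not how SciPy implements cdist internally.

```python
import numpy as np
from scipy.spatial import distance

XA = np.array([[0, 0], [1, 1]])
XB = np.array([[1, 0], [2, 2]])

# diff[i, j, :] = XA[i] - XB[j], shape (2, 2, 2)
diff = XA[:, np.newaxis, :] - XB[np.newaxis, :, :]

# Euclidean distance: square root of the summed squared differences
manual = np.sqrt((diff ** 2).sum(axis=-1))

result = distance.cdist(XA, XB, metric='euclidean')
print(np.allclose(manual, result))  # True
```

For these points the matrix is [[1, √8], [1, √2]], roughly [[1, 2.83], [1, 1.41]].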
Compute Manhattan (city block) distance instead of Euclidean.
SciPy
# Uses XA and XB from the previous example
result = distance.cdist(XA, XB, metric='cityblock')
print(result)
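City block (Manhattan) distance adds up the absolute differences along each coordinate. This sketch checks that against cdist for the same two point sets; manual is an illustrative name.

```python
import numpy as np
from scipy.spatial import distance

XA = np.array([[0, 0], [1, 1]])
XB = np.array([[1, 0], [2, 2]])

# City block distance: sum of absolute coordinate differences for every pair
manual = np.abs(XA[:, np.newaxis, :] - XB[np.newaxis, :, :]).sum(axis=-1)

result = distance.cdist(XA, XB, metric='cityblock')
print(np.allclose(manual, result))  # True
```

Here the matrix is [[1, 4], [1, 2]]: going from [0, 0] to [2, 2] takes 2 steps in x plus 2 steps in y.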
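Compute cosine distance, which compares the direction of points rather than their length: it equals 1 minus the cosine of the angle between two vectors. It is undefined for the zero vector, so this example replaces the point [0, 0] with a nonzero point.
SciPy
# Cosine distance is undefined for [0, 0], so use nonzero points in the first set
XA_nonzero = np.array([[1, 0], [1, 1]])
result = distance.cdist(XA_nonzero, XB, metric='cosine')
print(result)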
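A sketch of the cosine formula, assuming only nonzero points (cosine distance is undefined for the zero vector): divide each dot product by the product of the two norms to get the cosine similarity, then subtract it from 1. The variable names are illustrative.

```python
import numpy as np
from scipy.spatial import distance

# Nonzero points only: cosine distance is undefined for [0, 0]
XA = np.array([[1, 0], [1, 1]])
XB = np.array([[1, 0], [2, 2]])

# Cosine similarity for every pair: dot product / (norm of a * norm of b)
norms_A = np.linalg.norm(XA, axis=1)
norms_B = np.linalg.norm(XB, axis=1)
similarity = (XA @ XB.T) / np.outer(norms_A, norms_B)

# Cosine distance = 1 - cosine similarity
manual = 1 - similarity

result = distance.cdist(XA, XB, metric='cosine')
print(np.allclose(manual, result))  # True
```

Points pointing the same way, such as [1, 1] and [2, 2], get a cosine distance of 0 even though their Euclidean distance is √2.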
Sample Program

This program calculates the straight-line distances between each point in points_A and each point in points_B. The result is a matrix where each row corresponds to a point in points_A and each column corresponds to a point in points_B.

SciPy
from scipy.spatial import distance
import numpy as np

# Define two sets of points
points_A = np.array([[0, 0], [3, 4]])
points_B = np.array([[1, 1], [6, 8]])

# Compute Euclidean distances between each point in A and each point in B
distances = distance.cdist(points_A, points_B, metric='euclidean')

print('Distance matrix:')
print(distances)
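The introduction mentions finding the closest stores or nearest neighbors. One way to do that with the distance matrix above is to take the smallest value in each row with NumPy's argmin and min. This is a small sketch; nearest_idx and nearest_dist are illustrative names.

```python
import numpy as np
from scipy.spatial import distance

points_A = np.array([[0, 0], [3, 4]])
points_B = np.array([[1, 1], [6, 8]])

distances = distance.cdist(points_A, points_B, metric='euclidean')

# For each row (a point in points_A), the column of the closest point in points_B
nearest_idx = distances.argmin(axis=1)
nearest_dist = distances.min(axis=1)

for i, (j, d) in enumerate(zip(nearest_idx, nearest_dist)):
    print(f'points_A[{i}] -> points_B[{j}] at distance {d:.3f}')
```

Both points in points_A turn out to be closest to points_B[0], at distances √2 ≈ 1.414 and √13 ≈ 3.606.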
Important Notes

The output is a 2D array where element (i, j) is the distance between XA[i] and XB[j].
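The (i, j) rule can be checked by rebuilding the matrix one pair at a time with SciPy's single-pair function distance.euclidean. The double loop is only for illustration; cdist does the same job in one vectorized call.

```python
import numpy as np
from scipy.spatial import distance

XA = np.array([[0, 0], [1, 1]])
XB = np.array([[1, 0], [2, 2]])

result = distance.cdist(XA, XB, metric='euclidean')

# Fill loop[i, j] with the distance between XA[i] and XB[j]
loop = np.empty((len(XA), len(XB)))
for i in range(len(XA)):
    for j in range(len(XB)):
        loop[i, j] = distance.euclidean(XA[i], XB[j])

print(np.allclose(result, loop))  # True
```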

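SciPy supports many distance metrics, including 'euclidean', 'cityblock', 'cosine', 'hamming', 'chebyshev' and 'minkowski'. You can also pass your own function that takes two 1D arrays and returns a number.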
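A sketch of a few other metrics on the same small point sets. The function max_abs_difference is a hypothetical example of a custom metric; it reimplements Chebyshev distance so it can be compared with the built-in 'chebyshev'. A Python callable is evaluated once per pair, so it is likely to be slower than a built-in string metric.

```python
import numpy as np
from scipy.spatial import distance

XA = np.array([[0, 0], [1, 1]])
XB = np.array([[1, 0], [2, 2]])

# Minkowski distance with p=3 (p=1 gives cityblock, p=2 gives euclidean)
mink = distance.cdist(XA, XB, metric='minkowski', p=3)

# Hamming distance: fraction of coordinates that differ
ham = distance.cdist(XA, XB, metric='hamming')

# Hypothetical custom metric: receives two 1-D arrays, returns a float
def max_abs_difference(u, v):
    return float(np.max(np.abs(u - v)))

custom = distance.cdist(XA, XB, metric=max_abs_difference)
print(np.allclose(custom, distance.cdist(XA, XB, metric='chebyshev')))  # True
```

For example, [0, 0] and [1, 0] differ in one of two coordinates, so their Hamming distance is 0.5.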

Both input arrays must be 2D with shape (number_of_points, number_of_features), and they must have the same number of features.
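A sketch of both requirements: a single point stored as a 1D array can be turned into a one-row 2D array with reshape(1, -1), and arrays with different numbers of features are rejected with a ValueError. The exact error message depends on the SciPy version, so the code only catches the exception type.

```python
import numpy as np
from scipy.spatial import distance

# A single point as a 1-D array has shape (2,); cdist needs 2-D input
point = np.array([3, 4])
point_2d = point.reshape(1, -1)  # shape (1, 2): one point, two features

XB = np.array([[0, 0], [6, 8]])
d = distance.cdist(point_2d, XB)
print(d)  # [[5. 5.]]

# Mismatched feature counts (2 vs 3 columns) raise a ValueError
XC = np.array([[0, 0, 0]])
raised = False
try:
    distance.cdist(point_2d, XC)
except ValueError as err:
    raised = True
    print('ValueError:', err)
```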

Summary

cdist computes distances between two groups of points.

It returns a matrix showing distances from each point in the first group to each point in the second.

You can choose different ways to measure distance using the metric parameter.