
Hierarchical clustering (linkage) in SciPy - Step-by-Step Execution

Concept Flow - Hierarchical clustering (linkage)
1. Start with each point as a cluster
2. Calculate distances between clusters
3. Find closest clusters
4. Merge closest clusters
5. Update distances
6. More than one cluster? Yes: go back to step 3. No: done.
Start with each data point alone, then repeatedly merge the closest clusters until all points form one cluster.
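The loop described above can be sketched in plain NumPy. This is a minimal, deliberately naive illustration and not how SciPy implements `linkage` internally; the helper names `single_link_distance` and `naive_single_linkage` are hypothetical, and single linkage with Euclidean distance is assumed.

```python
import numpy as np

# Same four points as in this walkthrough's sample.
points = np.array([[1, 2], [2, 3], [10, 10], [11, 11]], dtype=float)

def single_link_distance(a, b):
    """Smallest Euclidean distance between any point in a and any point in b."""
    return min(np.linalg.norm(points[i] - points[j]) for i in a for j in b)

def naive_single_linkage():
    """Hypothetical helper: merge the closest pair of clusters until one remains."""
    clusters = [[i] for i in range(len(points))]  # each point starts alone
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters (first pair wins on ties).
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = single_link_distance(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], round(float(d), 3)))
        # Replace the two merged clusters with their union.
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

merges = naive_single_linkage()
for left, right, dist in merges:
    print(left, "+", right, "at distance", dist)
```

Recomputing every cluster-to-cluster distance on each pass is wasteful, which is one reason to prefer `scipy.cluster.hierarchy.linkage` for real data.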
Execution Sample
SciPy
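import numpy as np
from scipy.cluster.hierarchy import linkage

# Four 2-D points: two tight pairs that lie far apart from each other
points = np.array([[1, 2], [2, 3], [10, 10], [11, 11]])
# Z is the linkage matrix, with one row per merge
Z = linkage(points, method='single')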
This code groups 4 points step-by-step using single linkage hierarchical clustering.
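To see what `linkage` returns, it can help to print the matrix row by row. The sketch below assumes the same four points; each row of `Z` holds the two cluster ids that were merged, the merge distance, and the size of the new cluster.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

points = np.array([[1, 2], [2, 3], [10, 10], [11, 11]])
Z = linkage(points, method='single')

# Each row of Z describes one merge:
# [cluster id, cluster id, merge distance, size of the new cluster]
# Ids 0..n-1 are the original points; the cluster created by row i gets id n + i.
n = len(points)
for i, (a, b, dist, size) in enumerate(Z):
    print(f"merge {i + 1}: {int(a)} + {int(b)} -> id {n + i}, "
          f"distance {dist:.3f}, size {int(size)}")
```

For n points, `Z` always has n - 1 rows, because each merge reduces the number of clusters by one.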
Execution Table
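| Step | Clusters Merged | Distance | Clusters After Merge | Action |
|------|-----------------|----------|----------------------|--------|
| 1 | [0] and [1] | 1.414 | [[0,1]], [2], [3] | Merge closest points 0 and 1 |
| 2 | [2] and [3] | 1.414 | [[0,1]], [[2,3]] | Merge closest points 2 and 3 |
| 3 | [[0,1]] and [[2,3]] | 10.630 | [[0,1,2,3]] | Merge two clusters into one |
| 4 | - | - | [[0,1,2,3]] | Only one cluster left, stop |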
💡 All points merged into a single cluster, clustering complete
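The merge distances can be checked by hand from the pairwise Euclidean distances. The following sketch uses `pdist` and `squareform` from `scipy.spatial.distance`; the variable names are illustrative only.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

points = np.array([[1, 2], [2, 3], [10, 10], [11, 11]])
D = squareform(pdist(points))  # full 4x4 Euclidean distance matrix
print(np.round(D, 3))

# Single linkage between {0, 1} and {2, 3}: the smallest distance
# between a point in the first group and a point in the second.
cross = D[np.ix_([0, 1], [2, 3])]
single_d = float(cross.min())
print(round(single_d, 3))
```

The smallest cross-group distance is between point 1 at (2, 3) and point 2 at (10, 10), which is sqrt(8² + 7²) = sqrt(113), roughly 10.630.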
Variable Tracker
| Variable | Start | After 1 | After 2 | After 3 | Final |
|----------|-------|---------|---------|---------|-------|
| clusters | [0], [1], [2], [3] | [[0,1]], [2], [3] | [[0,1]], [[2,3]] | [[0,1,2,3]] | [[0,1,2,3]] |
| distances | calculated between all points | updated distances between clusters | updated distances between clusters | no distances left | no distances left |
Key Moments - 3 Insights
Why do we merge points 0 and 1 first, not 0 and 2?
Because the distance between points 0 and 1 (1.414) is smaller than the distance between points 0 and 2 (about 12.04), as shown in step 1 of the Execution Table.
What happens to distances after merging clusters?
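Distances are recalculated between the new cluster and each remaining cluster. With single linkage, the new distance is the smallest distance between any point in the merged cluster and any point in the other cluster. The Execution Table shows the clusters changing after steps 1 and 2.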
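As a small illustration of that update for single linkage, the sketch below computes the distances from the merged cluster [0,1] to points 2 and 3 by taking the smaller of the two old distances. The dictionary name `new_to_others` is hypothetical, and other linkage methods use a different update rule.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

points = np.array([[1, 2], [2, 3], [10, 10], [11, 11]])
D = squareform(pdist(points))

# After merging points 0 and 1, the single-linkage distance from the new
# cluster to any other point k is the smaller of the two old distances.
new_to_others = {k: float(min(D[0, k], D[1, k])) for k in (2, 3)}
for k, d in new_to_others.items():
    print(f"distance from [0,1] to [{k}]: {d:.3f}")
```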
When does the clustering stop?
When only one cluster remains, as shown in step 4 of the Execution Table, where no more merges happen.
Visual Quiz - 3 Questions
Test your understanding
According to the Execution Table, which clusters merge at step 2?
A. [2] and [3]
B. [[0,1]] and [[2,3]]
C. [0] and [1]
D. [1] and [2]
💡 Hint
Check the 'Clusters Merged' column at step 2 in the Execution Table.
At which step does the condition 'only one cluster left' become true?
A. Step 1
B. Step 3
C. Step 4
D. Step 2
💡 Hint
Look at the 'Action' column in the Execution Table to see when clustering stops.
If we used 'complete' linkage instead of 'single', how would the distance at step 3 change?
A. It would be the same
B. It would be larger
C. It would be smaller
D. It would be zero
💡 Hint
Complete linkage uses the distance between the two farthest points across the clusters, while single linkage uses the two closest. Compare that with the 'Distance' column at step 3.
Concept Snapshot
Hierarchical clustering groups points stepwise.
Start: each point is its own cluster.
Find closest clusters by distance.
Merge them and update distances.
Repeat until one cluster remains.
Linkage method (single, complete) affects distance calculation.
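To see the effect of the linkage method on the same four points, one can compare the distance of the final merge under `method='single'` and `method='complete'`. This is a minimal sketch; the dictionary name `final_distance` is illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

points = np.array([[1, 2], [2, 3], [10, 10], [11, 11]])

final_distance = {}
for method in ("single", "complete"):
    Z = linkage(points, method=method)
    final_distance[method] = float(Z[-1, 2])  # distance of the last merge
    print(method, round(final_distance[method], 3))
```

Single linkage reports the closest cross-cluster pair (points 1 and 2), while complete linkage reports the farthest pair (points 0 and 3), so the complete-linkage value is the larger of the two.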
Full Transcript
Hierarchical clustering starts with each data point as its own cluster. We calculate distances between all clusters and merge the closest two. After merging, distances are updated to reflect the new clusters. This repeats until all points form one cluster. The linkage method defines how distances between clusters are measured. In this example, single linkage merges clusters based on the shortest distance between points. The execution table shows each merge step, clusters involved, and distances. The process stops when only one cluster remains.