Hierarchical clustering builds a hierarchy of groups by merging the most similar data points and clusters step by step. It helps find natural groups without specifying the number of groups in advance.
Hierarchical clustering (linkage) in SciPy
from scipy.cluster.hierarchy import linkage

Z = linkage(data, method='single', metric='euclidean')
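data is your input data as a 2D array of shape (n_samples, n_features). A condensed distance matrix, such as the one returned by scipy.spatial.distance.pdist, is also accepted.
method chooses how the distance between two clusters is computed: 'single', 'complete', 'average', 'ward', etc. The default is 'single'.
metric chooses how the distance between two points is measured. The default is 'euclidean'; 'ward' is only correctly defined for Euclidean distances.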
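To make the method options concrete, the sketch below computes the distance between two clusters by hand under the 'single', 'complete', and 'average' rules. The two clusters and the variable names are made up for illustration; cdist from scipy.spatial.distance is used only to get the pairwise point distances.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Two small hypothetical clusters of 2D points (illustrative values only)
cluster_a = np.array([[0.0, 0.0], [0.0, 1.0]])
cluster_b = np.array([[3.0, 0.0], [4.0, 0.0]])

# All pairwise Euclidean distances between points of the two clusters
pairwise = cdist(cluster_a, cluster_b, metric='euclidean')

single_dist = pairwise.min()    # 'single': distance of the closest pair
complete_dist = pairwise.max()  # 'complete': distance of the farthest pair
average_dist = pairwise.mean()  # 'average': mean over all pairs

print("single:  ", single_dist)
print("complete:", complete_dist)
print("average: ", average_dist)
```

For any pair of clusters the single distance is the smallest of the three and the complete distance is the largest, which is why the same data can merge in a different order under different methods.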
Z = linkage(data, method='single')
Z = linkage(data, method='complete')
Z = linkage(data, method='average')
Z = linkage(data, method='ward')

The following program clusters 5 points using average linkage. It prints the linkage matrix showing how clusters merge step-by-step. Then it draws a dendrogram to visualize the cluster hierarchy.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# Sample data: 5 points with 2 features each
data = np.array([[1, 2], [2, 3], [5, 8], [6, 8], [7, 9]])

# Perform hierarchical clustering using average linkage
Z = linkage(data, method='average')

# Print linkage matrix
print(Z)

# Plot dendrogram to visualize clustering
plt.figure(figsize=(6, 4))
dendrogram(Z, labels=["A", "B", "C", "D", "E"])
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('Sample')
plt.ylabel('Distance')
plt.tight_layout()
plt.show()
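The linkage matrix has 4 columns and n - 1 rows for n input points, one row per merge. The first two columns hold the indices of the two clusters merged, the third holds the distance between them, and the fourth holds the number of original points in the new cluster. Indices below n refer to original points; the cluster created by row i is given index n + i.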
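As a rough illustration of how to read those columns, the sketch below walks through each row of the linkage matrix for the same five sample points and prints the merge it describes. The merges list and the new_id name are introduced here for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Same sample data as the program above
data = np.array([[1, 2], [2, 3], [5, 8], [6, 8], [7, 9]])
Z = linkage(data, method='average')

n = data.shape[0]  # number of original points
merges = []
for step, (left, right, dist, size) in enumerate(Z):
    new_id = n + step  # index assigned to the cluster formed at this step
    merges.append((int(left), int(right), float(dist), int(size), new_id))
    print(f"step {step}: merge {int(left)} and {int(right)} "
          f"at distance {dist:.3f} -> cluster {new_id} ({int(size)} points)")
```

The last row always has a size equal to n, because the final merge joins everything into a single cluster.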
Use dendrograms to understand cluster merging visually.
Different linkage methods can produce different clusters: 'single' tends to form long, chain-like clusters, while 'complete' and 'ward' tend to form compact ones.
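One simple way to see the methods disagree is to run each of them on the same data and compare the height of the final merge. This is only a sketch on the five sample points used earlier; the heights dictionary is a name chosen for this example, and the merge height is a coarse summary rather than a full picture of cluster shape.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Same sample data as the program above
data = np.array([[1, 2], [2, 3], [5, 8], [6, 8], [7, 9]])

heights = {}
for method in ['single', 'complete', 'average', 'ward']:
    Z = linkage(data, method=method)
    # Third column of the last row: distance at which the final two clusters merge
    heights[method] = float(Z[-1, 2])
    print(f"{method:>8}: final merge at distance {heights[method]:.3f}")
```

Plotting a dendrogram for each Z side by side gives a fuller comparison than the final height alone.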
Hierarchical clustering groups data step-by-step without a preset cluster count.
Linkage methods control how distances between clusters are calculated.
Dendrograms help visualize the cluster structure and merging process.