
Hierarchical clustering (linkage) in SciPy

Introduction

Hierarchical (agglomerative) clustering starts with every data point in its own cluster and repeatedly merges the two closest clusters until only one remains. It reveals natural groups without requiring the number of groups in advance.

Typical situations where it is useful:
You want to organize customers into groups based on buying habits.
You need to find clusters in data without deciding the number of clusters first.
You want to visualize how data points group together using a tree diagram.
You have small to medium datasets and want to explore data structure.
You want to compare different ways of measuring distance between groups.
Syntax
SciPy
from scipy.cluster.hierarchy import linkage
Z = linkage(data, method='single', metric='euclidean')

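data is your input: either a 2D array of observations (rows are samples, columns are features) or a condensed 1D distance matrix such as the one returned by pdist.

method chooses how the distance between clusters is computed: 'single', 'complete', 'average', 'weighted', 'centroid', 'median', or 'ward'. The default is 'single'.

metric is the distance between individual points, used when data is a 2D array of observations. The default is 'euclidean'.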
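The two input forms and the metric parameter can be sketched as follows. This is a minimal illustration, not part of the original lesson: the sample points are borrowed from the sample program, and pdist (from scipy.spatial.distance) is brought in here only to build the condensed distance matrix.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Illustrative data: 5 points with 2 features each
data = np.array([[1, 2], [2, 3], [5, 8], [6, 8], [7, 9]], dtype=float)

# Form 1: pass the observations and let linkage compute the distances
Z_raw = linkage(data, method='single', metric='euclidean')

# Form 2: compute the condensed distance matrix first, then pass it in
condensed = pdist(data, metric='euclidean')
Z_condensed = linkage(condensed, method='single')

print(Z_raw.shape)                      # one row per merge, 4 columns
print(np.allclose(Z_raw, Z_condensed))  # both forms describe the same tree

# A different point-to-point metric changes the merge distances
Z_city = linkage(data, method='single', metric='cityblock')
print(Z_city[:, 2])
```

Passing a precomputed condensed matrix is convenient when the distances are expensive to compute or come from another source.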

Examples
Single linkage: the distance between two clusters is the shortest distance between any pair of points, one from each cluster.
SciPy
Z = linkage(data, method='single')
Complete linkage: the distance between two clusters is the longest distance between any pair of points, one from each cluster.
SciPy
Z = linkage(data, method='complete')
Average linkage: the distance between two clusters is the average distance over all pairs of points, one from each cluster.
SciPy
Z = linkage(data, method='average')
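Ward linkage: merges the pair of clusters that gives the smallest increase in within-cluster variance, which favors compact clusters. It assumes Euclidean distances.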
SciPy
Z = linkage(data, method='ward')
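To make the single, complete, and average definitions concrete, the sketch below computes the distance between two groups by hand and compares it with the height of the final merge reported by linkage. It assumes the five sample points used elsewhere on this page, which split cleanly into the first two and the last three points; cdist from scipy.spatial.distance is an addition for the illustration. Ward is left out because its merge height is not a simple statistic of the pairwise distances.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import cdist

data = np.array([[1, 2], [2, 3], [5, 8], [6, 8], [7, 9]], dtype=float)

# The two well-separated groups that every method merges last
left, right = data[:2], data[2:]

# All pairwise distances between a point in `left` and a point in `right`
cross = cdist(left, right)

by_hand = {
    'single': cross.min(),     # shortest cross-cluster distance
    'complete': cross.max(),   # longest cross-cluster distance
    'average': cross.mean(),   # mean of all cross-cluster distances
}

final_heights = {}
for method, expected in by_hand.items():
    Z = linkage(data, method=method)
    final_heights[method] = Z[-1, 2]  # distance recorded for the last merge
    print(method, round(expected, 3), round(final_heights[method], 3))
```

For each method the two printed numbers should agree, since the last row of the linkage matrix records the merge of these two groups.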
Sample Program

This code clusters 5 points using average linkage. It prints the linkage matrix showing how clusters merge step-by-step. Then it draws a dendrogram to visualize the cluster hierarchy.

SciPy
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# Sample data: 5 points with 2 features each
data = np.array([[1, 2], [2, 3], [5, 8], [6, 8], [7, 9]])

# Perform hierarchical clustering using average linkage
Z = linkage(data, method='average')

# Print linkage matrix
print(Z)

# Plot dendrogram to visualize clustering
plt.figure(figsize=(6, 4))
dendrogram(Z, labels=["A", "B", "C", "D", "E"])
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('Sample')
plt.ylabel('Distance')
plt.tight_layout()
plt.show()
Important Notes

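The linkage matrix has one row per merge (n - 1 rows for n points) and 4 columns: the indices of the two clusters merged, the distance between them, and the number of original points in the new cluster. Indices below n are original points; an index of n + i refers to the cluster created in row i.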
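The layout of the linkage matrix can be read row by row, as in this small sketch. It reuses the data and average linkage from the sample program; the variable names are chosen here for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

data = np.array([[1, 2], [2, 3], [5, 8], [6, 8], [7, 9]], dtype=float)
Z = linkage(data, method='average')

n = data.shape[0]  # number of original points
for i, (a, b, dist, size) in enumerate(Z):
    # Indices below n are original points; n + i names the cluster made in row i
    print(f"step {i}: merge {int(a)} and {int(b)} at distance {dist:.3f} "
          f"-> cluster {n + i} with {int(size)} points")
```

The last row always reports a cluster containing all n points, because every point has been merged into the root of the tree by then.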

Use dendrograms to understand cluster merging visually.

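Different linkage methods can produce different clusters from the same data: single linkage tends to chain points into long, stretched clusters, while complete and Ward linkage favor compact ones.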
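One way to see the effect of the linkage method is to cut each tree into two flat clusters and compare the labels. The sketch below is an illustration only: the 1-D points are made up (a chain with slowly growing gaps followed by a tight pair), and fcluster from scipy.cluster.hierarchy is not used elsewhere in this lesson.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical data: a chain with growing gaps, then a tight pair at the end
points = np.array([0.0, 1.0, 2.1, 3.3, 4.6, 6.0, 7.5, 7.6]).reshape(-1, 1)

labels = {}
for method in ('single', 'complete'):
    Z = linkage(points, method=method)
    # Cut the tree so that at most 2 flat clusters remain
    labels[method] = fcluster(Z, t=2, criterion='maxclust')
    print(method, labels[method])
```

On this data, single linkage follows the chain and separates only the tight pair at the end, while complete linkage splits the points into two groups of four. The label numbers themselves are arbitrary; what matters is which points share a label.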

Summary

Hierarchical clustering groups data step-by-step without a preset cluster count.

Linkage methods control how distances between clusters are calculated.

Dendrograms help visualize the cluster structure and merging process.