
Hierarchical clustering in ML Python

Introduction
Hierarchical clustering groups similar data points step by step, producing a tree-like structure that shows how the points relate. Use it when:
- You want to see how data naturally groups without deciding the number of groups first.
- You need a visual tree (a dendrogram) to understand data relationships.
- You are exploring the structure of a small to medium dataset.
- You want to cluster data but don't know how many clusters to expect.
- You want to combine or split clusters at different levels of detail.
Syntax
from sklearn.cluster import AgglomerativeClustering

model = AgglomerativeClustering(n_clusters=number, linkage='ward')
model.fit(data)
labels = model.labels_
n_clusters sets how many groups you want at the end.
linkage defines how to measure distance between clusters; common options are 'ward', 'complete', and 'average'.
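The effect of the linkage option is easiest to see side by side. As a quick sketch (the one-dimensional array X below is made-up toy data for illustration), the same data can be fit with each common linkage method and the labels compared:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Made-up toy data: two obvious groups, near 0 and near 10
X = np.array([[0.0], [0.3], [0.6], [10.0], [10.2], [10.5]])

# Fit the same data with each common linkage option and compare labels
for linkage in ['ward', 'complete', 'average']:
    model = AgglomerativeClustering(n_clusters=2, linkage=linkage)
    labels = model.fit_predict(X)
    print(linkage, labels)
```

On clearly separated data like this, all three methods find the same two groups; on overlapping or elongated clusters their results can differ noticeably.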
Examples
Groups data into 3 clusters using Ward's method, which minimizes variance inside clusters.
model = AgglomerativeClustering(n_clusters=3, linkage='ward')
Groups data into 2 clusters using complete linkage, which considers the farthest points between clusters.
model = AgglomerativeClustering(n_clusters=2, linkage='complete')
Keeps merging clusters only while their linkage distance is below 1.5, letting the algorithm decide the number of clusters; the result is stored in model.n_clusters_ after fitting.
model = AgglomerativeClustering(n_clusters=None, distance_threshold=1.5)
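To see the distance_threshold behaviour end to end, here is a minimal runnable sketch (the one-dimensional array X is made up for illustration; linkage defaults to 'ward'):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Made-up toy data: two tight groups well over 1.5 units apart
X = np.array([[0.0], [0.2], [0.4], [5.0], [5.2], [5.4]])

# With n_clusters=None, the distance threshold alone decides when merging stops
model = AgglomerativeClustering(n_clusters=None, distance_threshold=1.5)
model.fit(X)

# The number of clusters the threshold produced
print('Clusters found:', model.n_clusters_)  # prints: Clusters found: 2
```

The merges inside each tight group happen at small distances, but joining the two groups would cost far more than 1.5, so the algorithm stops at two clusters.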
Sample Program
This program creates simple data with 3 groups, applies hierarchical clustering to find 3 clusters, and prints the cluster each point belongs to.
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Create sample data with 3 centers
X, _ = make_blobs(n_samples=15, centers=3, random_state=42)

# Create and fit hierarchical clustering model
model = AgglomerativeClustering(n_clusters=3, linkage='ward')
model.fit(X)

# Print cluster labels for each point
print('Cluster labels:', model.labels_)
Important Notes
Hierarchical clustering can be slow on very large datasets: the standard agglomerative algorithm works from the pairwise distances between points, which costs roughly quadratic memory and even more time as the dataset grows.
The dendrogram is a helpful visual to decide the number of clusters by cutting the tree at different heights.
Choosing the right linkage method affects how clusters form; try different ones to see what fits your data best.
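Building on the note about dendrograms, here is one common way to draw one with SciPy and Matplotlib (the small array X is made-up sample data; both libraries are assumed to be installed):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # render to a file without a display; drop this line to show a window
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Made-up sample data: two rough groups of 2-D points
X = np.array([[0.0, 0.0], [0.5, 0.3], [0.2, 0.8],
              [5.0, 5.0], [5.4, 4.8], [4.9, 5.5]])

# Compute the full merge tree with Ward linkage
Z = linkage(X, method='ward')

# Draw the dendrogram; cutting the tree at a chosen height gives the clusters
dendrogram(Z)
plt.xlabel('Data point index')
plt.ylabel('Merge distance')
plt.savefig('dendrogram.png')
```

Cutting this tree at a given height plays the same role as the distance_threshold parameter shown earlier: every merge above the cut is discarded, and what remains below it are the clusters.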
Summary
Hierarchical clustering builds a tree of clusters by joining or splitting data points stepwise.
It does not require you to pick the number of clusters upfront if you use a distance threshold instead.
Useful for exploring data structure and visualizing relationships with dendrograms.