
Agglomerative Clustering in Python with sklearn: What It Is and How to Use It

Agglomerative clustering is a hierarchical method that groups data points by repeatedly merging the two closest clusters. In Python it is available as sklearn.cluster.AgglomerativeClustering. It builds clusters from the bottom up, joining small groups into bigger ones until the desired number of clusters is reached.

How It Works

Agglomerative clustering is like making a family tree but for data points. Imagine you have many dots on a paper, and you want to group them by closeness. First, each dot is its own group. Then, you find the two closest groups and join them together. You keep doing this step-by-step, joining the nearest groups, until you have just a few big groups left.

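This process is called "bottom-up" because you start with many small groups and build up to bigger ones. The closeness between groups can be measured in different ways, called linkage criteria: the shortest distance between any two points in the groups (single linkage), the largest such distance (complete linkage), the average distance (average linkage), or the increase in within-cluster variance caused by the merge (Ward linkage, the default in sklearn). The choice of linkage influences the shape of the clusters found: single linkage can follow elongated, chain-like groups, while Ward tends to produce compact groups.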
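The merging loop described above can be sketched in a few lines of NumPy. This is a deliberately naive illustration using the shortest-distance (single linkage) rule, not how sklearn implements it; the function name `naive_agglomerative` and the variable `toy_points` are made up for this sketch.

```python
import numpy as np

def naive_agglomerative(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain (single linkage)."""
    points = np.asarray(points, dtype=float)
    # Start with every point in its own cluster (lists of point indices).
    clusters = [[i] for i in range(len(points))]
    # Pairwise Euclidean distances between all points.
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    while len(clusters) > n_clusters:
        best = (np.inf, None, None)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: shortest distance between any two members.
                d = dists[np.ix_(clusters[a], clusters[b])].min()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge cluster b into cluster a, then drop b.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

toy_points = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
groups = naive_agglomerative(toy_points, n_clusters=2)
print(sorted(sorted(g) for g in groups))  # [[0, 1, 2], [3, 4, 5]]
```

The nested loops make the cost of this sketch grow quickly with the number of points, which is one reason to rely on a library implementation for real data.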

Example

This example shows how to use AgglomerativeClustering from sklearn to group simple 2D points into clusters.

python
from sklearn.cluster import AgglomerativeClustering
import numpy as np

# Sample data: 6 points in 2D space
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# Create the clustering model to find 2 clusters
model = AgglomerativeClustering(n_clusters=2)

# Fit the model and get cluster labels
labels = model.fit_predict(X)

print(labels)
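Output
[0 0 0 1 1 1]

The first three points fall into one cluster and the last three into the other. The label numbers themselves are arbitrary identifiers, so the same grouping may be printed as [1 1 1 0 0 0].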
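As a possible variation on the example, the sketch below tries each linkage criterion on the same six points, then lets a distance threshold decide the number of clusters instead of fixing it. The threshold value 5.0 and the names `groupings` and `cut` are assumptions chosen for this toy data, not recommended defaults.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# Try each built-in linkage criterion on the same data.
groupings = {}
for linkage in ("ward", "complete", "average", "single"):
    model = AgglomerativeClustering(n_clusters=2, linkage=linkage)
    groupings[linkage] = model.fit_predict(X)
    print(linkage, groupings[linkage])

# Alternative: leave n_clusters unset and stop merging once the closest
# clusters are at least distance_threshold apart (5.0 is an assumed value).
cut = AgglomerativeClustering(n_clusters=None, distance_threshold=5.0,
                              linkage="single")
cut_labels = cut.fit_predict(X)
print(cut.n_clusters_)  # 2
```

On data this cleanly separated every linkage finds the same two groups; differences between criteria tend to show up on noisier or elongated data.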

When to Use

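Agglomerative clustering is useful when you want to find groups in data without knowing their exact shape or size in advance. It works best for small to medium datasets, because time and memory costs typically grow at least quadratically with the number of samples, and for cases where you want a clear hierarchy of clusters.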

Real-world uses include grouping similar documents, customer segmentation in marketing, or organizing images by similarity. It is especially helpful when you want to understand how clusters form step-by-step, because the sequence of merges forms a tree-like structure called a dendrogram. sklearn's AgglomerativeClustering does not plot it itself, but scipy.cluster.hierarchy.dendrogram can draw one.
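One way to inspect the merge tree is through SciPy's hierarchy module. The sketch below builds the tree with `scipy.cluster.hierarchy.linkage` rather than with sklearn, reusing the six toy points from the example; `no_plot=True` only computes the layout, so matplotlib is not needed here.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# Each row of Z records one merge: the two cluster ids joined,
# the distance at which they were joined, and the new cluster's size.
Z = linkage(X, method="ward")
print(Z.shape)  # (5, 4): n - 1 merges for n = 6 points

# Compute the dendrogram layout without drawing it.
tree = dendrogram(Z, no_plot=True)
print(tree["ivl"])  # leaf labels in left-to-right order
```

With matplotlib installed, calling `dendrogram(Z)` without `no_plot=True` draws the tree on the current axes.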

Key Points

  • Agglomerative clustering merges closest groups step-by-step from many small clusters to fewer big ones.
  • It uses distance measures to decide which clusters to join.
  • Implemented in Python with sklearn.cluster.AgglomerativeClustering.
  • Good for hierarchical grouping and small to medium datasets.
  • Produces cluster labels that assign each data point to a cluster.

Key Takeaways

  • Agglomerative clustering groups data by merging the closest clusters step-by-step, from the bottom up.
  • Use sklearn's AgglomerativeClustering to apply this method in Python.
  • It finds groupings without a predefined cluster shape, though the chosen linkage influences the shapes it favors.
  • It is best suited for small to medium datasets where hierarchical structure matters.
  • It outputs cluster labels that assign each data point to a cluster.