How to Evaluate Clustering Results in Python with sklearn
To evaluate clustering results in Python, use sklearn.metrics functions like silhouette_score for measuring cluster cohesion and separation, or adjusted_rand_score if true labels are known. These metrics quantify how well your clustering algorithm grouped the data.

Syntax
Common functions to evaluate clustering results in sklearn.metrics include:

- silhouette_score(X, labels): Measures how similar each point is to its own cluster compared to other clusters. Higher is better.
- adjusted_rand_score(true_labels, predicted_labels): Compares clustering labels to true labels, adjusted for chance. 1 means a perfect match.
- calinski_harabasz_score(X, labels): Ratio of between-cluster dispersion to within-cluster dispersion. Higher is better.
- davies_bouldin_score(X, labels): Average similarity between each cluster and its most similar cluster; lower values indicate better clustering.
```python
from sklearn.metrics import (
    silhouette_score,
    adjusted_rand_score,
    calinski_harabasz_score,
    davies_bouldin_score,
)

# X: data features
# labels: cluster labels assigned by the algorithm
# true_labels: actual labels, if available

sil_score = silhouette_score(X, labels)
ari_score = adjusted_rand_score(true_labels, labels)
ch_score = calinski_harabasz_score(X, labels)
db_score = davies_bouldin_score(X, labels)
```
Example
This example shows how to cluster data with KMeans and evaluate the results using silhouette score and adjusted rand index.
```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, adjusted_rand_score

# Create sample data with 3 clusters
X, true_labels = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=0)

# Fit KMeans clustering
kmeans = KMeans(n_clusters=3, random_state=0)
predicted_labels = kmeans.fit_predict(X)

# Evaluate clustering
sil_score = silhouette_score(X, predicted_labels)
ari_score = adjusted_rand_score(true_labels, predicted_labels)
print(f"Silhouette Score: {sil_score:.2f}")
print(f"Adjusted Rand Index: {ari_score:.2f}")
```
Output
Silhouette Score: 0.59
Adjusted Rand Index: 0.90
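Beyond evaluating a single fit, the silhouette score is commonly used to choose the number of clusters. A minimal sketch, reusing the same make_blobs data as above (the range of k values tried here is an arbitrary choice): fit KMeans for several candidate k and keep the one with the highest score.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=0)

# Try several cluster counts and record the silhouette score for each
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(f"Best k by silhouette: {best_k}")
```

Because the data was generated with three well-separated centers, the silhouette score should peak at k = 3.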
Common Pitfalls
Common mistakes when evaluating clustering results include:

- Using adjusted_rand_score without true labels, which makes it invalid.
- Interpreting silhouette score without considering the number of clusters; very high or low cluster counts can distort the score.
- Ignoring that internal metrics like davies_bouldin_score require labels from clustering algorithms, not ground truth.
- Not scaling or preprocessing data before clustering, which can affect metric results.
```python
from sklearn.metrics import silhouette_score, adjusted_rand_score

# Wrong: using ARI without true labels
predicted_labels = [0, 0, 1, 1]
# true_labels missing or unknown
# ari_score = adjusted_rand_score(None, predicted_labels)  # This will error

# Right: use silhouette score when true labels are unknown
X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
labels = [0, 0, 0, 1, 1, 1]
sil_score = silhouette_score(X, labels)
print(f"Silhouette Score: {sil_score:.3f}")
```
Output
Silhouette Score: 0.713
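The preprocessing pitfall is easy to demonstrate. A small sketch (the data values here are made up for illustration, and StandardScaler is used as one common preprocessing choice) comparing KMeans results on raw versus standardized features:

```python
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Feature 2 spans a far larger range than feature 1, so it dominates
# the Euclidean distances that both KMeans and silhouette_score use
X = [[1.0, 1000.0], [2.0, 2000.0], [1.0, 9000.0],
     [2.0, 8000.0], [9.0, 1500.0], [8.0, 8500.0]]

# Standardize each feature to zero mean and unit variance
X_scaled = StandardScaler().fit_transform(X)

labels_raw = KMeans(n_clusters=2, random_state=0, n_init=10).fit_predict(X)
labels_scaled = KMeans(n_clusters=2, random_state=0, n_init=10).fit_predict(X_scaled)

s_raw = silhouette_score(X, labels_raw)
s_scaled = silhouette_score(X_scaled, labels_scaled)
print(f"Raw: {s_raw:.3f}, scaled: {s_scaled:.3f}")
```

On the raw data the clustering is driven almost entirely by feature 2; after scaling, both features contribute, which can change both the labels and the score.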
Quick Reference
Summary of key clustering evaluation metrics:
| Metric | Purpose | Range | Higher is Better? |
|---|---|---|---|
| silhouette_score | Measures cluster cohesion and separation | -1 to 1 | Yes |
| adjusted_rand_score | Compares clustering to true labels | -1 to 1 | Yes |
| calinski_harabasz_score | Ratio of between/within cluster dispersion | 0 to ∞ | Yes |
| davies_bouldin_score | Average similarity between clusters | 0 to ∞ | No (lower better) |
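The directions in the table can be checked on the toy data from the pitfalls example: a labeling that matches the two obvious groups should beat a deliberately mixed one on both internal metrics.

```python
from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score

X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
good = [0, 0, 0, 1, 1, 1]  # matches the two obvious groups
bad = [0, 1, 0, 1, 0, 1]   # mixes points from both groups

# Higher Calinski-Harabasz and lower Davies-Bouldin mean better clustering
print(calinski_harabasz_score(X, good) > calinski_harabasz_score(X, bad))  # True
print(davies_bouldin_score(X, good) < davies_bouldin_score(X, bad))        # True
```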
Key Takeaways
- Use silhouette score to evaluate clustering quality without true labels.
- Use adjusted rand index only if true labels are available for comparison.
- Higher silhouette and Calinski-Harabasz scores indicate better clustering.
- Lower Davies-Bouldin scores mean clusters are more distinct.
- Always preprocess data and choose metrics suited to your clustering context.