
How to Evaluate Clustering Results in Python with sklearn

To evaluate clustering results in Python, use sklearn.metrics functions like silhouette_score for measuring cluster cohesion and separation, or adjusted_rand_score if true labels are known. These metrics help quantify how well your clustering algorithm grouped the data.

📐 Syntax

Common functions to evaluate clustering results in sklearn.metrics include:

  • silhouette_score(X, labels): Measures how similar an object is to its own cluster compared to other clusters. Higher is better.
  • adjusted_rand_score(true_labels, predicted_labels): Compares clustering labels to true labels, adjusted for chance. 1 means perfect match.
  • calinski_harabasz_score(X, labels): Ratio of between-cluster dispersion to within-cluster dispersion. Higher is better.
  • davies_bouldin_score(X, labels): Average similarity between each cluster and its most similar one; lower values indicate better clustering.
python
from sklearn.metrics import silhouette_score, adjusted_rand_score, calinski_harabasz_score, davies_bouldin_score

# X: data features
# labels: cluster labels assigned by algorithm
# true_labels: actual labels if available

sil_score = silhouette_score(X, labels)
ari_score = adjusted_rand_score(true_labels, labels)
ch_score = calinski_harabasz_score(X, labels)
db_score = davies_bouldin_score(X, labels)

💻 Example

This example shows how to cluster data with KMeans and evaluate the results using silhouette score and adjusted rand index.

python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, adjusted_rand_score

# Create sample data with 3 clusters
X, true_labels = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=0)

# Fit KMeans clustering
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
predicted_labels = kmeans.fit_predict(X)

# Evaluate clustering
sil_score = silhouette_score(X, predicted_labels)
ari_score = adjusted_rand_score(true_labels, predicted_labels)

print(f"Silhouette Score: {sil_score:.3f}")
print(f"Adjusted Rand Index: {ari_score:.3f}")
Output
Silhouette Score: 0.59
Adjusted Rand Index: 0.90
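Since no single silhouette value is meaningful in isolation, a common follow-up is to sweep the number of clusters and compare scores. A minimal sketch reusing the same make_blobs data (the candidate range 2–6 is an arbitrary choice for illustration):

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Same synthetic data as the example above
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=0)

# Score each candidate cluster count with the silhouette metric
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest silhouette is the natural pick
best_k = max(scores, key=scores.get)
print(f"Best k by silhouette: {best_k}")
```

This is the usual way to use silhouette in practice: as a relative criterion across several values of k rather than an absolute quality threshold.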

⚠️ Common Pitfalls

Common mistakes when evaluating clustering results include:

  • Using adjusted_rand_score when no ground-truth labels exist; without them the comparison is meaningless.
  • Interpreting silhouette score in isolation rather than comparing it across several cluster counts; very high or low cluster counts can distort the score.
  • Confusing internal metrics (silhouette_score, calinski_harabasz_score, davies_bouldin_score), which score predicted cluster labels against the data itself, with external metrics like adjusted_rand_score, which compare predicted labels to ground truth.
  • Not scaling or preprocessing data before clustering, which can affect metric results.
python
from sklearn.metrics import adjusted_rand_score

# Wrong: Using ARI without true labels
predicted_labels = [0, 0, 1, 1]
# true_labels missing or unknown
# ari_score = adjusted_rand_score(None, predicted_labels)  # This will error

# Right: Use silhouette score when true labels are unknown
from sklearn.metrics import silhouette_score
X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
labels = [0, 0, 0, 1, 1, 1]
sil_score = silhouette_score(X, labels)
print(f"Silhouette Score: {sil_score:.3f}")
Output
Silhouette Score: 0.713
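The last pitfall, skipping preprocessing, is easy to demonstrate: when one feature has a much larger scale than the others, KMeans clusters along that feature alone. A sketch on invented synthetic data, using StandardScaler, where the true clusters differ only in the small-scale feature:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200
# True clusters differ in feature 1 (means 0 vs 8); feature 2 is large-scale noise
f1 = np.concatenate([rng.normal(0, 1, n), rng.normal(8, 1, n)])
f2 = rng.normal(0, 500, 2 * n)
X = np.column_stack([f1, f2])
true_labels = np.array([0] * n + [1] * n)

# Without scaling, the noisy large-scale feature dominates the distance metric
ari_raw = adjusted_rand_score(
    true_labels, KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X))

# With scaling, both features contribute comparably and the real split is found
X_scaled = StandardScaler().fit_transform(X)
ari_scaled = adjusted_rand_score(
    true_labels,
    KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled))

print(f"ARI without scaling: {ari_raw:.3f}")
print(f"ARI with scaling:    {ari_scaled:.3f}")
```

On this data the unscaled run scores near zero ARI while the scaled run recovers the true clusters almost perfectly, which is why scaling belongs before both clustering and evaluation.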

📊 Quick Reference

Summary of key clustering evaluation metrics:

| Metric                  | Purpose                                    | Range   | Higher is Better? |
|-------------------------|--------------------------------------------|---------|-------------------|
| silhouette_score        | Measures cluster cohesion and separation   | -1 to 1 | Yes               |
| adjusted_rand_score     | Compares clustering to true labels         | -1 to 1 | Yes               |
| calinski_harabasz_score | Ratio of between/within cluster dispersion | 0 to ∞  | Yes               |
| davies_bouldin_score    | Average similarity between clusters        | 0 to ∞  | No (lower better) |
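One way to internalize the directions in the table is to score a sensible labeling against a purely random one; every internal metric should rate the random baseline worse. A small sketch (the random labels are just an illustrative baseline):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import (silhouette_score, calinski_harabasz_score,
                             davies_bouldin_score)

X, y = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=0)

# Random labels as a baseline for comparison
rng = np.random.default_rng(0)
random_labels = rng.integers(0, 3, size=len(X))

for name, labels in [("true clusters", y), ("random labels", random_labels)]:
    print(f"{name}: silhouette={silhouette_score(X, labels):.3f}, "
          f"CH={calinski_harabasz_score(X, labels):.1f}, "
          f"DB={davies_bouldin_score(X, labels):.3f}")
```

The true labeling gets a higher silhouette and Calinski-Harabasz score and a lower Davies-Bouldin score than the random one, matching the "Higher is Better?" column above.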

Key Takeaways

  • Use silhouette score to evaluate clustering quality without true labels.
  • Use adjusted rand index only if true labels are available for comparison.
  • Higher silhouette and Calinski-Harabasz scores indicate better clustering.
  • Lower Davies-Bouldin score means clusters are more distinct.
  • Always preprocess data and choose metrics suited to your clustering context.