How to Evaluate Clustering Results in Python with sklearn
To evaluate clustering results in Python, use sklearn.metrics functions like silhouette_score for measuring cluster cohesion and separation, or adjusted_rand_score if true labels are known. These metrics quantify how well your clustering algorithm grouped the data.

Syntax
Common functions to evaluate clustering results in sklearn.metrics include:

- silhouette_score(X, labels): Measures how similar each point is to its own cluster compared to other clusters. Higher is better.
- adjusted_rand_score(true_labels, predicted_labels): Compares clustering labels to true labels, adjusted for chance. 1 means a perfect match.
- calinski_harabasz_score(X, labels): Ratio of between-cluster dispersion to within-cluster dispersion. Higher is better.
- davies_bouldin_score(X, labels): Average similarity between each cluster and its most similar cluster; lower values indicate better clustering.
```python
from sklearn.metrics import (
    silhouette_score,
    adjusted_rand_score,
    calinski_harabasz_score,
    davies_bouldin_score,
)

# X: data features
# labels: cluster labels assigned by the algorithm
# true_labels: actual labels, if available

sil_score = silhouette_score(X, labels)
ari_score = adjusted_rand_score(true_labels, labels)
ch_score = calinski_harabasz_score(X, labels)
db_score = davies_bouldin_score(X, labels)
```
Example
This example shows how to cluster data with KMeans and evaluate the results using silhouette score and adjusted rand index.
```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, adjusted_rand_score

# Create sample data with 3 clusters
X, true_labels = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=0)

# Fit KMeans clustering
kmeans = KMeans(n_clusters=3, random_state=0)
predicted_labels = kmeans.fit_predict(X)

# Evaluate clustering
sil_score = silhouette_score(X, predicted_labels)
ari_score = adjusted_rand_score(true_labels, predicted_labels)
print(f"Silhouette Score: {sil_score:.2f}")
print(f"Adjusted Rand Index: {ari_score:.2f}")
```
Output
Silhouette Score: 0.59
Adjusted Rand Index: 0.90
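Beyond evaluating a single fit, the silhouette score is commonly used to choose the number of clusters. A minimal sketch, reusing the same make_blobs data as above (the range of k values tried here is an arbitrary choice): fit KMeans for several candidate k and keep the one with the highest score.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=0)

# Try several cluster counts and record the silhouette score for each
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(f"Best k by silhouette: {best_k}")
```

Because the data was generated with three well-separated centers, the silhouette score should peak at k = 3.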
Common Pitfalls
Common mistakes when evaluating clustering results include:

- Using adjusted_rand_score without true labels, which makes it invalid.
- Interpreting silhouette score without considering the number of clusters; very high or low cluster counts can distort the score.
- Ignoring that internal metrics like davies_bouldin_score require labels from clustering algorithms, not ground truth.
- Not scaling or preprocessing data before clustering, which can affect metric results.
```python
from sklearn.metrics import silhouette_score, adjusted_rand_score

# Wrong: using ARI without true labels
predicted_labels = [0, 0, 1, 1]
# true_labels missing or unknown
# ari_score = adjusted_rand_score(None, predicted_labels)  # This will error

# Right: use silhouette score when true labels are unknown
X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
labels = [0, 0, 0, 1, 1, 1]
sil_score = silhouette_score(X, labels)
print(f"Silhouette Score: {sil_score:.3f}")
```
Output
Silhouette Score: 0.713
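The preprocessing pitfall is easy to demonstrate. A small sketch (the data values here are made up for illustration, and StandardScaler is used as one common preprocessing choice) comparing KMeans results on raw versus standardized features:

```python
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Feature 2 spans a far larger range than feature 1, so it dominates
# the Euclidean distances that both KMeans and silhouette_score use
X = [[1.0, 1000.0], [2.0, 2000.0], [1.0, 9000.0],
     [2.0, 8000.0], [9.0, 1500.0], [8.0, 8500.0]]

# Standardize each feature to zero mean and unit variance
X_scaled = StandardScaler().fit_transform(X)

labels_raw = KMeans(n_clusters=2, random_state=0, n_init=10).fit_predict(X)
labels_scaled = KMeans(n_clusters=2, random_state=0, n_init=10).fit_predict(X_scaled)

s_raw = silhouette_score(X, labels_raw)
s_scaled = silhouette_score(X_scaled, labels_scaled)
print(f"Raw: {s_raw:.3f}, scaled: {s_scaled:.3f}")
```

On the raw data the clustering is driven almost entirely by feature 2; after scaling, both features contribute, which can change both the labels and the score.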
Quick Reference
Summary of key clustering evaluation metrics:
| Metric | Purpose | Range | Higher is Better? |
|---|---|---|---|
| silhouette_score | Measures cluster cohesion and separation | -1 to 1 | Yes |
| adjusted_rand_score | Compares clustering to true labels | -1 to 1 | Yes |
| calinski_harabasz_score | Ratio of between/within cluster dispersion | 0 to ∞ | Yes |
| davies_bouldin_score | Average similarity between clusters | 0 to ∞ | No (lower better) |
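The directions in the table can be checked on the toy data from the pitfalls example: a labeling that matches the two obvious groups should beat a deliberately mixed one on both internal metrics.

```python
from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score

X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
good = [0, 0, 0, 1, 1, 1]  # matches the two obvious groups
bad = [0, 1, 0, 1, 0, 1]   # mixes points from both groups

# Higher Calinski-Harabasz and lower Davies-Bouldin mean better clustering
print(calinski_harabasz_score(X, good) > calinski_harabasz_score(X, bad))  # True
print(davies_bouldin_score(X, good) < davies_bouldin_score(X, bad))        # True
```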
Key Takeaways
- Use silhouette score to evaluate clustering quality without true labels.
- Use adjusted rand index only if true labels are available for comparison.
- Higher silhouette and Calinski-Harabasz scores indicate better clustering.
- Lower Davies-Bouldin scores mean clusters are more distinct.
- Always preprocess data and choose metrics suited to your clustering context.