Cluster evaluation metrics help us check how good our groups (clusters) are. They tell us whether the data points within each cluster are close together and whether different clusters are well separated.
Cluster evaluation metrics in scikit-learn
Introduction
- When you want to see if your clustering grouped similar customers together.
- To check if your clustering of images puts similar images in the same group.
- When comparing different clustering methods to pick the best one.
- To measure how well your clustering matches known labels (if available).
- When tuning clustering settings to improve group quality.
Syntax
```python
from sklearn.metrics import silhouette_score, adjusted_rand_score, davies_bouldin_score

# silhouette_score(X, labels)
# adjusted_rand_score(true_labels, predicted_labels)
# davies_bouldin_score(X, labels)
```
These functions need your data points (X) and cluster labels.
Some metrics need true labels to compare, others work without them.
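To illustrate the two kinds of metrics, here is a minimal sketch on a tiny hand-made dataset (the points and labels below are made up purely for demonstration):

```python
import numpy as np
from sklearn.metrics import silhouette_score, adjusted_rand_score

# Four 2-D points forming two tight, well-separated pairs
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
labels = [0, 0, 1, 1]        # cluster labels from some clustering algorithm
true_labels = [0, 0, 1, 1]   # ground truth, needed only for the external metric

# Internal metric: uses only the data and the predicted labels
print(silhouette_score(X, labels))               # close to 1 for these tight pairs

# External metric: compares the predicted labeling to the ground truth
print(adjusted_rand_score(true_labels, labels))  # 1.0, the groupings agree exactly
```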
Examples
Silhouette score measures how close each point is to its own cluster compared to the nearest other cluster. Values near 1 mean compact, well-separated clusters.
```python
silhouette_score(X, labels)
```
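A common use of the silhouette score is choosing the number of clusters: compute it for several candidate values of k and keep the best. A short sketch, assuming KMeans on synthetic blob data (the parameter values here are illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with 3 well-separated clusters
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=0)

# Score each candidate cluster count; the highest silhouette wins
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k)  # expected to be 3 for this well-separated data
```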
Adjusted Rand Index compares your clustering to known labels. 1 means a perfect match, values near 0 mean random labeling (it can even be slightly negative for worse-than-random groupings).
```python
adjusted_rand_score(true_labels, predicted_labels)
```
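The Adjusted Rand Index cares only about which points end up grouped together, not about the label names a clustering algorithm happens to assign. A small sketch with made-up labelings:

```python
from sklearn.metrics import adjusted_rand_score

true_labels = [0, 0, 1, 1, 2, 2]

# Same grouping under different names: ARI ignores the label values
renamed = [2, 2, 0, 0, 1, 1]
print(adjusted_rand_score(true_labels, renamed))    # 1.0

# An unrelated grouping scores low (ARI can even go negative)
unrelated = [0, 1, 2, 0, 1, 2]
print(adjusted_rand_score(true_labels, unrelated))
```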
Davies-Bouldin score measures the average similarity between each cluster and the cluster most like it. Lower values mean more compact clusters with less overlap.
```python
davies_bouldin_score(X, labels)
```
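To see how the score responds to cluster separation, here is a hedged sketch with synthetic Gaussian data (the centers and spreads are made up for illustration): the same labeling scores much lower when the two groups are far apart.

```python
import numpy as np
from sklearn.metrics import davies_bouldin_score

rng = np.random.default_rng(0)
labels = [0] * 50 + [1] * 50

# Two Gaussian clusters whose centers nearly overlap...
overlapping = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
# ...and two whose centers are far apart
separated = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(10, 1, (50, 2))])

print(davies_bouldin_score(overlapping, labels))  # high: clusters overlap
print(davies_bouldin_score(separated, labels))    # low: compact and far apart
```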
Sample Program
This code creates synthetic data with 3 groups, clusters it with KMeans, and then checks how good the clustering is using all three metrics.
```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, adjusted_rand_score, davies_bouldin_score

# Create sample data with 3 clusters
X, true_labels = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=0)

# Cluster data using KMeans
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
predicted_labels = kmeans.fit_predict(X)

# Calculate metrics
sil_score = silhouette_score(X, predicted_labels)
ari_score = adjusted_rand_score(true_labels, predicted_labels)
db_score = davies_bouldin_score(X, predicted_labels)

print(f"Silhouette Score: {sil_score:.3f}")
print(f"Adjusted Rand Index: {ari_score:.3f}")
print(f"Davies-Bouldin Score: {db_score:.3f}")
```
Important Notes
- Silhouette score ranges from -1 to 1; closer to 1 is better.
- Adjusted Rand Index needs true labels; if they are unknown, use unsupervised metrics like silhouette.
- Davies-Bouldin score is better when smaller; 0 is the best possible value.
Summary
- Cluster evaluation metrics help measure how well your data is grouped.
- Use silhouette score and Davies-Bouldin score when true labels are unknown.
- Use Adjusted Rand Index to compare a clustering against known labels.