What are cluster evaluation metrics in ML Python?

Introduction

Cluster evaluation metrics check how well a clustering algorithm has formed its groups (clusters) of data points. Most of them measure whether the clusters are tight and well separated.

They are useful in situations like these:

When you want to see if your clustering grouped similar items together well.

When comparing different clustering methods to pick the best one.

When you want to check if the number of clusters you chose makes sense.

When you want to measure how close points in the same cluster are.

When you want to measure how far apart different clusters are.
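
One of the uses above, checking whether the chosen number of clusters makes sense, can be sketched as a loop over candidate values of k. This is a minimal sketch, not a definitive procedure: the toy data, the candidate range 2 to 5, and the names scores and best_k are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical toy data: three tight, well-separated groups of 2D points.
X = np.array([
    [0, 0], [0, 1], [1, 0],
    [10, 10], [10, 11], [11, 10],
    [20, 0], [20, 1], [21, 0],
])

scores = {}
for k in range(2, 6):
    # n_init and random_state are set explicitly so runs are repeatable.
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Pick the k with the highest mean silhouette score.
best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print("Best k by silhouette:", best_k)
```

On data this cleanly separated the silhouette score should peak at the true number of groups. On real data the peak is often less clear, so it is worth looking at the whole curve rather than only the maximum.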

Syntax

ML Python

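# External metrics compare predicted clusters with known labels:
metric_name(true_labels, predicted_labels)
# - adjusted_rand_score(true_labels, predicted_labels)

# Internal metrics use only the data and the predicted clusters:
metric_name(data, predicted_labels)
# - silhouette_score(data, predicted_labels)
# - calinski_harabasz_score(data, predicted_labels)
# - davies_bouldin_score(data, predicted_labels)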

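Some metrics need the original data points and the predicted cluster labels. Others need only two sets of labels: the true ones and the predicted ones.

True labels are required only by metrics that compare the clustering against ground truth, such as the Adjusted Rand Index.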
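
Because the two kinds of metric take different inputs, it can help to validate the inputs before calling an internal metric. The helper below is a hypothetical sketch (check_internal_metric_inputs is not a scikit-learn function). It mirrors the requirement that scikit-learn's silhouette_score enforces: one label per data point, and between 2 and n_samples - 1 distinct clusters.

```python
def check_internal_metric_inputs(data, labels):
    """Hypothetical helper: validate inputs before calling an internal metric."""
    n_samples = len(data)
    if len(labels) != n_samples:
        raise ValueError(
            f"got {n_samples} data points but {len(labels)} labels"
        )
    n_clusters = len(set(labels))
    # A single cluster, or one cluster per point, leaves nothing to compare.
    if not 2 <= n_clusters <= n_samples - 1:
        raise ValueError(
            f"need between 2 and {n_samples - 1} clusters, got {n_clusters}"
        )
    return n_clusters


data = [[1, 2], [1, 4], [10, 2], [10, 4]]
n_clusters = check_internal_metric_inputs(data, [0, 0, 1, 1])
print(n_clusters)  # prints 2
```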

Examples

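Adjusted Rand Index (ARI) compares predicted clusters to true groups. A score of 1 means a perfect match, a score near 0 means the agreement is no better than random labeling, and negative scores are possible. Only the grouping matters, not the label names.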

ML Python

from sklearn.metrics import adjusted_rand_score

true_labels = [0, 0, 1, 1, 2, 2]
predicted_labels = [0, 0, 1, 2, 2, 2]
score = adjusted_rand_score(true_labels, predicted_labels)
print(score)
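
To see what the score measures, the ARI can also be computed by counting pairs of points that are grouped together in both labelings. The function below is a minimal sketch written for illustration (adjusted_rand_index and renamed_labels are names assumed here, and this is not scikit-learn's implementation). It also shows that renaming the cluster labels does not change the score.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """ARI from pair counts over the contingency table of two labelings."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must have the same length")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to compare
    # Pairs placed together in both labelings, in A only, and in B only.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)  # agreement expected by chance
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (pairs_both - expected) / (max_index - expected)


true_labels = [0, 0, 1, 1, 2, 2]
predicted_labels = [0, 0, 1, 2, 2, 2]
renamed_labels = [5, 5, 9, 7, 7, 7]  # same grouping, different label names

ari = adjusted_rand_index(true_labels, predicted_labels)
ari_renamed = adjusted_rand_index(true_labels, renamed_labels)
print(round(ari, 3), round(ari_renamed, 3))  # prints 0.444 0.444
```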

Silhouette score compares how close each point is to the other points in its own cluster with how close it is to the points of the nearest other cluster. A score near 1 means tight, well-separated clusters.

ML Python

from sklearn.metrics import silhouette_score

import numpy as np

data = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
predicted_labels = [0, 0, 0, 1, 1, 1]
score = silhouette_score(data, predicted_labels)
print(score)
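
The definition behind the score can be written out directly. For each point, a is the mean distance to the other points in its cluster, b is the smallest mean distance to the points of any other cluster, and the point's score is (b - a) / max(a, b). The sketch below assumes Euclidean distance and uses a made-up function name, mean_silhouette. It is meant to illustrate the formula, not to replace silhouette_score.

```python
from math import dist


def mean_silhouette(points, labels):
    """Mean silhouette coefficient using Euclidean distance."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: a point alone in its cluster scores 0
        # a: mean distance to the other points in the same cluster
        # (the distance from p to itself is 0, so it does not affect the sum).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the points of any other cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


points = [(1, 2), (1, 4), (1, 0), (10, 2), (10, 4), (10, 0)]
labels = [0, 0, 0, 1, 1, 1]
score = mean_silhouette(points, labels)
print(round(score, 3))  # prints 0.713
```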

Sample Model

This code clusters simple 2D points into 2 groups using KMeans. Then it calculates three common cluster evaluation scores to check cluster quality.

ML Python

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, calinski_harabasz_score, davies_bouldin_score
import numpy as np

# Sample data: 2 groups of points
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

# Create KMeans clustering with 2 clusters
# (n_init is set explicitly because its default differs across scikit-learn versions)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
kmeans.fit(X)
labels = kmeans.labels_

# Calculate cluster evaluation metrics
sil_score = silhouette_score(X, labels)
calinski_score = calinski_harabasz_score(X, labels)
davies_score = davies_bouldin_score(X, labels)

print(f"Silhouette Score: {sil_score:.3f}")
print(f"Calinski-Harabasz Score: {calinski_score:.3f}")
print(f"Davies-Bouldin Score: {davies_score:.3f}")

Important Notes

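Silhouette score ranges from -1 to 1; higher is better. Values near 0 indicate overlapping clusters, and negative values suggest points assigned to the wrong cluster.

Calinski-Harabasz score is higher when clusters are dense and well separated. It has no fixed upper bound, so compare it only across clusterings of the same data.

Davies-Bouldin score is lower when clusters are better separated. Its minimum is 0.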
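
The two centroid-based scores in the notes above can be worked out by hand for the sample data. This is a sketch under stated assumptions: Euclidean distance, the usual textbook definitions, and the labels fixed to the obvious left/right split instead of being taken from KMeans. The function names are illustrative, not scikit-learn's.

```python
from math import dist


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def group_by_label(points, labels):
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return list(clusters.values())


def calinski_harabasz(points, labels):
    """Ratio of between-cluster to within-cluster dispersion."""
    clusters = group_by_label(points, labels)
    n, k = len(points), len(clusters)
    overall = centroid(points)
    between = sum(len(c) * dist(centroid(c), overall) ** 2 for c in clusters)
    within = sum(dist(p, centroid(c)) ** 2 for c in clusters for p in c)
    return (between / (k - 1)) / (within / (n - k))


def davies_bouldin(points, labels):
    """Mean over clusters of the worst (spread_i + spread_j) / centre distance."""
    clusters = group_by_label(points, labels)
    centers = [centroid(c) for c in clusters]
    spread = [sum(dist(p, m) for p in c) / len(c) for c, m in zip(clusters, centers)]
    worst = [
        max(
            (spread[i] + spread[j]) / dist(centers[i], centers[j])
            for j in range(len(clusters))
            if j != i
        )
        for i in range(len(clusters))
    ]
    return sum(worst) / len(worst)


points = [(1, 2), (1, 4), (1, 0), (10, 2), (10, 4), (10, 0)]
labels = [0, 0, 0, 1, 1, 1]
ch = calinski_harabasz(points, labels)
db = davies_bouldin(points, labels)
print(f"Calinski-Harabasz: {ch:.3f}")  # prints Calinski-Harabasz: 30.375
print(f"Davies-Bouldin: {db:.3f}")     # prints Davies-Bouldin: 0.296
```

Here the between-cluster dispersion is 121.5 and the within-cluster dispersion is 16, which gives 121.5 / (16 / 4) = 30.375. Each cluster has a mean distance to its centroid of 4/3 and the centroids are 9 apart, which gives (4/3 + 4/3) / 9, about 0.296.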

Summary

Cluster evaluation metrics help measure how good your clusters are.

Some metrics need true labels, others only need data and cluster labels.

Use multiple metrics to get a full picture of cluster quality.

Practice

1. Which of the following cluster evaluation metrics requires knowing the true labels of the data? (easy)

A. Davies-Bouldin Index

B. Silhouette Score

C. Adjusted Rand Index (ARI)

D. Calinski-Harabasz Index

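Solution

Step 1: Understand metric types. Internal metrics (Silhouette Score, Calinski-Harabasz Index, Davies-Bouldin Index) use only the data and the predicted cluster labels. External metrics compare the predicted labels against ground truth.

Step 2: Identify ARI as external metric. adjusted_rand_score(true_labels, predicted_labels) takes the true labels as its first argument, so it cannot be computed without them.

Final Answer: C. Adjusted Rand Index (ARI)

Quick Check: Of the four options, only ARI is called with true_labels; the other three are called with the data and predicted_labels.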
