SciPy · ~10 mins

Cluster evaluation metrics in SciPy - Step-by-Step Execution

Concept Flow - Cluster evaluation metrics
Start with true labels and predicted clusters
Choose evaluation metric
Calculate metric value
Interpret the metric value (check whether higher or lower is better for that metric)
Use metric to compare clustering quality
End
We start with true and predicted cluster labels, pick a metric, calculate it, then interpret the result to judge clustering quality.
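The flow above can be sketched end-to-end. Here is a minimal illustration (the labels are made up for this sketch) that uses the Adjusted Rand Index to compare two candidate clusterings against the same ground truth:

```python
from sklearn.metrics import adjusted_rand_score

# Hypothetical ground-truth labels and two candidate clusterings
true = [0, 0, 1, 1, 2, 2]
pred_a = [0, 0, 2, 1, 2, 2]   # partially wrong grouping
pred_b = [1, 1, 0, 0, 2, 2]   # same grouping as true, different cluster ids

score_a = adjusted_rand_score(true, pred_a)
score_b = adjusted_rand_score(true, pred_b)

# A higher ARI means closer agreement with the true grouping
print(score_a < score_b)  # → True
```

Because ARI ignores cluster id names, `pred_b` scores a perfect 1.0 even though its labels are spelled differently from `true`.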
Execution Sample
Python (scikit-learn)
from sklearn.metrics import adjusted_rand_score  # from scikit-learn, not SciPy
true = [0, 0, 1, 1, 2, 2]  # ground-truth labels
pred = [0, 0, 2, 1, 2, 2]  # predicted cluster labels
score = adjusted_rand_score(true, pred)
print(score)  # ≈ 0.4444
Calculate the Adjusted Rand Index to compare true and predicted cluster labels. (Note that adjusted_rand_score comes from scikit-learn, not from SciPy itself.)
Execution Table
Step | Action | Input | Intermediate Result | Output
1 | Input true labels | [0, 0, 1, 1, 2, 2] | - | Stored true labels
2 | Input predicted labels | [0, 0, 2, 1, 2, 2] | - | Stored predicted labels
3 | Calculate contingency matrix | true & pred | [[2, 0, 0], [0, 1, 1], [0, 0, 2]] | Contingency matrix computed
4 | Compute index components | contingency matrix | Sum pair combinations | Pairs counted
5 | Calculate Adjusted Rand Index | index components | Adjusted for chance | Score ≈ 0.4444
6 | Print score | score | - | ≈ 0.4444
7 | End | - | - | Execution complete
💡 All steps completed; the Adjusted Rand Index was calculated and printed
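Step 3's contingency matrix can be reproduced directly; scikit-learn exposes a helper for exactly this (a sketch using `sklearn.metrics.cluster.contingency_matrix`):

```python
from sklearn.metrics.cluster import contingency_matrix

true = [0, 0, 1, 1, 2, 2]
pred = [0, 0, 2, 1, 2, 2]

# Rows = true clusters (0, 1, 2), columns = predicted clusters (0, 1, 2)
cm = contingency_matrix(true, pred)
print(cm)
# → [[2 0 0]
#    [0 1 1]
#    [0 0 2]]
```

Each entry counts the points that share a given (true, predicted) cluster pair; the ARI is computed entirely from this matrix.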
Variable Tracker
Variable | Start | After Step 1 | After Step 2 | After Step 3 | After Step 5 | Final
true | None | [0, 0, 1, 1, 2, 2] | [0, 0, 1, 1, 2, 2] | [0, 0, 1, 1, 2, 2] | [0, 0, 1, 1, 2, 2] | [0, 0, 1, 1, 2, 2]
pred | None | None | [0, 0, 2, 1, 2, 2] | [0, 0, 2, 1, 2, 2] | [0, 0, 2, 1, 2, 2] | [0, 0, 2, 1, 2, 2]
contingency_matrix | None | None | None | [[2, 0, 0], [0, 1, 1], [0, 0, 2]] | [[2, 0, 0], [0, 1, 1], [0, 0, 2]] | [[2, 0, 0], [0, 1, 1], [0, 0, 2]]
score | None | None | None | None | ≈ 0.4444 | ≈ 0.4444
Key Moments - 3 Insights
Why is the Adjusted Rand Index not 1 even though some clusters match?
Because the metric adjusts for chance grouping, partial mismatches lower the score, as shown in step 5 of the execution table.
What does the contingency matrix represent in clustering evaluation?
It counts how many points fall into each pair of true and predicted clusters, as shown in step 3 of the execution table.
Why do we need both true and predicted labels as inputs?
Because cluster evaluation metrics compare these two label sets to measure similarity, as shown in steps 1 and 2.
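The "adjusted for chance" point above can be checked empirically: scoring many random label assignments against the true labels yields an ARI near 0 on average (a sketch; the seed and the number of trials are arbitrary choices):

```python
import random
from sklearn.metrics import adjusted_rand_score

random.seed(0)  # arbitrary seed, only for reproducibility
true = [0, 0, 1, 1, 2, 2]

# ARI subtracts the score expected from chance agreement, so random
# clusterings average out to roughly 0 rather than a positive baseline.
scores = [
    adjusted_rand_score(true, [random.randint(0, 2) for _ in true])
    for _ in range(1000)
]
print(sum(scores) / len(scores))  # close to 0 (individual scores can be negative)
```

This is the practical difference from the unadjusted Rand index, which gives random clusterings a misleadingly positive score.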
Visual Quiz - 3 Questions
Test your understanding
Looking at step 5 of the execution table, what Adjusted Rand Index score is calculated?
A. 0.0
B. 1.0
C. ≈ 0.4444
D. 0.85
💡 Hint
Refer to the 'Output' column in step 5 of the execution table.
At which step is the contingency matrix computed?
A. Step 3
B. Step 2
C. Step 4
D. Step 5
💡 Hint
Check the 'Action' column of the execution table for when the contingency matrix is created.
If the predicted labels were identical to true labels, what would the Adjusted Rand Index be?
A. Negative
B. Exactly 1
C. Close to 0
D. Undefined
💡 Hint
The Adjusted Rand Index equals 1 when the clustering matches perfectly, as explained in Key Moments.
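The hint can be checked directly: identical labelings score exactly 1, and so does any relabeling that preserves the same grouping:

```python
from sklearn.metrics import adjusted_rand_score

true = [0, 0, 1, 1, 2, 2]

print(adjusted_rand_score(true, true))                # → 1.0
# Cluster ids are arbitrary: only the grouping matters
print(adjusted_rand_score(true, [2, 2, 0, 0, 1, 1]))  # → 1.0
```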
Concept Snapshot
Cluster evaluation metrics compare true and predicted cluster labels.
Common metrics: Adjusted Rand Index, Silhouette Score, Homogeneity.
Input: true labels and predicted labels.
Output: score indicating clustering quality.
Higher score usually means better clustering.
Use metrics to choose or tune clustering methods.
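The snapshot's other metrics differ in their inputs, which is worth a quick sketch: homogeneity compares two label sets just as ARI does, while the Silhouette Score needs the data points themselves rather than the true labels (the 1-D coordinates below are made up for illustration):

```python
import numpy as np
from sklearn.metrics import homogeneity_score, silhouette_score

true = [0, 0, 1, 1, 2, 2]
pred = [0, 0, 2, 1, 2, 2]

# Homogeneity: do predicted clusters contain only members of one true class?
print(homogeneity_score(true, pred))

# Silhouette: how well separated are the clusters in feature space?
# It scores (data, predicted labels) and never sees the true labels.
X = np.array([[0.0], [0.1], [1.0], [1.1], [2.0], [2.1]])  # made-up 1-D data
print(silhouette_score(X, pred))
```

This input difference matters in practice: silhouette can be used when no ground truth exists, while ARI and homogeneity cannot.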
Full Transcript
Cluster evaluation metrics help us measure how well our clustering matches the true groups. We start with two lists: true labels and predicted cluster labels. We pick a metric like the Adjusted Rand Index, which compares these labels and adjusts for chance. The code calculates a contingency matrix counting overlaps between true and predicted clusters. Then it computes the score, which is bounded above by 1 (a perfect match) and can be negative when agreement is worse than chance. The example shows a score of about 0.44, meaning partial agreement. This process helps us judge clustering quality and improve models.
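The pipeline the transcript describes (contingency matrix, then a chance-adjusted score) can also be reproduced by hand. Here is a sketch of the standard Hubert–Arabie ARI formula built on `scipy.special.comb` — fittingly, the one step where SciPy itself appears:

```python
import numpy as np
from scipy.special import comb

def adjusted_rand_index(cm):
    """Hubert-Arabie ARI from a contingency matrix (rows: true, cols: predicted)."""
    cm = np.asarray(cm)
    index = comb(cm, 2).sum()              # pairs grouped together in both labelings
    sum_a = comb(cm.sum(axis=1), 2).sum()  # pairs together in the true labels
    sum_b = comb(cm.sum(axis=0), 2).sum()  # pairs together in the predictions
    expected = sum_a * sum_b / comb(cm.sum(), 2)  # chance-level agreement
    max_index = (sum_a + sum_b) / 2.0
    return (index - expected) / (max_index - expected)

# Contingency matrix from step 3 of the execution table
print(adjusted_rand_index([[2, 0, 0], [0, 1, 1], [0, 0, 2]]))  # ≈ 0.4444
```

Working through the numbers: Index = 2, the row and column pair sums are 3 and 4, the expected index is 3 × 4 / 15 = 0.8, and (2 − 0.8) / (3.5 − 0.8) = 4/9 ≈ 0.4444, matching the library result.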