Recall & Review
beginner
What is the purpose of cluster evaluation metrics?
Cluster evaluation metrics help us measure how well a clustering algorithm groups data points. They tell us if clusters are tight and well-separated.
intermediate
Explain the Silhouette Score in clustering.
The Silhouette Score measures how similar a point is to its own cluster compared to other clusters. Scores near +1 indicate compact, well-separated clusters, scores near 0 indicate overlapping clusters, and negative scores suggest points assigned to the wrong cluster.
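The score ranges described above can be seen on a tiny example. This is a minimal sketch with made-up 2-D points and hand-written label arrays (X, good_labels, and bad_labels are illustrative names, not part of the original material); it compares a sensible grouping with one that mixes the two groups.

```python
import numpy as np
from sklearn.metrics import silhouette_score

# Two tight, well-separated groups of 2-D points (made-up values).
X = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
    [5.0, 5.0], [5.1, 5.2], [5.2, 5.1],
])

good_labels = np.array([0, 0, 0, 1, 1, 1])  # matches the visible groups
bad_labels = np.array([0, 1, 0, 1, 0, 1])   # mixes points from both groups

good_score = silhouette_score(X, good_labels)
bad_score = silhouette_score(X, bad_labels)

# The sensible grouping scores close to +1; the mixed one scores far lower.
print(f"good grouping:  {good_score:.3f}")
print(f"mixed grouping: {bad_score:.3f}")
```

The data stays the same in both calls; only the cluster assignments change, which is why the score can be used to compare alternative clusterings of one dataset.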
intermediate
What does the Davies-Bouldin Index indicate?
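The Davies-Bouldin Index measures the average similarity between each cluster and the cluster most similar to it, where similarity compares within-cluster spread to the distance between cluster centers. Lower values mean clusters are compact and far apart, which is better; the minimum is 0.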
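To see the "lower is better" behaviour, the sketch below scores two synthetic datasets that share the same three centers but differ in spread. It assumes make_blobs data, and it uses the generator's own labels as a stand-in for cluster assignments; the centers and cluster_std values are arbitrary choices for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

# Same three centers, different spread around each center (assumed values).
centers = [[0, 0], [10, 0], [0, 10]]

X_tight, labels_tight = make_blobs(
    n_samples=300, centers=centers, cluster_std=0.5, random_state=0
)
X_loose, labels_loose = make_blobs(
    n_samples=300, centers=centers, cluster_std=3.0, random_state=0
)

db_tight = davies_bouldin_score(X_tight, labels_tight)
db_loose = davies_bouldin_score(X_loose, labels_loose)

# Compact clusters give the smaller (better) index.
print(f"tight clusters: {db_tight:.3f}")
print(f"loose clusters: {db_loose:.3f}")
```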
intermediate
Describe the difference between internal and external cluster evaluation metrics.
Internal metrics use only the data and cluster labels to evaluate quality (e.g., Silhouette Score). External metrics compare clusters to known true labels (e.g., Adjusted Rand Index).
intermediate
What is the Adjusted Rand Index (ARI) used for?
ARI measures similarity between the clustering result and true labels, adjusting for chance. It ranges from -1 to 1, where 1 means a perfect match and values near 0 mean agreement no better than chance.
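A short sketch of ARI in scikit-learn, using small hand-written label lists (true_labels, same_grouping, and partial are illustrative names). It also shows that ARI compares groupings rather than label names, so swapping the cluster ids does not lower the score.

```python
from sklearn.metrics import adjusted_rand_score

true_labels = [0, 0, 0, 1, 1, 1]

# Same grouping as true_labels, but with the cluster ids swapped.
same_grouping = [1, 1, 1, 0, 0, 0]

# One point placed in the wrong group.
partial = [0, 0, 1, 1, 1, 1]

perfect_score = adjusted_rand_score(true_labels, same_grouping)
partial_score = adjusted_rand_score(true_labels, partial)

print(perfect_score)  # 1.0, because the groupings are identical
print(f"{partial_score:.3f}")  # between 0 and 1
```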
Which cluster evaluation metric ranges from -1 to 1 and measures similarity to true labels?
A. Silhouette Score
B. Adjusted Rand Index
C. Davies-Bouldin Index
D. Calinski-Harabasz Index
Answer: B
The Adjusted Rand Index compares clustering results to true labels and ranges from -1 to 1.
A high Silhouette Score indicates:
A. Clusters overlap heavily
B. Clusters are poorly formed
C. Clusters are well separated and compact
D. Clusters have many outliers
Answer: C
High Silhouette Scores mean points are closer to their own cluster than to other clusters, showing good clustering.
Which metric should be minimized for better clustering?
A. Davies-Bouldin Index
B. Adjusted Rand Index
C. Silhouette Score
D. Homogeneity Score
Answer: A
Lower Davies-Bouldin Index values indicate better cluster separation and compactness.
Internal cluster evaluation metrics require:
A. Human expert input
B. True labels of data
C. External validation data
D. Only data and cluster assignments
Answer: D
Internal metrics evaluate clusters using only the data and the clusters found, without true labels.
Which metric compares clustering results to known labels adjusting for chance grouping?
A. Adjusted Rand Index
B. Calinski-Harabasz Index
C. Davies-Bouldin Index
D. Silhouette Score
Answer: A
Adjusted Rand Index adjusts for chance and compares clustering to true labels.
Explain how the Silhouette Score helps evaluate clustering quality.
Think about how close points are to their own cluster versus others.
Describe the difference between internal and external cluster evaluation metrics with examples.
Consider whether true labels are needed.
Practice
1. Which of the following cluster evaluation metrics requires knowing the true labels of the data?
easy
A. Davies-Bouldin Index
B. Silhouette Score
C. Adjusted Rand Index (ARI)
D. Calinski-Harabasz Index
Solution
Step 1: Understand metric types
Some cluster metrics need true labels (external metrics); others use only the data and the cluster assignments (internal metrics).
Step 2: Identify ARI as external metric
Adjusted Rand Index compares predicted clusters to true labels, so it requires true labels.
Final Answer:
Adjusted Rand Index (ARI) -> Option C
Quick Check:
External metric = ARI [OK]
Hint: Of these options, only ARI needs true labels; the others use the data and cluster assignments alone [OK]
Common Mistakes:
Confusing Silhouette Score as needing true labels
Thinking Davies-Bouldin Index requires true labels
Assuming Calinski-Harabasz Index uses true labels
2. Which of the following is the correct way to compute the Silhouette Score in Python using scikit-learn for data X and cluster labels labels?
easy
A. from sklearn.metrics import silhouette_score
score = silhouette_score(X, labels)
B. from sklearn.cluster import silhouette_score
score = silhouette_score(labels, X)
C. from sklearn.metrics import silhouette_score
score = silhouette_score(labels, X)
D. from sklearn.metrics import silhouette_score
score = silhouette_score(X)
Solution
Step 1: Check import source
Silhouette Score is in sklearn.metrics, not sklearn.cluster.
Step 2: Check function parameters
Function signature is silhouette_score(X, labels), where X is data and labels are cluster assignments.
Final Answer:
from sklearn.metrics import silhouette_score
score = silhouette_score(X, labels) -> Option A
Quick Check:
Correct import and parameter order = A [OK]
Hint: Import from metrics and pass data first, labels second [OK]
Common Mistakes:
Importing silhouette_score from sklearn.cluster
Swapping data and labels in function call
Calling silhouette_score with only data
3. Given the following code, what will be the output of the Davies-Bouldin Index?
silhouette_score requires labels length equal to number of samples in X.
Final Answer:
Mismatch in length between X and labels -> Option A
Quick Check:
Length mismatch error = A [OK]
Hint: Ensure labels length matches data samples count [OK]
Common Mistakes:
Thinking silhouette_score needs true labels
Assuming lists instead of arrays cause error
Believing cluster count limits cause error
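The length requirement can be checked directly. This is a minimal sketch with made-up data (X, labels_short, and labels_ok are illustrative names, not the code from the question): passing one label too few makes silhouette_score raise a ValueError, while a label array with one entry per sample works.

```python
import numpy as np
from sklearn.metrics import silhouette_score

# Four samples with two features each (made-up values).
X = np.array([[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 5.1]])

labels_short = [0, 0, 1]  # one label missing on purpose

try:
    silhouette_score(X, labels_short)
    raised = False
except ValueError as exc:
    raised = True
    print(f"ValueError: {exc}")

# One label per sample: the call succeeds.
labels_ok = [0, 0, 1, 1]
score = silhouette_score(X, labels_ok)
print(f"score with matching lengths: {score:.3f}")
```

Plain Python lists are accepted for the labels here; the failure in the first call comes only from the length mismatch.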
5. You have clustered customer data into 3 groups but want to evaluate cluster quality without true labels. Which combination of metrics gives the best overall insight?
hard
A. Adjusted Rand Index and Calinski-Harabasz Index
B. Silhouette Score and Davies-Bouldin Index
C. Homogeneity Score and Completeness Score
D. Adjusted Mutual Information and Silhouette Score
Solution
Step 1: Identify metrics that do not require true labels
Silhouette Score and Davies-Bouldin Index are internal metrics needing only data and cluster labels.
Step 2: Understand other metrics need true labels
Adjusted Rand Index, Homogeneity, Completeness, and Adjusted Mutual Information require true labels, which are unavailable.
Final Answer:
Silhouette Score and Davies-Bouldin Index -> Option B
Quick Check:
Internal metrics only = B [OK]
Hint: Use only internal metrics when true labels are missing [OK]
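The scenario in this question can be sketched as follows. The customer data is not given, so this example assumes synthetic make_blobs data as a stand-in and uses KMeans with 3 clusters; the sample count, cluster_std, and random_state values are arbitrary. The generator's true labels are discarded on purpose, because only internal metrics are used.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score

# Synthetic stand-in for customer features; the true labels are thrown away.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=42)

# Cluster into 3 groups, as in the question.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Both metrics need only the data and the cluster assignments.
sil = silhouette_score(X, labels)       # higher is better
dbi = davies_bouldin_score(X, labels)   # lower is better

print(f"Silhouette Score:     {sil:.3f}")
print(f"Davies-Bouldin Index: {dbi:.3f}")
```

Reading the two numbers together is the point of the combination: one should be high while the other is low when clusters are compact and well separated.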