Cluster evaluation metrics in ML Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Understanding Silhouette Score Interpretation
Which statement best describes what a Silhouette Score close to +1 indicates about a clustering result?
A. Clusters have many outliers and points are far from cluster centers
B. Clusters are overlapping heavily and points are assigned randomly
C. Clusters are well separated and points are appropriately assigned to their clusters
D. Clusters are too small and have very few points
Hint: Think about what a high Silhouette Score means for the distance between a point and its own cluster versus the distance to other clusters.
Predict Output
intermediate
Output of Adjusted Rand Index Calculation
What is the output of the following Python code snippet using sklearn.metrics.adjusted_rand_score?
from sklearn.metrics import adjusted_rand_score
labels_true = [0, 0, 1, 1, 2, 2]
labels_pred = [1, 1, 0, 0, 2, 2]
score = adjusted_rand_score(labels_true, labels_pred)
print(round(score, 2))
A. 1.0
B. 0.0
C. -1.0
D. 0.5
Hint: Adjusted Rand Index is symmetric and equals 1 when clusterings are identical up to label permutation.
Model Choice
advanced
Choosing the Best Metric for Clustering with Unknown Labels
You have unlabeled data and want to evaluate your clustering algorithm's quality. Which metric is most appropriate?
A. Adjusted Rand Index
B. Silhouette Score
C. Normalized Mutual Information
D. Accuracy
Hint: Consider which metrics require true labels and which do not.
Metrics
advanced
Interpreting Davies-Bouldin Index Values
Which of the following statements about the Davies-Bouldin Index (DBI) is true?
A. Lower DBI values indicate better clustering with compact and well-separated clusters
B. Higher DBI values indicate better clustering quality
C. DBI values range from 0 to 1, where 1 is perfect clustering
D. DBI is only valid for binary clustering problems
Hint: Think about what a low versus high DBI means for cluster similarity and separation.
🔧 Debug
expert
Identifying the Error in Cluster Evaluation Code
What error will the following code raise when executed?
from sklearn.metrics import silhouette_score
X = [[1, 2], [2, 3], [10, 10], [11, 11]]
labels = [0, 0, 1]
score = silhouette_score(X, labels)
print(score)
A. No error, prints a float score
B. TypeError: silhouette_score() missing required positional argument
C. IndexError: list index out of range
D. ValueError: the number of labels does not match the number of samples
Hint: Check if the number of labels matches the number of data points.

Practice

1. Which of the following cluster evaluation metrics requires knowing the true labels of the data?
easy
A. Davies-Bouldin Index
B. Silhouette Score
C. Adjusted Rand Index (ARI)
D. Calinski-Harabasz Index

Solution

  1. Step 1: Understand metric types

    Some cluster metrics compare the clustering against true labels (external metrics); others use only the data and the cluster assignments (internal metrics).
  2. Step 2: Identify ARI as external metric

    Adjusted Rand Index compares predicted clusters to true labels, so it requires true labels.
  3. Final Answer:

    Adjusted Rand Index (ARI) -> Option C
  4. Quick Check:

    External metric = ARI [OK]
Hint: Among these options, only ARI needs true labels; the others use the data and cluster assignments alone [OK]
Common Mistakes:
  • Confusing Silhouette Score as needing true labels
  • Thinking Davies-Bouldin Index requires true labels
  • Assuming Calinski-Harabasz Index uses true labels
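The internal/external split in this solution shows up directly in the function arguments. The sketch below is illustrative only: the toy data and the variable names `labels_pred` and `labels_true` are assumptions, not part of the original problem. Internal metrics take the data `X` plus the predicted labels, while `adjusted_rand_score` takes two label lists and never sees `X`.

```python
from sklearn.metrics import (
    adjusted_rand_score,
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Toy data: two tight, well-separated groups (illustrative values).
X = [[1, 2], [2, 1], [10, 10], [11, 11]]
labels_pred = [0, 0, 1, 1]  # cluster assignments from some algorithm
labels_true = [1, 1, 0, 0]  # ground truth: same grouping, label names swapped

# Internal metrics: computed from the data and the predicted labels only.
internal = {
    "silhouette": silhouette_score(X, labels_pred),
    "davies_bouldin": davies_bouldin_score(X, labels_pred),
    "calinski_harabasz": calinski_harabasz_score(X, labels_pred),
}

# External metric: compares predicted labels with ground truth; X is not used.
external_ari = adjusted_rand_score(labels_true, labels_pred)

for name, value in internal.items():
    print(f"{name}: {value:.3f}")
print(f"adjusted_rand: {external_ari:.3f}")
```

Because ARI ignores how the clusters are named, the swapped label names in `labels_true` do not lower the score.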
2. Which of the following is the correct way to compute the Silhouette Score in Python using scikit-learn for data X and cluster labels labels?
easy
A. from sklearn.metrics import silhouette_score; score = silhouette_score(X, labels)
B. from sklearn.cluster import silhouette_score; score = silhouette_score(labels, X)
C. from sklearn.metrics import silhouette_score; score = silhouette_score(labels, X)
D. from sklearn.metrics import silhouette_score; score = silhouette_score(X)

Solution

  1. Step 1: Check import source

    Silhouette Score is in sklearn.metrics, not sklearn.cluster.
  2. Step 2: Check function parameters

    Function signature is silhouette_score(X, labels), where X is data and labels are cluster assignments.
  3. Final Answer:

    from sklearn.metrics import silhouette_score; score = silhouette_score(X, labels) -> Option A
  4. Quick Check:

    Correct import and parameter order = A [OK]
Hint: Import from metrics and pass data first, labels second [OK]
Common Mistakes:
  • Importing silhouette_score from sklearn.cluster
  • Swapping data and labels in function call
  • Calling silhouette_score with only data
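To see why the call needs both the data and the labels, it can help to recompute the score by hand. The sketch below assumes Euclidean distance (the scikit-learn default) and uses a hypothetical helper, `silhouette_by_hand`, that is not part of scikit-learn. For each sample it computes a, the mean distance to the other members of its own cluster, and b, the smallest mean distance to the members of any other cluster, then forms (b - a) / max(a, b).

```python
import numpy as np
from sklearn.metrics import silhouette_samples, silhouette_score

X = np.array([[1, 2], [2, 1], [10, 10], [11, 11]], dtype=float)
labels = np.array([0, 0, 1, 1])


def silhouette_by_hand(X, labels):
    """Per-sample silhouette s = (b - a) / max(a, b), Euclidean distance."""
    # Pairwise Euclidean distances, shape (n_samples, n_samples).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: score stays 0 by convention
        # a: mean distance to the other members of the same cluster
        # (the distance to itself is 0, so divide by n_same - 1).
        a = dists[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            dists[i, labels == c].mean()
            for c in np.unique(labels)
            if c != labels[i]
        )
        scores[i] = (b - a) / max(a, b)
    return scores


manual = silhouette_by_hand(X, labels)
print(manual.mean(), silhouette_score(X, labels))
```

The mean of the per-sample values should agree with `silhouette_score(X, labels)` up to floating-point error, and neither a nor b can be computed without both arguments.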
3. Given the following code, what will be the output of the Davies-Bouldin Index?
from sklearn.metrics import davies_bouldin_score
X = [[1, 2], [2, 1], [10, 10], [11, 11]]
labels = [0, 0, 1, 1]
score = davies_bouldin_score(X, labels)
print(round(score, 2))
medium
A. 0.50
B. 1.41
C. 1.00
D. 0.11

Solution

  1. Step 1: Understand Davies-Bouldin Index meaning

    Lower values mean better clusters; it measures the average similarity between each cluster and the cluster most similar to it.
  2. Step 2: Calculate score using sklearn

    Running the code gives approximately 0.1111, rounded to 0.11.
  3. Final Answer:

    0.11 -> Option D
  4. Quick Check:

    Davies-Bouldin score ≈ 0.11 [OK]
Hint: Run sklearn function and round result to 2 decimals [OK]
Common Mistakes:
  • Confusing Davies-Bouldin with Silhouette Score values
  • Rounding incorrectly
  • Misinterpreting higher score as better
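The value quoted in the solution can be checked by hand, since with only two clusters the index reduces to a single ratio: the sum of the two within-cluster scatters divided by the distance between the centroids. The sketch below is a simplified two-cluster calculation, not scikit-learn's general implementation, and the variable names are illustrative.

```python
import numpy as np
from sklearn.metrics import davies_bouldin_score

X = np.array([[1, 2], [2, 1], [10, 10], [11, 11]], dtype=float)
labels = np.array([0, 0, 1, 1])

# Centroid of each cluster.
centroids = np.array([X[labels == k].mean(axis=0) for k in (0, 1)])

# Scatter: mean distance from each cluster's points to its own centroid.
scatter = np.array(
    [np.linalg.norm(X[labels == k] - centroids[k], axis=1).mean() for k in (0, 1)]
)

# Separation: distance between the two centroids.
separation = np.linalg.norm(centroids[0] - centroids[1])

# With two clusters, the index is this single ratio.
dbi_manual = (scatter[0] + scatter[1]) / separation

print(round(dbi_manual, 4), round(davies_bouldin_score(X, labels), 4))
```

Here each scatter is about 0.7071 and the centroids are about 12.728 apart, which gives the roughly 0.11 reported above.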
4. The following code throws an error. What is the most likely cause?
from sklearn.metrics import silhouette_score
X = [[1, 2], [2, 1], [10, 10], [11, 11]]
labels = [0, 0, 1]
score = silhouette_score(X, labels)
print(score)
medium
A. Mismatch in length between X and labels
B. silhouette_score requires true labels, not cluster labels
C. X should be a numpy array, not a list
D. silhouette_score cannot handle more than 3 clusters

Solution

  1. Step 1: Check input lengths

    Data X has 4 samples, but the labels list has only 3 elements, so scikit-learn raises a ValueError for the inconsistent sample counts.
  2. Step 2: Understand silhouette_score input requirements

    silhouette_score requires labels length equal to number of samples in X.
  3. Final Answer:

    Mismatch in length between X and labels -> Option A
  4. Quick Check:

    Length mismatch error = A [OK]
Hint: Ensure labels length matches data samples count [OK]
Common Mistakes:
  • Thinking silhouette_score needs true labels
  • Assuming lists instead of arrays cause error
  • Believing cluster count limits cause error
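A small sketch of how the mismatch surfaces and how it might be caught early. The helper `checked_silhouette` is hypothetical, written for illustration only; the exact wording of scikit-learn's own error message may differ between versions, so the code inspects only the exception type.

```python
from sklearn.metrics import silhouette_score

X = [[1, 2], [2, 1], [10, 10], [11, 11]]
labels = [0, 0, 1]  # one label short


def checked_silhouette(X, labels):
    """Hypothetical wrapper: fail early with an explicit message."""
    if len(X) != len(labels):
        raise ValueError(
            f"X has {len(X)} samples but labels has {len(labels)} entries"
        )
    return silhouette_score(X, labels)


# Calling scikit-learn directly also fails, with its own ValueError.
try:
    silhouette_score(X, labels)
    sklearn_error = None
except ValueError as exc:
    sklearn_error = exc

# The wrapper raises before scikit-learn is ever called.
try:
    checked_silhouette(X, labels)
    wrapper_error = None
except ValueError as exc:
    wrapper_error = exc

print(type(sklearn_error).__name__)
print(wrapper_error)

# With one label per sample, the same call succeeds.
fixed_score = checked_silhouette(X, [0, 0, 1, 1])
```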
5. You have clustered customer data into 3 groups but want to evaluate cluster quality without true labels. Which combination of metrics gives the best overall insight?
hard
A. Adjusted Rand Index and Calinski-Harabasz Index
B. Silhouette Score and Davies-Bouldin Index
C. Homogeneity Score and Completeness Score
D. Adjusted Mutual Information and Silhouette Score

Solution

  1. Step 1: Identify metrics that do not require true labels

    Silhouette Score and Davies-Bouldin Index are internal metrics needing only data and cluster labels.
  2. Step 2: Understand other metrics need true labels

    Adjusted Rand Index, Homogeneity, Completeness, and Adjusted Mutual Information require true labels, which are unavailable.
  3. Final Answer:

    Silhouette Score and Davies-Bouldin Index -> Option B
  4. Quick Check:

    Internal metrics only = B [OK]
Hint: Use only internal metrics when true labels are missing [OK]
Common Mistakes:
  • Choosing metrics that require true labels
  • Using only one metric instead of a combination
  • Confusing internal and external metrics
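As a rough illustration of using the two internal metrics together, the sketch below clusters synthetic data with several candidate cluster counts and reports both scores for each. The data from `make_blobs`, the choice of `KMeans`, and all parameter values are assumptions made for the example; they stand in for the customer data, which is not available here.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score

# Synthetic stand-in for the customer data (assumed: 3 blobs, 2 features).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

results = {}
for k in range(2, 7):
    # Cluster with k groups, then score using the data and labels only.
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    results[k] = (silhouette_score(X, labels), davies_bouldin_score(X, labels))

for k, (sil, dbi) in results.items():
    print(f"k={k}: silhouette={sil:.3f} (higher is better), "
          f"davies_bouldin={dbi:.3f} (lower is better)")
```

When both metrics favor the same cluster count, that agreement is more reassuring than either score alone; when they disagree, the clustering deserves a closer look.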