Practice

1. Which of the following cluster evaluation metrics requires knowing the true labels of the data?

easy

A. Davies-Bouldin Index

B. Silhouette Score

C. Adjusted Rand Index (ARI)

D. Calinski-Harabasz Index

Solution

Step 1: Understand metric types
Some cluster metrics compare the clustering against true labels (external metrics); others use only the data and the cluster assignments (internal metrics).
Step 2: Identify ARI as external metric
Adjusted Rand Index compares predicted clusters to true labels, so it requires true labels.
Final Answer:
Adjusted Rand Index (ARI) -> Option C
Quick Check:
External metric = ARI [OK]

Hint: Only ARI needs true labels; others use cluster data alone [OK]

Common Mistakes:

Mistaking Silhouette Score for a metric that needs true labels
Thinking Davies-Bouldin Index requires true labels
Assuming Calinski-Harabasz Index uses true labels
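
To make the external/internal distinction concrete, here is a minimal pure-Python sketch of the pair-counting Adjusted Rand Index. The function name adjusted_rand_index and the sample labelings are illustrative assumptions, not scikit-learn's implementation. Note that it takes two labelings and never sees the feature matrix X, which is why it cannot be used without true labels.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting ARI (illustrative sketch, not sklearn's code)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # assumption: treat fewer than 2 samples as trivial agreement

    # Pairs that share a cluster in both labelings, in truth only, in prediction only
    both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    in_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    in_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = in_true * in_pred / total_pairs  # agreement expected by chance
    maximum = (in_true + in_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every sample in one cluster
    return (both - expected) / (maximum - expected)


true_labels = [0, 0, 1, 1]
pred_same = [1, 1, 0, 0]    # same grouping, different label names
pred_mixed = [0, 1, 0, 1]   # every predicted cluster mixes both true classes

ari_same = adjusted_rand_index(true_labels, pred_same)
ari_mixed = adjusted_rand_index(true_labels, pred_mixed)
print(ari_same, ari_mixed)
```

The first call scores a perfect match even though the label names differ, because ARI only looks at which samples are grouped together.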

2. Which of the following correctly computes the Silhouette Score with scikit-learn, given data X and cluster assignments stored in labels?

easy

A. from sklearn.metrics import silhouette_score; score = silhouette_score(X, labels)

B. from sklearn.cluster import silhouette_score; score = silhouette_score(labels, X)

C. from sklearn.metrics import silhouette_score; score = silhouette_score(labels, X)

D. from sklearn.metrics import silhouette_score; score = silhouette_score(X)

Solution

Step 1: Check import source
Silhouette Score is in sklearn.metrics, not sklearn.cluster.
Step 2: Check function parameters
Function signature is silhouette_score(X, labels), where X is data and labels are cluster assignments.
Final Answer:
from sklearn.metrics import silhouette_score; score = silhouette_score(X, labels) -> Option A
Quick Check:
Correct import and parameter order = A [OK]

Hint: Import from metrics and pass data first, labels second [OK]

Common Mistakes:

Importing silhouette_score from sklearn.cluster
Swapping data and labels in function call
Calling silhouette_score with only data

3. What does the following code print for the Davies-Bouldin Index?

from sklearn.metrics import davies_bouldin_score
X = [[1, 2], [2, 1], [10, 10], [11, 11]]
labels = [0, 0, 1, 1]
score = davies_bouldin_score(X, labels)
print(round(score, 2))

medium

A. 0.50

B. 1.41

C. 1.00

D. 0.11

Solution

Step 1: Understand Davies-Bouldin Index meaning
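Lower values mean better clusters. For each cluster, the index takes its similarity to the most similar other cluster (within-cluster scatter divided by the distance between centroids) and averages that value over all clusters.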
Step 2: Calculate score using sklearn
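Each cluster's mean distance to its centroid is about 0.7071, and the centroids (1.5, 1.5) and (10.5, 10.5) are about 12.728 apart, so the ratio is (0.7071 + 0.7071) / 12.728 ≈ 0.1111. Running the code gives the same value, which rounds to 0.11.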
Final Answer:
0.11 -> Option D
Quick Check:
Davies-Bouldin score ≈ 0.11 [OK]

Hint: Run sklearn function and round result to 2 decimals [OK]

Common Mistakes:

Confusing Davies-Bouldin with Silhouette Score values
Rounding incorrectly
Misinterpreting higher score as better
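
The arithmetic behind this answer can be checked by hand. The following is a minimal pure-Python sketch of the Davies-Bouldin computation using Euclidean distance; the function name davies_bouldin is an illustrative assumption, and the sketch assumes at least two clusters with distinct centroids.

```python
from math import dist


def davies_bouldin(X, labels):
    """Davies-Bouldin index by hand (sketch; assumes >= 2 clusters, distinct centroids)."""
    # Group points by cluster label
    clusters = {}
    for point, label in zip(X, labels):
        clusters.setdefault(label, []).append(point)

    # Centroid of each cluster: coordinate-wise mean
    centroids = {
        k: [sum(col) / len(pts) for col in zip(*pts)]
        for k, pts in clusters.items()
    }
    # Scatter of each cluster: mean distance from its points to its centroid
    scatter = {
        k: sum(dist(p, centroids[k]) for p in pts) / len(pts)
        for k, pts in clusters.items()
    }

    # For each cluster, keep the worst (largest) similarity ratio to any other cluster
    worst = []
    for i in clusters:
        ratios = [
            (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        ]
        worst.append(max(ratios))
    return sum(worst) / len(worst)


X = [[1, 2], [2, 1], [10, 10], [11, 11]]
labels = [0, 0, 1, 1]
score = davies_bouldin(X, labels)
print(round(score, 2))  # 0.11
```

With only two clusters the index reduces to a single ratio, here exactly 1/9, which is why the result is so small for these well-separated groups.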

4. The following code throws an error. What is the most likely cause?

from sklearn.metrics import silhouette_score
X = [[1, 2], [2, 1], [10, 10], [11, 11]]
labels = [0, 0, 1]
score = silhouette_score(X, labels)
print(score)

medium

A. Mismatch in length between X and labels

B. silhouette_score requires true labels, not cluster labels

C. X should be a numpy array, not a list

D. silhouette_score cannot handle more than 3 clusters

Solution

Step 1: Check input lengths
X has 4 samples, but labels has only 3 elements, so scikit-learn raises a ValueError about inconsistent numbers of samples.
Step 2: Understand silhouette_score input requirements
silhouette_score needs exactly one label per sample, so len(labels) must equal the number of rows in X.
Final Answer:
Mismatch in length between X and labels -> Option A
Quick Check:
Length mismatch error = A [OK]

Hint: Ensure labels length matches data samples count [OK]

Common Mistakes:

Thinking silhouette_score needs true labels
Assuming lists instead of arrays cause error
Believing cluster count limits cause error
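
One way to catch this kind of bug early is to check the lengths before calling any metric. The sketch below uses a hypothetical helper, check_same_length, which is not part of scikit-learn; it only illustrates the one-label-per-sample requirement.

```python
def check_same_length(X, labels):
    """Hypothetical pre-check: fail early if X and labels disagree in length."""
    if len(X) != len(labels):
        raise ValueError(
            f"X has {len(X)} samples but labels has {len(labels)} entries"
        )


X = [[1, 2], [2, 1], [10, 10], [11, 11]]
labels = [0, 0, 1]  # one label short, as in the question

try:
    check_same_length(X, labels)
    message = "lengths match"
except ValueError as err:
    message = str(err)

print(message)
```

Fixing the input, for example labels = [0, 0, 1, 1], makes the check pass.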

5. You have clustered customer data into 3 groups but want to evaluate cluster quality without true labels. Which combination of metrics gives the best overall insight?

hard

A. Adjusted Rand Index and Calinski-Harabasz Index

B. Silhouette Score and Davies-Bouldin Index

C. Homogeneity Score and Completeness Score

D. Adjusted Mutual Information and Silhouette Score

Solution

Step 1: Identify metrics that do not require true labels
Silhouette Score and Davies-Bouldin Index are internal metrics needing only data and cluster labels.
Step 2: Understand other metrics need true labels
Adjusted Rand Index, Homogeneity, Completeness, and Adjusted Mutual Information require true labels, which are unavailable.
Final Answer:
Silhouette Score and Davies-Bouldin Index -> Option B
Quick Check:
Internal metrics only = B [OK]

Hint: Use only internal metrics when true labels are missing [OK]

Common Mistakes:

Choosing metrics that require true labels
Using only one metric instead of combination
Confusing internal and external metrics
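
To show that an internal metric needs nothing beyond the data and the cluster assignments, here is a minimal pure-Python sketch of the mean silhouette coefficient with Euclidean distance. The function name silhouette and the toy data are illustrative assumptions; singleton clusters are scored 0 by convention, and the sketch assumes the points are not all identical.

```python
from collections import Counter
from math import dist


def silhouette(X, labels):
    """Mean silhouette coefficient (sketch; needs >= 2 clusters)."""
    sizes = Counter(labels)
    if len(sizes) < 2:
        raise ValueError("silhouette needs at least 2 clusters")

    scores = []
    for i, (point, own) in enumerate(zip(X, labels)):
        if sizes[own] == 1:
            scores.append(0.0)  # convention for a cluster with a single point
            continue
        # Sum of distances from this point to the members of every cluster
        totals = {}
        for j, (other, label) in enumerate(zip(X, labels)):
            if j != i:
                totals[label] = totals.get(label, 0.0) + dist(point, other)
        a = totals[own] / (sizes[own] - 1)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            total / sizes[k] for k, total in totals.items() if k != own
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


X = [[1, 2], [2, 1], [10, 10], [11, 11]]
labels = [0, 0, 1, 1]
score = silhouette(X, labels)
print(round(score, 2))
```

A silhouette value close to 1 together with a Davies-Bouldin value close to 0 points the same way: compact, well-separated clusters. Reading the two side by side is useful because one rewards high values and the other rewards low values.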
