Challenge - 5 Problems
Cluster Metrics Master
❓ Predict Output
Difficulty: intermediate
Output of silhouette_score for simple clusters
What is the output of the following code that calculates the silhouette score for two simple clusters?
```python
from sklearn.metrics import silhouette_score
from sklearn.datasets import make_blobs

X, labels = make_blobs(n_samples=10, centers=2, cluster_std=0.5, random_state=42)
score = silhouette_score(X, labels)
print(round(score, 2))
```
💡 Hint
The silhouette score ranges from -1 to 1; higher values mean better cluster separation.
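Explanation
The silhouette score compares each sample's mean distance to the rest of its own cluster with its mean distance to the nearest other cluster; for well-separated clusters it is close to 1. Here, with random_state=42, the two generated centers lie roughly 10 units apart while cluster_std is only 0.5, so the printed score is close to 1, indicating very good separation.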
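To see why the score lands near 1, here is a minimal pure-Python sketch of the silhouette definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from a point to the rest of its own cluster and b(i) is the mean distance to the nearest other cluster. The function name `mean_silhouette` and the toy points are illustrative assumptions, not part of scikit-learn; the sketch uses Euclidean distance and assumes at least two clusters.

```python
import math


def mean_silhouette(points, labels):
    """Mean of s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all points (Euclidean distance)."""
    scores = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, grouped by that point's label.
        dists_by_label = {}
        for j, q in enumerate(points):
            if j != i:
                dists_by_label.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists_by_label.get(labels[i], [])
        if not own:
            # A point alone in its cluster gets s(i) = 0 (the usual convention).
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the rest of its own cluster
        # b: smallest mean distance to any other cluster (the "nearest" cluster).
        b = min(sum(d) / len(d) for lab, d in dists_by_label.items() if lab != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Two tight clusters about 14 units apart: a(i) is about 1, b(i) is about 14.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = [0, 0, 0, 1, 1, 1]
print(round(mean_silhouette(points, labels), 2))  # 0.92
```

Because b(i) is many times larger than a(i) for every point, each s(i) is above 0.9; the make_blobs data behaves the same way, since its centers are far apart relative to cluster_std.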
❓ Data Output
Difficulty: intermediate
Number of clusters from DBSCAN labels
Given the following DBSCAN clustering labels, how many clusters (excluding noise) are detected?
```python
import numpy as np

labels = np.array([0, 0, 1, 1, -1, 2, 2, -1, 2])
num_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(num_clusters)
```
💡 Hint
Noise points are labeled as -1 and should not be counted as clusters.
Explanation
The labels array contains the cluster labels 0, 1, and 2 plus the noise label -1, so set(labels) has 4 elements. Because -1 is present, 1 is subtracted, and the code prints 3.
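A slightly more informative variant, sketched here with `collections.Counter`, reports the number of clusters, the number of noise points, and each cluster's size in one pass, again treating -1 as noise. The helper name `summarize_dbscan_labels` is hypothetical, not a scikit-learn function.

```python
from collections import Counter


def summarize_dbscan_labels(labels):
    """Return (number of clusters, number of noise points, {cluster label: size})."""
    counts = Counter(labels)
    noise = counts.pop(-1, 0)  # DBSCAN marks noise with -1; remove it before counting clusters
    return len(counts), noise, dict(sorted(counts.items()))


labels = [0, 0, 1, 1, -1, 2, 2, -1, 2]
print(summarize_dbscan_labels(labels))  # (3, 2, {0: 2, 1: 2, 2: 3})
```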
🧠 Conceptual
Difficulty: advanced
Understanding Adjusted Rand Index (ARI)
Which statement correctly describes the Adjusted Rand Index (ARI) in cluster evaluation?
💡 Hint
Think about what ARI compares and its value range.
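Explanation
ARI compares two clusterings by counting pairs of points that both clusterings place together (or both place apart), then corrects that count for the agreement expected by chance. It equals 1 for identical clusterings (up to relabeling), is close to 0 for random labelings, and can be negative when agreement is worse than chance.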
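The pair counting and the chance correction can be written compactly from the contingency counts. This is a minimal sketch of the standard formula ARI = (index - expected) / (max_index - expected); the function name `adjusted_rand_index` is illustrative, and the sketch assumes at least two samples.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected Rand index computed from pair counts."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)
    # Pairs placed in the same cluster by both labelings (sum of C(n_ij, 2) over the contingency table).
    index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together by each labeling on its own (row and column sums of the table).
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)  # agreement expected by chance
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:
        # Only happens when both labelings are all-one-cluster or all-singletons, i.e. identical.
        return 1.0
    return (index - expected) / (max_index - expected)


print(round(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]), 2))  # 1.0 (same partition, labels swapped)
print(round(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]), 2))  # 0.57
print(round(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]), 2))  # -0.5 (worse than chance)
```

The last case shows why the lower end of the range is not a symmetric -1: ARI goes negative only when the labelings agree less than chance would predict.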
❓ Visualization
Difficulty: advanced
Interpreting a silhouette plot
You draw a silhouette plot for a clustering result with 3 clusters. If one cluster has many negative silhouette values, which of the following interpretations is correct?
💡 Hint
Recall what negative silhouette values mean about point assignment.
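Explanation
A negative silhouette value means a point's mean distance to the nearest other cluster is smaller than its mean distance to the rest of its own cluster. Many negative values in one cluster therefore indicate poor separation from a neighboring cluster or misassigned points.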
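A silhouette plot shows the per-sample values sorted within each cluster, so the same diagnosis can be checked numerically by counting negative values per cluster. Below is a minimal pure-Python sketch; the helper names and the toy data (one point deliberately given the wrong label) are illustrative assumptions. In scikit-learn, the per-sample values would come from `sklearn.metrics.silhouette_samples`.

```python
import math


def silhouette_samples_py(points, labels):
    """Per-point silhouette values s(i) = (b(i) - a(i)) / max(a(i), b(i)), Euclidean distance."""
    values = []
    for i, p in enumerate(points):
        groups = {}
        for j, q in enumerate(points):
            if j != i:
                groups.setdefault(labels[j], []).append(math.dist(p, q))
        own = groups.get(labels[i], [])
        if not own:
            values.append(0.0)  # singleton-cluster convention
            continue
        a = sum(own) / len(own)  # mean distance within its own cluster
        b = min(sum(d) / len(d) for lab, d in groups.items() if lab != labels[i])  # nearest other cluster
        values.append((b - a) / max(a, b))
    return values


def negative_share_by_cluster(points, labels):
    """Fraction of each cluster's points whose silhouette value is negative."""
    values = silhouette_samples_py(points, labels)
    share = {}
    for c in sorted(set(labels)):
        cluster_values = [v for v, lab in zip(values, labels) if lab == c]
        share[c] = sum(v < 0 for v in cluster_values) / len(cluster_values)
    return share


# The point (0.5, 0.5) sits inside cluster 0 but is labelled 1, so its s(i) is negative.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (0.5, 0.5)]
labels = [0, 0, 0, 1, 1, 1, 1]
print(negative_share_by_cluster(points, labels))  # {0: 0.0, 1: 0.25}
```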
🔧 Debug
Difficulty: expert
Identify the error in Calinski-Harabasz score calculation
What error will the following code raise when calculating the Calinski-Harabasz score?
```python
from sklearn.metrics import calinski_harabasz_score

X = [[1, 2], [3, 4], [5, 6]]
labels = [0, 1]
score = calinski_harabasz_score(X, labels)
print(score)
```
💡 Hint
Check if the labels list length matches the number of samples in X.
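Explanation
The labels list has 2 entries but X has 3 samples. calinski_harabasz_score checks that X and labels have the same number of samples, so it raises a ValueError (reporting inconsistent numbers of samples) before any score is computed.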
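To see both the failure and the formula, here is a minimal pure-Python sketch of the Calinski-Harabasz index, the ratio of between-cluster to within-cluster dispersion, (tr(B) / (k - 1)) / (tr(W) / (n - k)). The function name `calinski_harabasz` and its error messages are illustrative, not scikit-learn's; like the library, it rejects inputs whose sample and label counts differ.

```python
def calinski_harabasz(X, labels):
    """(between-cluster dispersion / (k - 1)) / (within-cluster dispersion / (n - k))."""
    n = len(X)
    if len(labels) != n:
        # The same kind of length check is what makes the quiz snippet fail.
        raise ValueError(f"Found {n} samples but {len(labels)} labels")
    clusters = sorted(set(labels))
    k = len(clusters)
    if not 1 < k < n:
        raise ValueError("need 2 <= number of clusters <= n_samples - 1")
    dim = len(X[0])
    overall = [sum(row[d] for row in X) / n for d in range(dim)]  # global centroid
    between = within = 0.0
    for c in clusters:
        members = [row for row, lab in zip(X, labels) if lab == c]
        centroid = [sum(row[d] for row in members) / len(members) for d in range(dim)]
        # tr(B): cluster size times squared distance from cluster centroid to global centroid.
        between += len(members) * sum((centroid[d] - overall[d]) ** 2 for d in range(dim))
        # tr(W): squared distances of members to their own centroid.
        within += sum(sum((row[d] - centroid[d]) ** 2 for d in range(dim)) for row in members)
    if within == 0.0:
        return 1.0  # avoid dividing by zero when every point sits on its centroid
    return (between / (k - 1)) / (within / (n - k))


X = [[1, 2], [3, 4], [5, 6]]
print(calinski_harabasz(X, [0, 0, 1]))  # 3.0 once every sample has a label
try:
    calinski_harabasz(X, [0, 1])  # the quiz's mismatched labels
except ValueError as err:
    print("ValueError:", err)  # ValueError: Found 3 samples but 2 labels
```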