
Cluster evaluation metrics in SciPy - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output (intermediate)
Output of silhouette_score for simple clusters
What is the output of the following code that calculates the silhouette score for two simple clusters?
from sklearn.metrics import silhouette_score
from sklearn.datasets import make_blobs

X, labels = make_blobs(n_samples=10, centers=2, cluster_std=0.5, random_state=42)
score = silhouette_score(X, labels)
print(round(score, 2))
A. 0.71
B. 0.15
C. 1.00
D. -0.50
💡 Hint: The silhouette score ranges from -1 to 1; higher values mean better-separated clusters.
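The hint follows from how the silhouette coefficient is defined. For each point, let a be its mean distance to the other points in its own cluster and b its mean distance to the points of the nearest other cluster; then s = (b - a) / max(a, b), and silhouette_score averages s over all points. Because |b - a| can never exceed max(a, b), every s lies in [-1, 1]. The sketch below recomputes the score by hand with SciPy's cdist and compares it with scikit-learn. The helper manual_silhouette and the four-point dataset are illustrative assumptions, and Euclidean distance (scikit-learn's default metric) is assumed.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.metrics import silhouette_score


def manual_silhouette(X, labels):
    """Mean silhouette coefficient computed from its definition (Euclidean distance)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    D = cdist(X, X)  # pairwise Euclidean distance matrix
    scores = []
    for i in range(len(X)):
        own = labels == labels[i]
        own[i] = False  # exclude the point itself from its own cluster
        if not own.any():
            scores.append(0.0)  # singleton cluster: scikit-learn reports 0
            continue
        a = D[i, own].mean()  # mean distance within its own cluster
        b = min(D[i, labels == k].mean()  # mean distance to the nearest other cluster
                for k in np.unique(labels) if k != labels[i])
        scores.append((b - a) / max(a, b))  # always within [-1, 1]
    return float(np.mean(scores))


# Two tight, well-separated pairs of points (illustrative data).
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
labels = np.array([0, 0, 1, 1])
print(manual_silhouette(X, labels))  # about 0.90
print(silhouette_score(X, labels))   # the same value from scikit-learn
```

With a = 1 and b close to 10 for every point, the score sits near 1; clusters that overlap drive b toward a and the score toward 0.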
Data Output (intermediate)
Number of clusters from DBSCAN labels
Given the following DBSCAN clustering labels, how many clusters (excluding noise) are detected?
import numpy as np
labels = np.array([0, 0, 1, 1, -1, 2, 2, -1, 2])
num_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(num_clusters)
A. 3
B. 2
C. 4
D. 5
💡 Hint: Noise points are labeled as -1 and should not be counted as clusters.
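The counting expression works because scikit-learn's DBSCAN marks noise points with the label -1. The sketch below wraps the same idea in a small helper, count_clusters (a hypothetical name, not a library function), and checks it against labels produced by sklearn.cluster.DBSCAN on a made-up dataset with two tight groups and one isolated point. The values eps=0.5 and min_samples=2 are illustrative choices.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def count_clusters(labels):
    """Number of clusters in a label array, ignoring DBSCAN's noise label -1."""
    labels = np.asarray(labels)
    return np.unique(labels[labels != -1]).size


# The labels from the problem: cluster ids 0, 1, 2 plus two noise points.
labels = np.array([0, 0, 1, 1, -1, 2, 2, -1, 2])
print(count_clusters(labels), int(np.sum(labels == -1)))  # 3 clusters, 2 noise points

# Two tight groups and one far-away point (illustrative data).
X = np.array([[0, 0], [0, 0.1], [0.1, 0],
              [5, 5], [5, 5.1], [5.1, 5],
              [20, 20]])
db_labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)
print(db_labels, count_clusters(db_labels))  # the isolated point is labeled -1
```

Filtering out -1 before calling np.unique avoids the special-case subtraction and also works when a clustering contains no noise at all.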
🧠 Conceptual (advanced)
Understanding Adjusted Rand Index (ARI)
Which statement correctly describes the Adjusted Rand Index (ARI) in cluster evaluation?
A. ARI calculates the ratio of within-cluster variance to total variance.
B. ARI measures the average distance between cluster centroids and data points.
C. ARI is a metric that only works for hierarchical clustering methods.
D. ARI measures the similarity between two clusterings, adjusted for chance: 1 means a perfect match, values near 0 mean chance-level agreement, and negative values mean less agreement than expected by chance.
💡 Hint: Think about what ARI compares and its value range.
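ARI compares two labelings of the same points by counting pairs of points that are grouped together, or kept apart, in both; it then rescales that agreement so that chance-level labelings score about 0 and identical partitions score 1. The sketch below uses scikit-learn's adjusted_rand_score (the same library the other problems rely on) on small, made-up labelings to show relabeling invariance, a near-zero score for random labels, and a negative score for systematic disagreement.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

truth = [0, 0, 0, 1, 1, 1]

# Same partition with the cluster ids swapped: ARI ignores label names.
print(adjusted_rand_score(truth, [1, 1, 1, 0, 0, 0]))  # 1.0

# Partial agreement lands between 0 and 1.
print(adjusted_rand_score(truth, [0, 0, 1, 1, 2, 2]))

# Two independent random labelings score close to 0 thanks to the chance adjustment.
rng = np.random.default_rng(0)
a = rng.integers(0, 3, size=10_000)
b = rng.integers(0, 3, size=10_000)
print(adjusted_rand_score(a, b))  # near 0.0

# Systematic disagreement pushes ARI below 0.
print(round(adjusted_rand_score([0, 0, 1, 1], [0, 1, 0, 1]), 3))  # -0.5
```

Note that ARI needs two labelings of the same points, which is why it works with any clustering method, hierarchical or not.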
Visualization (advanced)
Interpreting a silhouette plot
You create a silhouette plot for a clustering result with 3 clusters. If one cluster has many negative silhouette values, which of the following interpretations is correct?
A. Negative silhouette values mean the clustering algorithm perfectly assigned points.
B. The cluster with negative silhouette values is the most compact and well-separated cluster.
C. The cluster with many negative silhouette values is poorly separated and may be overlapping with other clusters.
D. Negative silhouette values indicate that the cluster has the highest density of points.
💡 Hint: Recall what negative silhouette values mean about point assignment.
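A silhouette plot draws one bar per point, grouped by cluster, from the per-point values that scikit-learn's silhouette_samples returns. A negative value means the point is, on average, closer to a neighboring cluster than to its own. The sketch below skips the plotting and summarizes those values per cluster on a made-up dataset in which cluster 2 is deliberately given two points that sit inside clusters 0 and 1.

```python
import numpy as np
from sklearn.metrics import silhouette_samples

# Clusters 0 and 1 are compact groups; cluster 2 is a poor assignment whose
# two points lie inside clusters 0 and 1 (illustrative data).
X = np.array([
    [0.0, 0.0], [0.2, 0.0], [0.0, 0.2],  # cluster 0
    [5.0, 5.0], [5.2, 5.0], [5.0, 5.2],  # cluster 1
    [0.1, 0.1], [5.1, 5.1],              # cluster 2, overlapping 0 and 1
])
labels = np.array([0, 0, 0, 1, 1, 1, 2, 2])

s = silhouette_samples(X, labels)  # one silhouette value per point
for k in np.unique(labels):
    vals = s[labels == k]
    n_neg = int((vals < 0).sum())
    print(f"cluster {k}: mean={vals.mean():.2f}, negative={n_neg}/{vals.size}")
```

Here every point in cluster 2 scores close to -1, while clusters 0 and 1 stay positive: the overlapping cluster is the one flagged by the negative bars.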
🔧 Debug (expert)
Identify the error in Calinski-Harabasz score calculation
What error will the following code raise when calculating the Calinski-Harabasz score?
from sklearn.metrics import calinski_harabasz_score
X = [[1, 2], [3, 4], [5, 6]]
labels = [0, 1]
score = calinski_harabasz_score(X, labels)
print(score)
A. IndexError: list index out of range
B. ValueError: Found input variables with inconsistent numbers of samples: [3, 2]
C. TypeError: calinski_harabasz_score() missing required positional argument
D. No error; prints a float score
💡 Hint: Check whether the length of labels matches the number of samples (rows) in X.
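The fix is to supply exactly one label per row of X. The Calinski-Harabasz score is the between-cluster dispersion divided by (k - 1), over the within-cluster dispersion divided by (n - k), for n samples and k clusters, and scikit-learn also requires 2 <= k <= n - 1. The sketch below, using calinski_harabasz_score as in the problem, scores a corrected four-point example and shows the original mismatched call failing with a ValueError from scikit-learn's input validation. The explicit length check is an illustrative addition, not something the library requires.

```python
import numpy as np
from sklearn.metrics import calinski_harabasz_score

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]], dtype=float)
labels = np.array([0, 0, 1, 1])  # exactly one label per row of X

# Illustrative guard: fail early with a readable message on a length mismatch.
if len(labels) != X.shape[0]:
    raise ValueError(f"{len(labels)} labels for {X.shape[0]} samples")

score = calinski_harabasz_score(X, labels)
# Between-cluster dispersion 32 / (k - 1 = 1) over within-cluster 8 / (n - k = 2)
print(score)  # 8.0

# The original call: 3 samples but only 2 labels.
try:
    calinski_harabasz_score([[1, 2], [3, 4], [5, 6]], [0, 1])
except ValueError as exc:
    print("ValueError:", exc)
```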