Challenge - 5 Problems

Cluster Metrics Master

❓ Predict Output

intermediate

Output of silhouette_score for simple clusters

What is the output of the following code that calculates the silhouette score for two simple clusters?

SciPy

from sklearn.metrics import silhouette_score
from sklearn.datasets import make_blobs

X, labels = make_blobs(n_samples=10, centers=2, cluster_std=0.5, random_state=42)
score = silhouette_score(X, labels)
print(round(score, 2))

A. 0.71

B. 0.15

C. 1.00

D. -0.50

❓ Data Output

intermediate

Number of clusters from DBSCAN labels

Given the following DBSCAN clustering labels, how many clusters (excluding noise) are detected?

SciPy

import numpy as np
labels = np.array([0, 0, 1, 1, -1, 2, 2, -1, 2])
num_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(num_clusters)

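The counting idiom in the snippet above can be extended to report cluster sizes and the number of noise points as well. The following is a minimal sketch using only the standard library; the label values are hypothetical and are not the ones from the question. It relies on the DBSCAN convention that -1 marks noise.

```python
from collections import Counter

# Hypothetical labels (not the ones from the question); -1 marks noise points.
labels = [0, 1, 1, -1, 0, 1, -1, -1]

counts = Counter(labels)
n_noise = counts.pop(-1, 0)  # drop the noise "label" and keep its count
n_clusters = len(counts)     # the remaining keys are real cluster ids

print("clusters:", n_clusters)
print("noise points:", n_noise)
print("cluster sizes:", dict(counts))
```

Popping -1 before taking the length avoids the separate membership test, and it still works when the labels contain no noise at all.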

🧠 Conceptual

advanced

Understanding Adjusted Rand Index (ARI)

Which statement correctly describes the Adjusted Rand Index (ARI) in cluster evaluation?

A. ARI calculates the ratio of within-cluster variance to total variance.

B. ARI measures the average distance between cluster centroids and data points.

C. ARI is a metric that only works for hierarchical clustering methods.

D. ARI measures the similarity between two clusterings, adjusted for chance: 1 means a perfect match, values near 0 mean agreement no better than random labeling, and negative values mean agreement worse than chance.

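To make the "adjusted for chance" idea concrete, here is a from-scratch sketch of the pair-counting form of ARI. It is an illustration only, not scikit-learn's implementation; the function name `adjusted_rand_index` and the three label lists are made up for this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Pair-counting ARI for two labelings of the same samples (sketch)."""
    n = len(labels_a)
    if n != len(labels_b) or n < 2:
        raise ValueError("need two labelings of the same length, with n >= 2")

    # Pairs of samples placed together in both labelings, in A only, in B only.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = pairs_a * pairs_b / comb(n, 2)  # agreement expected by chance
    max_index = (pairs_a + pairs_b) / 2        # agreement of a perfect match
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings use a single cluster
    return (pairs_both - expected) / (max_index - expected)


truth = [0, 0, 0, 1, 1, 1]
renamed = [1, 1, 1, 0, 0, 0]  # same grouping, cluster ids swapped
mixed = [0, 1, 2, 0, 1, 2]    # splits every true cluster apart

print(adjusted_rand_index(truth, renamed))
print(adjusted_rand_index(truth, mixed))
```

The first call shows that ARI ignores how clusters are named; the second gives a negative value because the two labelings agree on fewer pairs than chance would predict.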

❓ Visualization

advanced

Interpreting a silhouette plot

You run a silhouette plot for a clustering result with 3 clusters. Which of the following interpretations is correct if one cluster has many negative silhouette values?

A. Negative silhouette values mean the clustering algorithm perfectly assigned points.

B. The cluster with negative silhouette values is the most compact and well-separated cluster.

C. The cluster with many negative silhouette values is poorly separated and may be overlapping with other clusters.

D. Negative silhouette values indicate that the cluster has the highest density of points.

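A silhouette plot draws one bar per sample, so it can help to see where a single value comes from. The sketch below computes per-sample silhouette values from the definition s = (b - a) / max(a, b), where a is the mean distance to the other points in the sample's own cluster and b is the mean distance to the nearest other cluster. The helper `silhouette_values` and the toy data are hypothetical, and the code assumes at least two clusters.

```python
from math import dist


def silhouette_values(points, labels):
    """Per-sample silhouette values from the definition (sketch, >= 2 clusters)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    values = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            values.append(0.0)  # usual convention for single-point clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's distance to itself is 0, so dividing by len - 1 is right).
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the closest cluster the point does not belong to.
        b = min(
            sum(dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        values.append((b - a) / max(a, b))
    return values


# Toy data: the last point sits inside cluster 0 but is labeled 1.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (0.5, 0.5)]
labels = [0, 0, 0, 1, 1, 1]

for point, value in zip(points, silhouette_values(points, labels)):
    print(point, round(value, 2))
```

Only the last point ends up with a negative value, because it is on average closer to the other cluster than to the cluster it was assigned to.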

🔧 Debug

expert

Identify the error in Calinski-Harabasz score calculation

What error will the following code raise when calculating the Calinski-Harabasz score?

SciPy

from sklearn.metrics import calinski_harabasz_score
X = [[1, 2], [3, 4], [5, 6]]
labels = [0, 1]
score = calinski_harabasz_score(X, labels)
print(score)

A. IndexError: list index out of range

B. ValueError: Number of labels does not match number of samples

C. TypeError: calinski_harabasz_score() missing required positional argument

D. No error, prints a float score

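For reference, the Calinski-Harabasz score is the ratio of between-cluster dispersion to within-cluster dispersion, each scaled by its degrees of freedom. The following from-scratch sketch is not scikit-learn's implementation: the function name `calinski_harabasz`, the error messages, and the value returned for zero within-cluster dispersion are all choices made for this example. It also shows the kind of input check a caller might want before scoring.

```python
def calinski_harabasz(points, labels):
    """Calinski-Harabasz score from the definition (illustrative sketch)."""
    if len(points) != len(labels):
        # Hypothetical message; a real library will word this differently.
        raise ValueError(f"got {len(points)} samples but {len(labels)} labels")

    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    n, k = len(points), len(clusters)
    if not 1 < k < n:
        raise ValueError("need between 2 and n_samples - 1 clusters")

    def centroid(pts):
        return [sum(column) / len(pts) for column in zip(*pts)]

    def sq_dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    overall = centroid(points)
    between = within = 0.0
    for members in clusters.values():
        center = centroid(members)
        between += len(members) * sq_dist(center, overall)
        within += sum(sq_dist(p, center) for p in members)

    if within == 0:
        return 1.0  # convention chosen for this sketch to avoid dividing by zero
    return (between / (k - 1)) / (within / (n - k))


X = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(calinski_harabasz(X, [0, 0, 1, 1]))

try:
    calinski_harabasz(X[:3], [0, 1])  # three samples, two labels
except ValueError as err:
    print("rejected:", err)
```

Checking the lengths up front means a mismatch is reported before any dispersion is computed, rather than surfacing later as a confusing result.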