Challenge - 5 Problems

Clustering Mastery

🧠 Conceptual

intermediate

Why does clustering group similar data points?

Which of the following best explains why clustering algorithms group similar data points together?

A. Clustering algorithms group data points based on their proximity in feature space, so points close to each other are grouped together.

B. Clustering algorithms randomly assign data points to groups without considering their features.

C. Clustering algorithms group data points by sorting them alphabetically based on their labels.

D. Clustering algorithms group data points by their order of appearance in the dataset.
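
To make "proximity in feature space" concrete, here is a minimal sketch that measures pairwise Euclidean distances. It assumes the six sample points used in the SciPy questions of this challenge; the `left` and `right` index lists are hypothetical names for the two visually obvious groups.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Assumption: the six sample points from the SciPy questions in this challenge.
points = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

# Full 6x6 matrix of pairwise Euclidean distances.
dist = squareform(pdist(points, metric="euclidean"))

# Hypothetical grouping by eye: the points with x=1 and the points with x=10.
left, right = [0, 1, 2], [3, 4, 5]

# Largest distance inside a group versus smallest distance across groups.
max_within = max(dist[np.ix_(left, left)].max(), dist[np.ix_(right, right)].max())
min_between = dist[np.ix_(left, right)].min()

print(max_within, min_between)
```

When every within-group distance is smaller than every between-group distance, as here, a distance-based method has a clear reason to keep the groups apart.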

❓ Predict Output

intermediate

Cluster labels from SciPy hierarchical clustering

What labels array does this clustering code print?

SciPy

from scipy.cluster.hierarchy import fcluster, linkage
import numpy as np

# Sample data points
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

# Perform hierarchical clustering
Z = linkage(X, method='single')

# Form flat clusters with max distance 3
labels = fcluster(Z, t=3, criterion='distance')
print(labels)

A. [1 1 2 2 3 3]

B. [1 2 3 4 5 6]

C. [2 2 2 1 1 1]

D. [1 1 1 2 2 2]
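
Two of the options describe the same grouping with the label numbers swapped, so it can help to compare partitions rather than raw label values. The sketch below is one way to do that; `same_partition` is a hypothetical helper, not part of SciPy.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
Z = linkage(X, method="single")
labels = fcluster(Z, t=3, criterion="distance")


def same_partition(a, b):
    """Hypothetical helper: True if both label arrays group the points identically.

    The label pairs must form a one-to-one mapping between the two labelings.
    """
    pairs = set(zip(a, b))
    return len(pairs) == len(set(a)) == len(set(b))


print(labels)
print(same_partition(labels, [1, 1, 1, 2, 2, 2]))
print(same_partition(labels, [1, 1, 2, 2, 3, 3]))
```

The printed array still decides which option is correct; the helper only confirms which groupings are equivalent up to renumbering.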

❓ Data Output

advanced

Number of clusters formed with different distance thresholds

Using the same data and single-linkage matrix, how many flat clusters does fcluster form at t=2 and at t=5? The code prints the two counts in that order.

SciPy

from scipy.cluster.hierarchy import fcluster, linkage
import numpy as np

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
Z = linkage(X, method='single')

clusters_t2 = fcluster(Z, t=2, criterion='distance')
clusters_t5 = fcluster(Z, t=5, criterion='distance')

num_clusters_t2 = len(set(clusters_t2))
num_clusters_t5 = len(set(clusters_t5))
print(num_clusters_t2, num_clusters_t5)

A. 2 2

B. 3 2

C. 3 1

D. 1 3
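
To reason about any threshold, it helps to look at the merge heights stored in the third column of the linkage matrix. The sketch below sweeps a few values of `t`; the extra thresholds 1 and 9 are illustrative choices, not part of the question.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
Z = linkage(X, method="single")

# Column 2 of the linkage matrix holds the distance at which each merge happens.
merge_heights = Z[:, 2]
print(merge_heights)

# With criterion='distance', points stay in one flat cluster when their
# cophenetic distance is no greater than t, so a merge at height h is kept
# whenever h <= t.
counts = {
    t: len(np.unique(fcluster(Z, t=t, criterion="distance")))
    for t in (1, 2, 5, 9)  # 1 and 9 are extra illustrative thresholds
}
print(counts)
```

The cluster count only changes when `t` crosses one of the merge heights, which is why two different thresholds can give the same number of clusters.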

❓ Visualization

advanced

Interpreting a dendrogram for clustering

Which statement correctly describes the hierarchical-clustering dendrogram described below?

(Imagine a dendrogram whose two main branches join at a height of about 3.)

A. The dendrogram shows two main clusters when cut just below the height where the two main branches join (about 3), grouping similar points together.

B. The dendrogram shows clusters formed by sorting points alphabetically.

C. The dendrogram indicates that no clusters can be formed because all points are too far apart.

D. The dendrogram shows that all points are identical and form one cluster at any height.
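
The relationship between branch height and cut height can be checked numerically without drawing anything. The sketch below assumes the sample points from the SciPy questions, whose two main branches join at a different height than in the imagined figure; it uses `dendrogram(..., no_plot=True)` to read the layout data and then cuts just below the top merge.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Assumption: the sample points from the SciPy questions, not the imagined figure.
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
Z = linkage(X, method="single")

# no_plot=True returns the dendrogram layout as a dict without drawing it.
layout = dendrogram(Z, no_plot=True)

# 'dcoord' holds the heights of each U-shaped link; the tallest one is the
# merge that joins the two main branches.
top_height = max(max(link) for link in layout["dcoord"])

# Cutting slightly below the top merge leaves the two main branches separate.
labels = fcluster(Z, t=0.99 * top_height, criterion="distance")
n_clusters = len(np.unique(labels))

print(top_height, n_clusters)
```

Cutting at or above the top merge height would instead return a single cluster, which is why the cut has to sit below the point where the branches join.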

🔧 Debug

expert

Identify the error in clustering code

What error will this code raise when run?

SciPy

from scipy.cluster.hierarchy import linkage, fcluster
import numpy as np

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

Z = linkage(X, method='single')

# Incorrect use of fcluster with invalid criterion
labels = fcluster(Z, t=3, criterion='invalid')
print(labels)

A. No error, prints cluster labels

B. IndexError: list index out of range

C. ValueError: Invalid cluster formation criterion: invalid

D. TypeError: linkage() missing required positional argument
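
One way to check the behaviour is to catch the exception and inspect its type. This is a minimal sketch; the exact message text may vary between SciPy versions, so only the exception class is recorded here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
Z = linkage(X, method="single")

error_type = None  # stays None if fcluster unexpectedly succeeds
try:
    # 'invalid' is not a criterion that fcluster recognises.
    fcluster(Z, t=3, criterion="invalid")
except ValueError as exc:
    error_type = type(exc).__name__
    print(error_type, exc)
```

The `linkage` call itself is valid, so the failure comes from `fcluster` rejecting the criterion string.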
