Challenge - 5 Problems


K-Means Mastery


🧠 Conceptual

intermediate


Understanding the K-Means Objective

What is the main goal of the K-Means clustering algorithm?

A. To minimize the sum of squared distances between data points and their assigned cluster centers

B. To find the cluster centers that maximize the total variance within clusters

C. To assign each data point to a unique cluster without overlap

D. To maximize the distance between all data points in the dataset
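
For reference, the sum of squared distances between points and their assigned centers can be computed by hand and compared with what scikit-learn reports. This is a minimal sketch, assuming scikit-learn's `KMeans` and its `inertia_` attribute; the toy array and the variable names `assigned_centers` and `wcss` are illustrative and not part of the challenge.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data (illustrative): two well-separated groups of three points each
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], dtype=float)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Look up the center assigned to each point, then sum the squared distances
assigned_centers = kmeans.cluster_centers_[kmeans.labels_]
wcss = float(np.sum((X - assigned_centers) ** 2))

print("manual sum of squared distances:", wcss)
print("scikit-learn inertia_:          ", kmeans.inertia_)
```

The two printed values should agree up to floating-point error, since `inertia_` is documented as the sum of squared distances of samples to their closest cluster center.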


❓ Predict Output

intermediate


Output of K-Means Cluster Assignments

What is the output of the following Python code snippet using scikit-learn's KMeans?


from sklearn.cluster import KMeans
import numpy as np

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
print(kmeans.labels_)

A. [1 1 1 0 0 0]

B. [0 1 0 1 0 1]

C. [1 0 1 0 1 0]

D. [0 0 0 1 1 1]


❓ Hyperparameter

advanced


Choosing the Number of Clusters (k)

Which method is commonly used to decide the best number of clusters (k) in K-Means clustering?

A. Elbow method by plotting within-cluster sum of squares versus k

B. Using the highest possible k to maximize clusters

C. Choosing k based on the number of features in the dataset

D. Randomly selecting k without evaluation
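
As an illustration of tabulating within-cluster sum of squares against k, the sketch below fits `KMeans` for several values of k and records `inertia_` for each. It assumes synthetic data from scikit-learn's `make_blobs` with three true centers; the data, the range of k, and the name `wcss_by_k` are assumptions made for this example only.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data (assumption): 300 points drawn around 3 centers
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# Fit one model per candidate k and record its within-cluster sum of squares
wcss_by_k = {}
for k in range(1, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss_by_k[k] = model.inertia_

for k, wcss in wcss_by_k.items():
    print(f"k={k}: WCSS={wcss:.1f}")
```

Plotting these values against k and looking for the point where the curve flattens is the usual way to read the result. The bend is a visual heuristic, so it can be ambiguous on data without clear cluster structure.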


❓ Metrics

advanced


Evaluating K-Means Clustering Quality

Which metric measures how well-separated the clusters are in K-Means clustering?

A. Mean squared error

B. Silhouette score

C. Accuracy score

D. Cross-entropy loss
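
For context on label-free cluster evaluation, the sketch below computes a silhouette score with scikit-learn's `silhouette_score`, which compares each point's mean distance to its own cluster with its mean distance to the nearest other cluster. The toy array and the variable name `score` are illustrative assumptions, not part of the challenge.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Toy data (illustrative): two well-separated groups of three points each
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], dtype=float)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Mean silhouette coefficient over all samples, always in [-1, 1]
score = silhouette_score(X, labels)
print(f"silhouette score: {score:.3f}")
```

Values near 1 suggest compact, well-separated clusters, values near 0 suggest overlapping clusters, and negative values suggest points that may sit in the wrong cluster.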


🔧 Debug

expert


Debugging K-Means Convergence Issue

You run K-Means on a dataset but notice the algorithm does not converge and runs indefinitely. Which is the most likely cause?

A. The random_state parameter is not set

B. The dataset contains only numerical features

C. The number of clusters k is set larger than the number of unique data points

D. The data is normalized before clustering
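
When investigating convergence in practice, one rough check is to compare the number of iterations a fit actually used with the iteration cap. The sketch below assumes scikit-learn's `KMeans`, which bounds each run with its `max_iter` parameter and exposes the iterations used as `n_iter_`; other implementations may behave differently. The toy data, the cap of 50, and the names `iterations_used` and `hit_cap` are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data (illustrative): two well-separated groups of three points each
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], dtype=float)

max_iter = 50  # illustrative cap on iterations per run
model = KMeans(
    n_clusters=2, n_init=10, max_iter=max_iter, tol=1e-4, random_state=0
).fit(X)

# n_iter_ reports how many iterations the best run used
iterations_used = model.n_iter_
hit_cap = iterations_used >= max_iter

print(f"iterations used: {iterations_used} of at most {max_iter}")
print("stopped at the iteration cap:", hit_cap)
```

A fit that regularly stops at the cap is a hint to inspect the data, the choice of k, or the tolerance before trusting the result.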
