Challenge - 5 Problems

K-means Mastery

❓ Predict Output

intermediate

Output of K-means clustering with scipy.cluster.vq

What is the output of the following code snippet that uses scipy's kmeans and vq functions?

import numpy as np
from scipy.cluster.vq import kmeans, vq

np.random.seed(0)
data = np.vstack((np.random.normal(0, 1, (5, 2)), np.random.normal(5, 1, (5, 2))))
centroids, distortion = kmeans(data, 2)
labels, _ = vq(data, centroids)
print(labels.tolist())

A. [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

B. [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

C. [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

D. [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

❓ Predict Output

intermediate

Output of KMeans from scikit-learn with fixed random state

What is the output of the following code that uses scikit-learn's KMeans to cluster the same data?

import numpy as np
from sklearn.cluster import KMeans

np.random.seed(0)
data = np.vstack((np.random.normal(0, 1, (5, 2)), np.random.normal(5, 1, (5, 2))))
kmeans = KMeans(n_clusters=2, random_state=0).fit(data)
print(kmeans.labels_.tolist())

A. [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

B. [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

C. [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

D. [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
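
Cluster indices returned by either library are arbitrary: two labelings that differ only by a swap of indices describe the same grouping. The sketch below is one way to compare labelings while ignoring the numbering. It uses only the standard library, and the helper names `canonical_labels` and `same_partition` are hypothetical, not part of SciPy or scikit-learn.

```python
def canonical_labels(labels):
    """Relabel clusters in order of first appearance (0, 1, 2, ...)."""
    mapping = {}
    result = []
    for label in labels:
        # Assign the next unused index the first time a label is seen.
        if label not in mapping:
            mapping[label] = len(mapping)
        result.append(mapping[label])
    return result


def same_partition(labels_a, labels_b):
    """Return True when two labelings group the points identically."""
    return canonical_labels(labels_a) == canonical_labels(labels_b)


a = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
c = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

print(same_partition(a, b))  # True: same grouping, indices swapped
print(same_partition(a, c))  # False: a different grouping
```

To apply this to real results, pass `labels.tolist()` from `vq` and `kmeans.labels_.tolist()` from scikit-learn.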

❓ Data Output

advanced

Number of unique clusters from scipy kmeans labels

After running SciPy's kmeans and vq with k = 3 on data drawn from three well-separated groups, how many unique cluster labels will the labels array contain?

import numpy as np
from scipy.cluster.vq import kmeans, vq

np.random.seed(1)
data = np.vstack((np.random.normal(0, 1, (4, 2)), np.random.normal(5, 1, (4, 2)), np.random.normal(10, 1, (4, 2))))
centroids, _ = kmeans(data, 3)
labels, _ = vq(data, centroids)
unique_labels = len(set(labels))
print(unique_labels)

❓ Visualization

advanced

Visual difference between scipy and scikit-learn K-means centroids

Which option correctly describes the difference in centroid positions when clustering the same dataset with scipy.cluster.vq.kmeans and sklearn.cluster.KMeans?

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.vq import kmeans
from sklearn.cluster import KMeans

np.random.seed(42)
data = np.vstack((np.random.normal(0, 1, (10, 2)), np.random.normal(5, 1, (10, 2))))

centroids_scipy, _ = kmeans(data, 2)
kmeans_sklearn = KMeans(n_clusters=2, random_state=42).fit(data)
centroids_sklearn = kmeans_sklearn.cluster_centers_

plt.scatter(data[:,0], data[:,1], c='gray', label='Data points')
plt.scatter(centroids_scipy[:,0], centroids_scipy[:,1], c='red', marker='x', s=100, label='Scipy centroids')
plt.scatter(centroids_sklearn[:,0], centroids_sklearn[:,1], c='blue', marker='o', s=100, label='Sklearn centroids')
plt.legend()
plt.title('Centroids from scipy vs sklearn K-means')
plt.show()

A. scikit-learn centroids are always the mean of all data points, while SciPy centroids are medians.

B. SciPy centroids and scikit-learn centroids overlap exactly because both use the same algorithm and initialization.

C. SciPy centroids are always closer to the origin than scikit-learn centroids.

D. SciPy centroids and scikit-learn centroids can differ slightly because scikit-learn initializes with k-means++ by default, while SciPy's kmeans starts from randomly chosen observations.
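
Because the row order of a centroid array is as arbitrary as the label numbering, a direct element-wise comparison of two centroid arrays can be misleading. The sketch below is one way to compare two centroid sets while ignoring row order. The helper `centroids_match`, its tolerance, and the sample values are all hypothetical, chosen only for illustration.

```python
import math


def centroids_match(first, second, tol=1e-6):
    """Compare two lists of centroids while ignoring row order."""
    if len(first) != len(second):
        return False
    # Sorting lines up corresponding centroids, assuming they are
    # well separated along the first coordinate.
    for p, q in zip(sorted(first), sorted(second)):
        if any(not math.isclose(x, y, abs_tol=tol) for x, y in zip(p, q)):
            return False
    return True


# Made-up centroid values, not output from either library.
from_scipy = [[5.1, 4.9], [0.2, -0.1]]
from_sklearn = [[0.2, -0.1], [5.1, 4.9]]

print(centroids_match(from_scipy, from_sklearn))  # True
```

With the arrays from the snippet above, the call would be `centroids_match(centroids_scipy.tolist(), centroids_sklearn.tolist())`, optionally with a looser `tol`.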

🧠 Conceptual

expert

Key difference in output between scipy.cluster.vq.kmeans and sklearn.cluster.KMeans

Which statement best describes a key difference in the outputs of scipy.cluster.vq.kmeans and sklearn.cluster.KMeans when applied to the same dataset?

A. SciPy's kmeans always produces deterministic results, while scikit-learn's KMeans results vary randomly on every run.

B. SciPy's kmeans returns a (centroids, distortion) tuple, while a fitted scikit-learn KMeans estimator exposes centroids, labels, and inertia as attributes.

C. scikit-learn's KMeans uses hierarchical clustering internally, while SciPy's kmeans uses flat clustering.

D. SciPy's kmeans returns labels directly, while scikit-learn's KMeans only returns centroids without labels.
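
The two quality measures mentioned in these options are not the same quantity. SciPy documents distortion as the mean (non-squared) Euclidean distance from each observation to its nearest centroid, while scikit-learn documents `inertia_` as the sum of squared distances to the closest center. The sketch below computes both by hand on made-up toy data, using only the standard library. It is an illustration of the two definitions, not either library's implementation, and `nearest_distances` is a hypothetical helper.

```python
import math


def nearest_distances(points, centroids):
    """Distance from each point to its closest centroid."""
    return [min(math.dist(p, c) for c in centroids) for p in points]


# Toy data: every point sits exactly 1.0 away from its nearest centroid.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]

d = nearest_distances(points, centroids)
mean_distance = sum(d) / len(d)      # analogue of SciPy's distortion
sum_squared = sum(x * x for x in d)  # analogue of scikit-learn's inertia_

print(mean_distance)  # 1.0
print(sum_squared)    # 4.0
```

The two numbers differ even on identical centroids, so a distortion value from SciPy should not be compared directly with an inertia value from scikit-learn.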
