Challenge - 5 Problems
K-means Mastery
❓ Predict Output
Difficulty: intermediate, time limit 2:00
Output of K-means clustering with scipy.cluster.vq
What is the output of the following code snippet that uses scipy's kmeans and vq functions?
SciPy
import numpy as np
from scipy.cluster.vq import kmeans, vq

np.random.seed(0)
data = np.vstack((np.random.normal(0, 1, (5, 2)), np.random.normal(5, 1, (5, 2))))
centroids, distortion = kmeans(data, 2)
labels, _ = vq(data, centroids)
print(labels.tolist())
💡 Hint
Remember that kmeans places centroids close to groups of nearby points, and vq then labels each point with the index of its nearest centroid.
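Explanation
The first five points are drawn around (0, 0) and the last five around (5, 5), so the two groups are well separated. kmeans returns one centroid near each center, and vq assigns every point the index of its nearest centroid. All points in a group therefore share a label, and the expected answer is [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]. The label ids themselves are arbitrary: they follow the row order of the returned centroids, which depends on the random initialization, so [1, 1, 1, 1, 1, 0, 0, 0, 0, 0] would describe the same clustering.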
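Because the label ids depend on centroid order, a check that should hold regardless of initialization can renumber the clusters first. The following is a minimal sketch, assuming the two groups separate cleanly; the names order, rank, and canonical are illustrative and not part of the challenge.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

np.random.seed(0)
data = np.vstack((np.random.normal(0, 1, (5, 2)),
                  np.random.normal(5, 1, (5, 2))))

centroids, distortion = kmeans(data, 2)
labels, _ = vq(data, centroids)

# Renumber clusters by the x-coordinate of their centroid, so the
# centroid with the smallest x always becomes cluster 0.
order = np.argsort(centroids[:, 0])
rank = np.empty_like(order)
rank[order] = np.arange(len(order))

canonical = rank[labels]  # labels expressed in the sorted numbering
print(canonical.tolist())
```

Sorting by one coordinate is only a convenient convention for this dataset; any fixed ordering of the centroids would serve the same purpose.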
❓ Predict Output
Difficulty: intermediate, time limit 2:00
Output of KMeans from scikit-learn with fixed random state
What is the output of the following code that uses scikit-learn's KMeans to cluster the same data?
SciPy
import numpy as np
from sklearn.cluster import KMeans

np.random.seed(0)
data = np.vstack((np.random.normal(0, 1, (5, 2)), np.random.normal(5, 1, (5, 2))))
kmeans = KMeans(n_clusters=2, random_state=0).fit(data)
print(kmeans.labels_.tolist())
💡 Hint
scikit-learn's KMeans also clusters points based on proximity and uses random_state for reproducibility.
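Explanation
KMeans finds the same two groups as SciPy's kmeans. The labels_ attribute holds one cluster index per point: the first five points share one label and the last five share the other, so the expected answer is [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]. Setting random_state=0 makes the run reproducible, but which group is numbered 0 is still an arbitrary outcome of the initialization.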
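One way to check a clustering without relying on the label numbering is a permutation-invariant score. The sketch below uses scikit-learn's adjusted_rand_score against the known group membership; the names truth and score are illustrative, and n_init is set explicitly on the assumption that its default may differ between scikit-learn releases.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

np.random.seed(0)
data = np.vstack((np.random.normal(0, 1, (5, 2)),
                  np.random.normal(5, 1, (5, 2))))

# Known group membership: first five points vs. last five points.
truth = [0] * 5 + [1] * 5

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)

# The adjusted Rand index is 1.0 when two partitions are identical,
# no matter how the cluster ids are numbered.
score = adjusted_rand_score(truth, model.labels_)
print(score)
```

A score of 1.0 means the partition matches the two generated groups, whether the output reads [0, ..., 1, ...] or [1, ..., 0, ...].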
❓ data_output
Difficulty: advanced, time limit 2:00
Number of unique clusters from scipy kmeans labels
After running scipy's kmeans and vq on a dataset with 3 clusters, how many unique cluster labels will the labels array contain?
SciPy
import numpy as np
from scipy.cluster.vq import kmeans, vq

np.random.seed(1)
data = np.vstack((np.random.normal(0, 1, (4, 2)), np.random.normal(5, 1, (4, 2)), np.random.normal(10, 1, (4, 2))))
centroids, _ = kmeans(data, 3)
labels, _ = vq(data, centroids)
unique_labels = len(set(labels))
print(unique_labels)
💡 Hint
The number of clusters requested is 3, so expect 3 unique labels.
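Explanation
kmeans is asked for 3 centroids and the three groups are well separated, so vq assigns the labels 0, 1, and 2 and the script prints 3. This is not guaranteed in general: SciPy's kmeans can return fewer centroids than requested when a cluster ends up with no members, in which case fewer unique labels appear.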
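Rather than assuming the label count equals the requested k, it can be read off the results. The sketch below inspects both the number of centroids actually returned and the labels in use; n_centroids, used_labels, and counts are illustrative names, and the snippet assumes only the documented return values of kmeans and vq.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

np.random.seed(1)
data = np.vstack((np.random.normal(0, 1, (4, 2)),
                  np.random.normal(5, 1, (4, 2)),
                  np.random.normal(10, 1, (4, 2))))

centroids, _ = kmeans(data, 3)
labels, _ = vq(data, centroids)

# Number of centroids actually returned (may be below the requested 3).
n_centroids = centroids.shape[0]

# Labels that occur at least once, and how many points carry each one.
used_labels, counts = np.unique(labels, return_counts=True)
print(n_centroids, used_labels.tolist(), counts.tolist())
```

np.unique with return_counts=True also shows how many points landed in each cluster, which a bare len(set(labels)) hides.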
❓ visualization
Difficulty: advanced, time limit 3:00
Visual difference between scipy and scikit-learn K-means centroids
Which option correctly describes the difference in centroid positions when clustering the same dataset with scipy.cluster.vq.kmeans and sklearn.cluster.KMeans?
SciPy
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.vq import kmeans
from sklearn.cluster import KMeans

np.random.seed(42)
data = np.vstack((np.random.normal(0, 1, (10, 2)), np.random.normal(5, 1, (10, 2))))
centroids_scipy, _ = kmeans(data, 2)
kmeans_sklearn = KMeans(n_clusters=2, random_state=42).fit(data)
centroids_sklearn = kmeans_sklearn.cluster_centers_

plt.scatter(data[:,0], data[:,1], c='gray', label='Data points')
plt.scatter(centroids_scipy[:,0], centroids_scipy[:,1], c='red', marker='x', s=100, label='Scipy centroids')
plt.scatter(centroids_sklearn[:,0], centroids_sklearn[:,1], c='blue', marker='o', s=100, label='Sklearn centroids')
plt.legend()
plt.title('Centroids from scipy vs sklearn K-means')
plt.show()
💡 Hint
Consider differences in initialization and algorithm details between scipy and sklearn implementations.
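Explanation
The two implementations differ in initialization and restart strategy. SciPy's kmeans picks its initial centroids at random from the observations and, by default, repeats the whole algorithm 20 times (the iter parameter), returning the codebook with the lowest distortion. scikit-learn's KMeans uses k-means++ initialization by default and runs n_init initializations (the default value depends on the scikit-learn version), keeping the run with the lowest inertia. On well-separated data such as this, both usually converge to essentially the same centroids, possibly listed in a different order; visible differences are more likely on overlapping or noisy data.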
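How far apart the two sets of centroids are can also be measured instead of judged from the plot. The sketch below sorts each centroid array by its first coordinate so that row order does not matter, then reports the largest coordinate-wise difference. The helper sort_rows and the variable gap are hypothetical names, and n_init=10 is an explicit choice rather than the challenge's setting.

```python
import numpy as np
from scipy.cluster.vq import kmeans
from sklearn.cluster import KMeans

np.random.seed(42)
data = np.vstack((np.random.normal(0, 1, (10, 2)),
                  np.random.normal(5, 1, (10, 2))))

centroids_scipy, _ = kmeans(data, 2)
centroids_sklearn = KMeans(n_clusters=2, n_init=10,
                           random_state=42).fit(data).cluster_centers_


def sort_rows(c):
    """Order centroids by their x-coordinate so row order is comparable."""
    return c[np.argsort(c[:, 0])]


# Largest absolute coordinate difference between matched centroids.
gap = np.abs(sort_rows(centroids_scipy) - sort_rows(centroids_sklearn)).max()
print(gap)
```

A gap near zero indicates that both libraries settled on the same partition of this dataset; a larger value would point to different local optima.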
🧠 Conceptual
Difficulty: expert, time limit 3:00
Key difference in output between scipy.cluster.vq.kmeans and sklearn.cluster.KMeans
Which statement best describes a key difference in the outputs of scipy.cluster.vq.kmeans and sklearn.cluster.KMeans when applied to the same dataset?
💡 Hint
Think about what each function returns and how the algorithms run.
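Explanation
SciPy's kmeans returns a tuple (codebook, distortion): the centroids and the mean Euclidean distance from each observation to its nearest centroid. It does not assign labels; those come from a separate call to vq. scikit-learn's KMeans.fit returns a fitted estimator that exposes cluster_centers_, labels_, and inertia_, where inertia_ is the sum of squared distances from each sample to its nearest centroid.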
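SciPy's distortion and scikit-learn's inertia_ are not the same quantity, so they should not be compared directly. The sketch below, reusing the dataset from the visualization challenge as an assumption, recomputes both from the per-point distances that vq returns; mean_dist and sum_sq are illustrative names.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq
from sklearn.cluster import KMeans

np.random.seed(42)
data = np.vstack((np.random.normal(0, 1, (10, 2)),
                  np.random.normal(5, 1, (10, 2))))

# SciPy: centroids plus a distortion value; labels need a separate vq call.
centroids, distortion = kmeans(data, 2)
labels, dists = vq(data, centroids)  # dists: distance to nearest centroid

# scikit-learn: one fitted estimator carrying centroids, labels, inertia.
model = KMeans(n_clusters=2, n_init=10, random_state=42).fit(data)

mean_dist = dists.mean()       # comparable to SciPy's distortion
sum_sq = (dists ** 2).sum()    # comparable to scikit-learn's inertia_

print(distortion, mean_dist)
print(model.inertia_, sum_sq)
```

The second comparison only lines up when both libraries converge to the same centroids, which is the expected case for two well-separated groups.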