K-means via scipy vs scikit-learn - Practice Questions

Predict Output
intermediate
Output of K-means clustering with scipy.cluster.vq
What is the output of the following code snippet that uses scipy's kmeans and vq functions?
SciPy
import numpy as np
from scipy.cluster.vq import kmeans, vq

np.random.seed(0)
data = np.vstack((np.random.normal(0, 1, (5, 2)), np.random.normal(5, 1, (5, 2))))
centroids, distortion = kmeans(data, 2)
labels, _ = vq(data, centroids)
print(labels.tolist())
A. [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
B. [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
C. [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
D. [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
💡 Hint
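Remember that kmeans places each centroid at the mean of the points nearest to it, and vq labels each point with the index of its nearest centroid. Which cluster receives index 0 depends on the order of the returned centroids.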
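To illustrate why the label numbering is arbitrary, here is a minimal sketch. It assumes a small hand-made dataset (not the quiz data) and passes explicit initial centroids to kmeans instead of a cluster count, so no random initialization is involved. The names `guess_low_first` and `guess_high_first` are hypothetical.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

# Hand-made, well-separated points (an assumption for illustration only).
data = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                 [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])

# Passing an array as the second argument uses it as the initial centroids,
# which removes the random initialization.
guess_low_first = np.array([[0.0, 0.0], [5.0, 5.0]])
guess_high_first = guess_low_first[::-1].copy()  # same guesses, swapped order

centroids_a, _ = kmeans(data, guess_low_first)
centroids_b, _ = kmeans(data, guess_high_first)

# vq returns, for each point, the row index of its nearest centroid.
labels_a, _ = vq(data, centroids_a)
labels_b, _ = vq(data, centroids_b)

print(labels_a.tolist())
print(labels_b.tolist())
```

Both runs find the same two groups; only the numbering of the groups is swapped, because it follows the row order of the centroids.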
Predict Output
intermediate
Output of KMeans from scikit-learn with fixed random state
What is the output of the following code that uses scikit-learn's KMeans to cluster the same data?
SciPy
import numpy as np
from sklearn.cluster import KMeans

np.random.seed(0)
data = np.vstack((np.random.normal(0, 1, (5, 2)), np.random.normal(5, 1, (5, 2))))
kmeans = KMeans(n_clusters=2, random_state=0).fit(data)
print(kmeans.labels_.tolist())
A. [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
B. [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
C. [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
D. [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
💡 Hint
scikit-learn's KMeans also groups points by proximity to the centroids. Setting random_state makes the run reproducible, but the numbering of the clusters is still arbitrary.
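As a rough check of what random_state does and does not guarantee, the sketch below fits the model twice with the same seed. It assumes a local `RandomState` for the data rather than the global seed used in the question, and sets `n_init=10` explicitly because the default differs between scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Local generator so the example does not touch the global NumPy seed.
rng = np.random.RandomState(0)
data = np.vstack((rng.normal(0, 1, (5, 2)), rng.normal(5, 1, (5, 2))))

# Two independent fits with the same random_state.
labels_first = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data).labels_
labels_second = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data).labels_

# Same seed, same data: the label arrays match element by element.
same_run = np.array_equal(labels_first, labels_second)

# The partition itself: the first five points share one label and the
# last five share the other, whichever number each group received.
groups_separated = (
    len(set(labels_first[:5].tolist())) == 1
    and len(set(labels_first[5:].tolist())) == 1
    and labels_first[0] != labels_first[5]
)

print(same_run, groups_separated)
```

Checking the partition rather than the literal label values avoids depending on which group happens to be numbered 0.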
data_output
advanced
Number of unique clusters from scipy kmeans labels
After running scipy's kmeans and vq on a dataset with 3 clusters, how many unique cluster labels will the labels array contain?
SciPy
import numpy as np
from scipy.cluster.vq import kmeans, vq

np.random.seed(1)
data = np.vstack((np.random.normal(0, 1, (4, 2)), np.random.normal(5, 1, (4, 2)), np.random.normal(10, 1, (4, 2))))
centroids, _ = kmeans(data, 3)
labels, _ = vq(data, centroids)
unique_labels = len(set(labels))
print(unique_labels)
A. 4
B. 2
C. 3
D. 1
💡 Hint
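Three clusters are requested and the three groups are well separated, so expect 3 unique labels. Note that scipy's kmeans can return fewer centroids than requested, because centroids that end up with no assigned points are dropped.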
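The number of labels is not guaranteed to equal the number of clusters requested. The sketch below shows one way this can happen; it assumes a hand-made dataset and a deliberately poor initial guess, with one starting centroid placed far from every point.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

# Two tight groups of points (hand-made, for illustration only).
data = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                 [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])

# Three initial centroids; the third is far away and attracts no points.
guess = np.array([[0.0, 0.0], [5.0, 5.0], [100.0, 100.0]])

centroids, _ = kmeans(data, guess)
labels, _ = vq(data, centroids)

# Centroids with no assigned observations are removed during iteration,
# so the returned code book can have fewer rows than the guess.
n_centroids = centroids.shape[0]
n_unique_labels = len(np.unique(labels))

print(n_centroids, n_unique_labels)
```

When the count matters, it is safer to read it from `centroids.shape[0]` than to assume it equals the value passed in.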
visualization
advanced
Visual difference between scipy and scikit-learn K-means centroids
Which option correctly describes the difference in centroid positions when clustering the same dataset with scipy.cluster.vq.kmeans and sklearn.cluster.KMeans?
SciPy
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.vq import kmeans
from sklearn.cluster import KMeans

np.random.seed(42)
data = np.vstack((np.random.normal(0, 1, (10, 2)), np.random.normal(5, 1, (10, 2))))

centroids_scipy, _ = kmeans(data, 2)
kmeans_sklearn = KMeans(n_clusters=2, random_state=42).fit(data)
centroids_sklearn = kmeans_sklearn.cluster_centers_

plt.scatter(data[:,0], data[:,1], c='gray', label='Data points')
plt.scatter(centroids_scipy[:,0], centroids_scipy[:,1], c='red', marker='x', s=100, label='Scipy centroids')
plt.scatter(centroids_sklearn[:,0], centroids_sklearn[:,1], c='blue', marker='o', s=100, label='Sklearn centroids')
plt.legend()
plt.title('Centroids from scipy vs sklearn K-means')
plt.show()
A. Sklearn centroids are always the mean of all data points, while scipy centroids are medians.
B. Scipy centroids and sklearn centroids overlap exactly because both use the same algorithm and initialization.
C. Scipy centroids are always closer to the origin than sklearn centroids.
D. Scipy centroids and sklearn centroids can differ slightly because the two implementations use different initialization (sklearn defaults to k-means++) and different stopping criteria.
💡 Hint
Consider differences in initialization and algorithm details between scipy and sklearn implementations.
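When comparing centroids from the two libraries numerically rather than by eye, row order has to be ignored first. The following is a sketch under stated assumptions: hand-made data, explicit initial centroids for both libraries so that nothing is random, and a hypothetical helper `sort_rows`.

```python
import numpy as np
from scipy.cluster.vq import kmeans
from sklearn.cluster import KMeans

# Hand-made, well-separated points (an assumption for illustration only).
data = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                 [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])

# Same starting centroids for both libraries, but in opposite row order.
init_scipy = np.array([[0.0, 0.0], [5.0, 5.0]])
init_sklearn = init_scipy[::-1].copy()

centroids_scipy, _ = kmeans(data, init_scipy)
centroids_sklearn = KMeans(
    n_clusters=2, init=init_sklearn, n_init=1
).fit(data).cluster_centers_


def sort_rows(centroids):
    """Order centroid rows by x, then y, so that row order does not matter."""
    order = np.lexsort((centroids[:, 1], centroids[:, 0]))
    return centroids[order]


# Compare positions up to floating-point tolerance, ignoring row order.
same_positions = np.allclose(sort_rows(centroids_scipy),
                             sort_rows(centroids_sklearn))
print(same_positions)
```

On data this clean both libraries settle on the same positions. On noisier data, or with the default random initialization, small differences are more likely.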
🧠 Conceptual
expert
Key difference in output between scipy.cluster.vq.kmeans and sklearn.cluster.KMeans
Which statement best describes a key difference in the outputs of scipy.cluster.vq.kmeans and sklearn.cluster.KMeans when applied to the same dataset?
A. Scipy's kmeans always produces deterministic results, while sklearn's KMeans results vary randomly every run.
B. Scipy's kmeans returns a (centroids, distortion) tuple, while sklearn's KMeans is a fitted estimator that exposes centroids, labels, and inertia as attributes.
C. Sklearn's KMeans uses hierarchical clustering internally, while scipy's kmeans uses flat clustering.
D. Scipy's kmeans returns labels directly, while sklearn's KMeans only returns centroids without labels.
💡 Hint
Think about what each function returns and how the algorithms run.
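The two quality measures are also not the same quantity: scipy's distortion is a mean of Euclidean distances, while sklearn's inertia is a sum of squared distances. A minimal sketch, assuming hand-made data and explicit initial centroids (the variable names are illustrative), recomputes both by hand:

```python
import numpy as np
from scipy.cluster.vq import kmeans
from sklearn.cluster import KMeans

# Hand-made, well-separated points (an assumption for illustration only).
data = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                 [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
init = np.array([[0.0, 0.0], [5.0, 5.0]])

# scipy: a plain tuple of (code book, distortion); labels need a vq call.
centroids, distortion = kmeans(data, init)

# sklearn: a fitted estimator with results stored as attributes.
model = KMeans(n_clusters=2, init=init, n_init=1).fit(data)

# Distance from every point to its nearest scipy centroid.
dists = np.linalg.norm(
    data[:, None, :] - centroids[None, :, :], axis=2
).min(axis=1)

mean_distance = dists.mean()        # what scipy reports as distortion
sum_squared = (dists ** 2).sum()    # what sklearn reports as inertia_

print(distortion, mean_distance)
print(model.inertia_, sum_squared)
print(model.labels_.tolist())
```

Because the two numbers are on different scales, they should not be compared directly when judging which library produced the tighter clustering.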