ML Python · ~20 mins

Mean shift clustering in ML Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Understanding the core idea of Mean Shift Clustering

What is the main principle behind the Mean Shift clustering algorithm?

A. It partitions data by minimizing the sum of squared distances to cluster centers initialized randomly.
B. It builds a hierarchical tree of clusters by merging closest pairs until one cluster remains.
C. It assigns points to clusters based on a fixed radius without updating cluster centers.
D. It iteratively shifts each data point towards the average of points in its neighborhood to find dense regions.
💡 Hint

Think about how the algorithm finds clusters by moving points towards areas with many neighbors.
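A minimal NumPy sketch of the idea, assuming a flat (uniform) kernel, which is also the kernel scikit-learn's MeanShift uses. With bandwidth h, each point x is repeatedly replaced by the mean of the data points within distance h of it until it stops moving; trajectories that end at the same mode form one cluster. The function name `mean_shift`, the greedy merge step, and the sample data are illustrative choices, not scikit-learn's implementation.

```python
import numpy as np

def mean_shift(X, bandwidth, max_iter=300, tol=1e-6):
    """Illustrative flat-kernel mean shift (not scikit-learn's code)."""
    X = np.asarray(X, dtype=float)
    modes = X.copy()  # every data point starts as its own seed
    for _ in range(max_iter):
        # Distances from each current seed position (rows) to every data point (cols)
        dists = np.linalg.norm(modes[:, None, :] - X[None, :, :], axis=2)
        # Flat kernel: weight 1 inside the bandwidth window, 0 outside
        within = (dists <= bandwidth).astype(float)
        # Shift each seed to the mean of the points inside its window
        new_modes = (within @ X) / within.sum(axis=1, keepdims=True)
        shift = np.linalg.norm(new_modes - modes, axis=1).max()
        modes = new_modes
        if shift < tol:
            break
    # Greedy merge: keep a mode only if it is farther than bandwidth from every kept center
    centers = []
    for m in modes:
        if all(np.linalg.norm(m - c) > bandwidth for c in centers):
            centers.append(m)
    centers = np.array(centers)
    # Label each original point by its nearest center
    labels = np.argmin(
        np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2), axis=1
    )
    return centers, labels

X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
centers, labels = mean_shift(X, bandwidth=2.0)
print(centers)
print(labels)  # → [0 0 0 1 1 1]
```

Note that the number of clusters is never specified: it falls out of how many distinct dense regions the seeds climb to.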

Predict Output
intermediate
Output of Mean Shift cluster centers

Given the following 2D points and bandwidth, what are the approximate cluster centers found by Mean Shift? (Ignore the order of the rows; scikit-learn may list the same centers in a different order.)

ML Python
import numpy as np
from sklearn.cluster import MeanShift

points = np.array([[1, 2], [1, 3], [2, 2], [8, 8], [8, 9], [9, 8]])
ms = MeanShift(bandwidth=2)
ms.fit(points)
centers = ms.cluster_centers_
print(centers)
A
[[2.0 3.0]
 [9.0 9.0]]
B
[[1.0 2.0]
 [8.0 8.0]
 [9.0 9.0]]
C
[[1.33333333 2.33333333]
 [8.33333333 8.33333333]]
D
[[1.5 2.5]
 [8.5 8.5]
 [9.5 9.5]]
💡 Hint

Mean Shift averages points close to each other within the bandwidth to find centers.
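To check your answer by hand, here is a minimal NumPy trace of the same data, assuming a flat kernel (every point within `bandwidth` counts equally). The helper names `shift_step` and `shift_until_converged` are hypothetical, written only for this illustration.

```python
import numpy as np

points = np.array([[1, 2], [1, 3], [2, 2], [8, 8], [8, 9], [9, 8]], dtype=float)
bandwidth = 2.0

def shift_step(x, X, bandwidth):
    # Keep every point within `bandwidth` of x, then return their mean
    mask = np.linalg.norm(X - x, axis=1) <= bandwidth
    return X[mask].mean(axis=0)

def shift_until_converged(x, X, bandwidth, max_iter=100):
    # Repeat the shift until the position stops changing
    for _ in range(max_iter):
        new_x = shift_step(x, X, bandwidth)
        if np.allclose(new_x, x):
            break
        x = new_x
    return x

# Start one seed at every point, then collapse seeds that reached the same mode
modes = np.array([shift_until_converged(p, points, bandwidth) for p in points])
centers_by_hand = np.unique(modes.round(6), axis=0)
print(centers_by_hand)
```

Each group of three points lies entirely inside one bandwidth window, so every seed in a group lands on that group's mean after a single shift.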

Hyperparameter
advanced
Effect of bandwidth on Mean Shift clustering

What happens if you increase the bandwidth parameter in Mean Shift clustering?

A. Clusters become fewer and larger because the neighborhood radius grows, merging more points.
B. Clusters become more numerous and smaller because points are grouped more tightly.
C. The algorithm runs faster but produces the same number of clusters.
D. The algorithm ignores points outside the bandwidth and creates empty clusters.
💡 Hint

Think about how a bigger neighborhood affects the way points are grouped.
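A small experiment sketch using scikit-learn's `MeanShift`: refit the points from the previous problem with increasing bandwidths and count the clusters. The three bandwidth values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.cluster import MeanShift

points = np.array([[1, 2], [1, 3], [2, 2], [8, 8], [8, 9], [9, 8]])

# Refit with progressively larger neighborhoods and record how many clusters remain
n_clusters = {}
for bw in (0.5, 2.0, 20.0):
    ms = MeanShift(bandwidth=bw).fit(points)
    n_clusters[bw] = len(ms.cluster_centers_)
print(n_clusters)
```

At 0.5 no point has a neighbor inside its window, at 2.0 each group of three shares a window, and at 20.0 a single window covers all six points.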

Metrics
advanced
Choosing a metric to evaluate Mean Shift clustering

Which metric is most appropriate to evaluate the quality of clusters produced by Mean Shift when true labels are unknown?

A. Silhouette Score, because it measures how similar points are within clusters compared to other clusters.
B. Accuracy, because it compares predicted labels to true labels directly.
C. Mean Squared Error, because it measures distance from points to cluster centers.
D. Cross-Entropy Loss, because it measures classification error probabilities.
💡 Hint

Think about metrics that work without knowing the true groupings.
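A minimal sketch of label-free evaluation with scikit-learn's `silhouette_score`, reusing the points from the bandwidth problem (the data and bandwidth are illustrative). The score is only defined when the number of distinct labels is between 2 and n_samples - 1, which matters for Mean Shift because a badly chosen bandwidth can yield a single cluster or one cluster per point.

```python
import numpy as np
from sklearn.cluster import MeanShift
from sklearn.metrics import silhouette_score

points = np.array([[1, 2], [1, 3], [2, 2], [8, 8], [8, 9], [9, 8]])
labels = MeanShift(bandwidth=2).fit_predict(points)

# Silhouette requires 2 <= number of labels <= n_samples - 1
n_labels = len(np.unique(labels))
if 2 <= n_labels <= len(points) - 1:
    score = silhouette_score(points, labels)
    print(f"{n_labels} clusters, silhouette = {score:.3f}")
else:
    print(f"{n_labels} cluster(s): silhouette score is undefined")
```

Scores close to 1 mean tight, well-separated clusters; no ground-truth labels are needed.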

🔧 Debug
expert
Debugging Mean Shift convergence issue

Consider this code snippet using Mean Shift. It runs, but the cluster centers do not change after the first iteration, causing poor clustering. What is the likely cause?

ML Python
import numpy as np
from sklearn.cluster import MeanShift

points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
ms = MeanShift(bandwidth=0.5)
ms.fit(points)
print(ms.cluster_centers_)
A. The points array is not normalized, causing the algorithm to fail silently.
B. The bandwidth is too small, so points do not have neighbors to shift towards, causing no movement.
C. Mean Shift requires labels to be provided, which are missing here.
D. The fit method was called on the wrong variable 'ms' instead of 'ms'.
💡 Hint

Check if the bandwidth covers enough points to allow shifting.
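A minimal diagnostic sketch, assuming scikit-learn's `NearestNeighbors` to count how many other points fall inside each point's bandwidth window. The comparison bandwidth of 2.0 is an arbitrary value chosen for illustration.

```python
import numpy as np
from sklearn.cluster import MeanShift
from sklearn.neighbors import NearestNeighbors

points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])

results = {}
for bw in (0.5, 2.0):
    nn = NearestNeighbors(radius=bw).fit(points)
    # Querying with the same points includes each point itself, so subtract 1
    neighborhoods = nn.radius_neighbors(points, return_distance=False)
    neighbor_counts = [len(idx) - 1 for idx in neighborhoods]
    n_centers = len(MeanShift(bandwidth=bw).fit(points).cluster_centers_)
    results[bw] = (neighbor_counts, n_centers)
    print(f"bandwidth={bw}: neighbors per point={neighbor_counts}, clusters={n_centers}")
```

If every count is zero, each window contains only its own point, the mean equals the starting position, and every point becomes its own cluster center.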