Mean shift clustering in ML Python - ML Experiment: Train & Evaluate

Experiment - Mean shift clustering
Problem: You want to group data points into clusters without knowing the number of clusters beforehand, so you apply mean shift clustering to a 2D dataset.
Current Metrics: The model groups the data but produces too many small clusters, which makes the result hard to interpret. The cluster count is 15, while around 3 to 5 is expected.
Issue: The bandwidth parameter is too small, which over-segments the data into many tiny clusters.
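To see the over-segmentation for yourself, here is a minimal sketch. The exercise does not state the original bandwidth, so the value 0.5 below is an assumption chosen only for illustration, and the exact cluster count you get may differ from the 15 reported above.

```python
import numpy as np
from sklearn.cluster import MeanShift

# Same synthetic data as the exercise: three Gaussian blobs of 50 points each.
np.random.seed(42)
true_centers = [[1, 1], [5, 5], [9, 1]]
data = np.vstack([np.random.randn(50, 2) + c for c in true_centers])

# Assumed value: the exercise does not say which bandwidth was used originally.
small_bandwidth = 0.5

ms = MeanShift(bandwidth=small_bandwidth).fit(data)
n_clusters = len(ms.cluster_centers_)

# Count how many points landed in each cluster to expose the tiny ones.
sizes = np.bincount(ms.labels_, minlength=n_clusters)

print(f"bandwidth={small_bandwidth}: {n_clusters} clusters")
print("points per cluster:", sizes)
```

Looking at the points per cluster is a quick way to spot over-segmentation: many clusters holding only one or two points usually means the window is too narrow for the data.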
Your Task
Adjust the mean shift clustering bandwidth to reduce the number of clusters to between 3 and 5, improving cluster quality and interpretability.
You can only change the bandwidth parameter.
Do not change the dataset or the clustering algorithm.
Solution
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import MeanShift, estimate_bandwidth

# Generate sample data: three Gaussian blobs of 50 points each
np.random.seed(42)
true_centers = [[1, 1], [5, 5], [9, 1]]
data = []
for center in true_centers:
    data.append(np.random.randn(50, 2) + center)
data = np.vstack(data)

# Estimate a bandwidth from the data (a larger quantile gives a larger bandwidth)
bandwidth = estimate_bandwidth(data, quantile=0.3)

# Apply Mean Shift with adjusted bandwidth
ms = MeanShift(bandwidth=bandwidth)
ms.fit(data)
labels = ms.labels_
cluster_centers = ms.cluster_centers_

# Plot results
plt.figure(figsize=(8, 6))
colors = ['r', 'g', 'b', 'y', 'c', 'm']
for k in range(len(cluster_centers)):
    cluster_data = data[labels == k]
    plt.scatter(cluster_data[:, 0], cluster_data[:, 1], c=colors[k % len(colors)], label=f'Cluster {k}')
plt.scatter(cluster_centers[:, 0], cluster_centers[:, 1], c='k', marker='x', s=100, label='Centers')
plt.title(f'Mean Shift Clustering with bandwidth={bandwidth:.2f}')
plt.legend()
plt.show()

# Print number of clusters
print(f'Number of clusters: {len(cluster_centers)}')
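Used sklearn's estimate_bandwidth with quantile=0.3 to derive a bandwidth from the data.
Passed that estimate to MeanShift in place of the original, too-small bandwidth. (When bandwidth is left unset, scikit-learn falls back to the same estimate with its default quantile of 0.3.)
The wider window merges the small fragments into fewer, more meaningful clusters.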
Results Interpretation

Before: 15 clusters, many small groups, hard to interpret.

After: 3 clusters, matching expected groups, clearer cluster centers.

The bandwidth sets the radius of the window that mean shift uses to find density peaks, so it controls how coarse the clustering is. A larger bandwidth pulls nearby points toward the same peak and yields fewer clusters, which reduces over-segmentation.
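The relationship between bandwidth and cluster count can be checked with a small sweep. This is a sketch on the exercise's data; the helper name count_clusters and the list of bandwidth values are assumptions made for illustration, not part of the original solution.

```python
import numpy as np
from sklearn.cluster import MeanShift

# Same synthetic data as the exercise: three Gaussian blobs of 50 points each.
np.random.seed(42)
true_centers = [[1, 1], [5, 5], [9, 1]]
data = np.vstack([np.random.randn(50, 2) + c for c in true_centers])


def count_clusters(points, bandwidth):
    """Fit MeanShift with the given bandwidth and return the number of clusters."""
    return len(MeanShift(bandwidth=bandwidth).fit(points).cluster_centers_)


# Illustrative values; the last one is wide enough to cover the whole dataset.
bandwidths = [0.5, 1.0, 2.0, 4.0, 30.0]
counts = [count_clusters(data, bw) for bw in bandwidths]

for bw, n in zip(bandwidths, counts):
    print(f"bandwidth={bw:>5.1f} -> {n} clusters")
```

At the extreme end, a window that contains every point moves all seeds to the overall mean, so the data collapses into a single cluster. The useful range for this exercise lies between the two extremes.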
Bonus Experiment
Try using a smaller bandwidth than the estimated one and observe how the number of clusters changes.
💡 Hint
Decrease the quantile parameter in estimate_bandwidth to get a smaller bandwidth and see if clusters split more.
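One way to run this bonus experiment is sketched below. The quantile values are arbitrary choices for illustration; the cluster counts are printed rather than stated here because they depend on the data and the scikit-learn version.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

# Same synthetic data as the exercise: three Gaussian blobs of 50 points each.
np.random.seed(42)
true_centers = [[1, 1], [5, 5], [9, 1]]
data = np.vstack([np.random.randn(50, 2) + c for c in true_centers])

# Illustrative quantiles, from smaller than the solution's 0.3 to larger.
quantiles = [0.05, 0.1, 0.3, 0.5]

results = []
for q in quantiles:
    # A smaller quantile considers fewer neighbours, giving a smaller bandwidth.
    bw = estimate_bandwidth(data, quantile=q)
    n = len(MeanShift(bandwidth=bw).fit(data).cluster_centers_)
    results.append((q, bw, n))
    print(f"quantile={q:.2f} -> bandwidth={bw:.2f}, clusters={n}")
```

Compare the rows for the smaller quantiles with the one for quantile=0.3 to see whether the clusters start to split again.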