
Mean shift clustering in ML Python

Introduction

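Mean shift clustering finds groups in data without requiring you to specify the number of groups in advance. It repeatedly shifts each candidate center to the mean of the points within a fixed radius, so the candidates drift toward the densest regions. The locations where they settle become the cluster centers.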

When you want to find natural groups in data without deciding the number of groups first.
When you have data points spread in space and want to find dense areas.
When you want to detect clusters of different shapes and sizes.
When you want a simple way to find cluster centers based on data density.
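To make the "shift toward dense areas" idea concrete, here is a minimal NumPy sketch of the procedure with a flat (uniform) kernel. It is an illustration under simplifying assumptions, not scikit-learn's implementation: the function name mean_shift_sketch, the max_iter and tol parameters, and the rule that merges converged positions lying within one bandwidth of each other are all choices made for this example.

```python
import numpy as np

def mean_shift_sketch(points, bandwidth, max_iter=100, tol=1e-4):
    """Shift a copy of every point toward the mean of its neighbors (flat kernel)."""
    points = np.asarray(points, dtype=float)
    shifted = points.copy()
    for _ in range(max_iter):
        previous = shifted.copy()
        for i, position in enumerate(previous):
            # Neighbors are the ORIGINAL points within `bandwidth` of the current position
            distances = np.linalg.norm(points - position, axis=1)
            neighbors = points[distances <= bandwidth]
            shifted[i] = neighbors.mean(axis=0)
        # Stop once no position moved by more than `tol`
        if np.linalg.norm(shifted - previous, axis=1).max() < tol:
            break

    # Merge converged positions that lie within one bandwidth of an existing center
    centers = []
    labels = np.empty(len(points), dtype=int)
    for i, position in enumerate(shifted):
        for k, center in enumerate(centers):
            if np.linalg.norm(position - center) < bandwidth:
                labels[i] = k
                break
        else:
            centers.append(position)
            labels[i] = len(centers) - 1
    return np.array(centers), labels

data = np.array([
    [1, 2], [2, 1], [1, 1],
    [5, 5], [6, 5], [5, 6],
    [9, 9], [8, 9], [9, 8],
])
centers, labels = mean_shift_sketch(data, bandwidth=2)
print("Centers:", centers)
print("Labels:", labels)
```

Each position stops moving once the mean of its neighborhood equals the position itself, which is what "settling on a dense area" means here.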
Syntax
ML Python
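from sklearn.cluster import MeanShift

# some_value is a placeholder for a positive number; data is an array of shape (n_samples, n_features)
model = MeanShift(bandwidth=some_value)
model.fit(data)
labels = model.labels_                    # cluster index of each sample
cluster_centers = model.cluster_centers_  # one row per cluster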

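bandwidth is the radius of the neighborhood searched around each candidate center. A smaller value tends to produce more clusters, a larger value fewer. If you leave it out, scikit-learn estimates a value with sklearn.cluster.estimate_bandwidth.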
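The effect of bandwidth on the number of clusters is easy to check on a toy dataset. The sketch below fits the same nine points with three bandwidths; the data and the specific values 0.5, 2, and 20 are assumptions chosen only to make the contrast visible.

```python
import numpy as np
from sklearn.cluster import MeanShift

# Toy data (illustrative): three tight groups of three points each
data = np.array([
    [1, 2], [2, 1], [1, 1],
    [5, 5], [6, 5], [5, 6],
    [9, 9], [8, 9], [9, 8],
])

cluster_counts = {}
for bandwidth in (0.5, 2, 20):
    model = MeanShift(bandwidth=bandwidth).fit(data)
    cluster_counts[bandwidth] = len(model.cluster_centers_)
    print(f"bandwidth={bandwidth}: {cluster_counts[bandwidth]} clusters")
```

With a radius smaller than the spacing between points, every point should end up as its own cluster. With a radius larger than the whole dataset, everything should collapse into one.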

After fitting, labels_ gives the cluster number for each point, and cluster_centers_ gives the center points of clusters.
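As a rough illustration of how the two attributes relate, the sketch below groups the input points by label and looks up the matching row of cluster_centers_. It also calls predict, which assigns new points to the closest learned center. The data and the two new points are made up for this example.

```python
import numpy as np
from sklearn.cluster import MeanShift

# Toy data (illustrative)
data = np.array([
    [1, 2], [2, 1], [1, 1],
    [5, 5], [6, 5], [5, 6],
    [9, 9], [8, 9], [9, 8],
])
model = MeanShift(bandwidth=2).fit(data)

# One label per input row, one center row per cluster
print(model.labels_.shape)
print(model.cluster_centers_.shape)

# A label is an index into cluster_centers_
for label in np.unique(model.labels_):
    members = data[model.labels_ == label]
    print(label, model.cluster_centers_[label], members.tolist())

# Assign unseen points to the closest learned center
new_points = np.array([[0.0, 0.0], [10.0, 10.0]])
new_labels = model.predict(new_points)
print(new_labels)
```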

Examples
Creates a MeanShift model with bandwidth 2 and fits it to data.
ML Python
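from sklearn.cluster import MeanShift
import numpy as np

# Placeholder data: any array of shape (n_samples, n_features) works
data = np.array([[1, 1], [1, 2], [5, 5], [5, 6]])
model = MeanShift(bandwidth=2)
model.fit(data)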
Gets the cluster labels for each data point after fitting.
ML Python
labels = model.labels_
print(labels)
Prints the coordinates of cluster centers found by the model.
ML Python
centers = model.cluster_centers_
print(centers)
Sample Model

This program creates some points grouped around three centers. It uses MeanShift clustering to find these groups and prints the labels and centers.

ML Python
from sklearn.cluster import MeanShift
import numpy as np

# Sample data: points around (1,1), (5,5), and (9,9)
data = np.array([
    [1, 2], [2, 1], [1, 1],
    [5, 5], [6, 5], [5, 6],
    [9, 9], [8, 9], [9, 8]
])

# Create MeanShift model with bandwidth 2
model = MeanShift(bandwidth=2)
model.fit(data)

# Get cluster labels and centers
labels = model.labels_
centers = model.cluster_centers_

print("Cluster labels:", labels)
print("Cluster centers:", centers)
Important Notes

Choosing the right bandwidth is important: too small creates many tiny clusters, too large merges clusters.
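When no sensible bandwidth is known up front, one option is to let scikit-learn suggest one with estimate_bandwidth, which derives a value from nearest-neighbor distances in the data. The sketch below is a starting point rather than a recommendation: the toy data and quantile=0.3 (the function's default) are assumptions, and the estimate should still be checked against the resulting clusters.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

# Toy data (illustrative)
data = np.array([
    [1, 2], [2, 1], [1, 1],
    [5, 5], [6, 5], [5, 6],
    [9, 9], [8, 9], [9, 8],
])

# A larger quantile considers more neighbors per point and yields a larger bandwidth
bandwidth = estimate_bandwidth(data, quantile=0.3)
model = MeanShift(bandwidth=bandwidth).fit(data)

print("Estimated bandwidth:", bandwidth)
print("Number of clusters:", len(model.cluster_centers_))
```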

Mean shift can be slow on large datasets because, by default, every data point is used as a seed, and each seed repeatedly searches for its neighbors until it converges.
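One way to reduce that cost in scikit-learn is the bin_seeding option, which starts from a coarse grid of seeds instead of from every data point. The sketch below compares both settings on synthetic data. The dataset from make_blobs, its size, and the bandwidth are assumptions for illustration, and the timings will vary by machine.

```python
import time
import numpy as np
from sklearn.cluster import MeanShift
from sklearn.datasets import make_blobs

# Synthetic data (illustrative): 600 points scattered around three centers
true_centers = np.array([[1.0, 1.0], [5.0, 5.0], [9.0, 9.0]])
X, _ = make_blobs(n_samples=600, centers=true_centers,
                  cluster_std=0.5, random_state=0)

models = {}
for bin_seeding in (False, True):
    start = time.perf_counter()
    models[bin_seeding] = MeanShift(bandwidth=2, bin_seeding=bin_seeding).fit(X)
    elapsed = time.perf_counter() - start
    n_clusters = len(models[bin_seeding].cluster_centers_)
    print(f"bin_seeding={bin_seeding}: {n_clusters} clusters in {elapsed:.3f}s")
```

The MeanShift constructor also accepts n_jobs to run the seeds in parallel, which may help further on multi-core machines.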

Summary

Mean shift clustering finds groups by moving points toward dense areas.

It does not need you to set the number of clusters beforehand.

Bandwidth controls how big the neighborhood is when finding clusters.