
DBSCAN clustering in ML Python

Introduction

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) finds groups in data without being told how many groups there are. It also flags points that don't belong to any group.

Use it in situations like these:
When you want to find clusters of points that are close together in space.
When you don't know how many clusters to expect in your data.
When your data has noise or outliers that should not be part of any cluster.
When clusters have irregular shapes and sizes.
When you want a simple way to group data points based on density.
Syntax
ML Python
from sklearn.cluster import DBSCAN

model = DBSCAN(eps=0.5, min_samples=5)
model.fit(data)
labels = model.labels_

eps is the maximum distance between two points for them to count as neighbors.

min_samples is the minimum number of points (counting the point itself) that must lie within eps of a point for it to be a core point of a dense region (cluster).
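To make the two parameters concrete, here is a minimal sketch on a hand-made six-point dataset. The points and the parameter values are illustrative assumptions. It uses the fitted model's core_sample_indices_ attribute to tell core points, border points, and noise apart.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical toy data: a small square, one nearby point, one far-away point
X = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],  # square of side 1
    [2.5, 1.0],    # within eps of only one corner of the square
    [10.0, 10.0],  # far from everything
])

model = DBSCAN(eps=1.6, min_samples=4).fit(X)
labels = model.labels_

# Core points have at least min_samples points (themselves included) within eps
is_core = np.zeros(len(X), dtype=bool)
is_core[model.core_sample_indices_] = True

for point, label, core in zip(X, labels, is_core):
    if label == -1:
        role = "noise"
    elif core:
        role = "core"
    else:
        role = "border"
    print(point, label, role)
```

In this sketch the four corners of the square are core points. The point at (2.5, 1.0) is a border point: it joins the cluster because it lies within eps of a core point, but it has too few neighbors to be core itself. The far point is noise.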

Examples
Smaller neighborhood radius and more points needed to form a cluster.
ML Python
model = DBSCAN(eps=0.3, min_samples=10)
Larger neighborhood radius and fewer points needed to form a cluster.
ML Python
model = DBSCAN(eps=0.7, min_samples=3)
Using Euclidean distance to measure closeness between points (default).
ML Python
model = DBSCAN(eps=0.5, min_samples=5, metric='euclidean')
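The metric decides which points count as neighbors, so changing it can change the clusters. As a small sketch on a made-up two-point dataset: the points below are about 1.41 apart under Euclidean distance but 2.0 apart under Manhattan distance, so the same eps gives different results. 'manhattan' is one of the metric names scikit-learn accepts.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical data: two points on a diagonal
X = np.array([[0.0, 0.0], [1.0, 1.0]])

# Same eps and min_samples, different distance metric
euclid_labels = DBSCAN(eps=1.5, min_samples=2, metric='euclidean').fit_predict(X)
manhattan_labels = DBSCAN(eps=1.5, min_samples=2, metric='manhattan').fit_predict(X)

print("euclidean:", euclid_labels)      # both points fall in cluster 0
print("manhattan:", manhattan_labels)   # both points are noise (-1)
```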
Sample Program

This program creates simple data with 3 groups, then uses DBSCAN to find clusters. It prints how many clusters it found, how many points are noise, and shows the cluster labels for the first 10 points.

ML Python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Create sample data with 3 centers
X, _ = make_blobs(n_samples=100, centers=3, cluster_std=0.5, random_state=42)

# Create DBSCAN model
model = DBSCAN(eps=0.6, min_samples=5)

# Fit model to data
model.fit(X)

# Get cluster labels
labels = model.labels_

# Count clusters (excluding noise labeled as -1)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)

# Count noise points
n_noise = list(labels).count(-1)

print(f"Number of clusters found: {n_clusters}")
print(f"Number of noise points: {n_noise}")
print(f"Cluster labels for first 10 points: {labels[:10]}")
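Blobs are an easy case. To illustrate the irregular-shape use case from the introduction, the following sketch swaps in scikit-learn's make_moons, which generates two interleaved crescents. The dataset and the eps=0.2 setting are assumptions chosen for this illustration and would need tuning for other data.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaved half-circles with a little noise added
X, y_true = make_moons(n_samples=200, noise=0.05, random_state=0)

# fit_predict fits the model and returns the labels in one call
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Count clusters (ignoring the noise label -1) and noise points
n_clusters = len(set(labels) - {-1})
n_noise = int((labels == -1).sum())

print(f"Number of clusters found: {n_clusters}")
print(f"Number of noise points: {n_noise}")
```

Because DBSCAN follows chains of dense points rather than measuring distance to a center, it can trace each crescent as its own cluster.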
Important Notes

DBSCAN labels noise points as -1.
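Since noise shares the labels array with real clusters, it is usually worth separating it out before further analysis. A minimal sketch, reusing the sample program's data and settings (the variable names are illustrative):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100, centers=3, cluster_std=0.5, random_state=42)
labels = DBSCAN(eps=0.6, min_samples=5).fit_predict(X)

# Boolean mask: True where DBSCAN marked the point as noise
noise_mask = labels == -1
X_clustered = X[~noise_mask]
X_noise = X[noise_mask]

# Size of each real cluster (noise excluded)
cluster_ids, sizes = np.unique(labels[~noise_mask], return_counts=True)
for cluster_id, size in zip(cluster_ids, sizes):
    print(f"cluster {cluster_id}: {size} points")
print(f"noise: {len(X_noise)} points")
```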

Choosing eps and min_samples well is important: if eps is too small, most points are labeled as noise; if it is too large, separate clusters merge into one.
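One common heuristic for picking eps, sketched here as an assumption rather than a rule, is to look at how far each point is from its min_samples-th nearest neighbor. The example uses scikit-learn's NearestNeighbors on the same blob data as the sample program. A sharp rise near the end of the sorted distances suggests a candidate eps.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

X, _ = make_blobs(n_samples=100, centers=3, cluster_std=0.5, random_state=42)

min_samples = 5  # the value you plan to pass to DBSCAN

# Querying the training points themselves returns each point as its own
# nearest neighbor (distance 0), matching DBSCAN's "including itself" count
neighbors = NearestNeighbors(n_neighbors=min_samples).fit(X)
distances, _ = neighbors.kneighbors(X)

# Distance from every point to its min_samples-th neighbor, sorted ascending
k_distances = np.sort(distances[:, -1])

# Summarize the curve; plotting k_distances would show the "elbow" directly
for q in (50, 90, 99):
    print(f"{q}th percentile: {np.percentile(k_distances, q):.3f}")
```

Values of eps just above the bulk of these distances keep most points in clusters while still leaving isolated points as noise.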

DBSCAN works well with clusters of similar density but may struggle if densities vary a lot.
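The density limitation can be seen on a constructed dataset. This sketch builds three hypothetical groups on regular grids: two dense ones separated by a small gap, and one sparse one far away. The grid helper and all values are made up for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def grid(spacing, offset):
    """Return a 5x5 grid of points with the given spacing, shifted by offset."""
    ticks = np.arange(5) * spacing
    xx, yy = np.meshgrid(ticks, ticks)
    return np.column_stack([xx.ravel(), yy.ravel()]) + np.asarray(offset)

X = np.vstack([
    grid(0.1, (0.0, 0.0)),    # dense group A
    grid(0.1, (1.0, 0.0)),    # dense group B, 0.6 away from A
    grid(1.0, (10.0, 10.0)),  # sparse group C
])

def summarize(eps):
    """Return (number of clusters, number of noise points) for a given eps."""
    labels = DBSCAN(eps=eps, min_samples=4).fit_predict(X)
    n_clusters = len(set(labels) - {-1})
    n_noise = int(np.sum(labels == -1))
    return n_clusters, n_noise

print("eps=0.15:", summarize(0.15))  # A and B found, all of C is noise
print("eps=1.1: ", summarize(1.1))   # C found, but A and B merge into one
```

Any eps large enough to connect the sparse group (at least 1.0) is also larger than the 0.6 gap between the dense groups, so no single eps recovers all three groups here.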

Summary

DBSCAN groups points based on how close they are and how many neighbors they have.

It finds clusters without needing to know the number of clusters beforehand.

It can identify noise points that don't belong to any cluster.