DBSCAN (Density-Based Spatial Clustering of Applications with Noise) finds groups of closely packed points without needing to know in advance how many groups there are. It also flags points that don't belong to any group as noise.
DBSCAN clustering in ML Python
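from sklearn.cluster import DBSCAN
# data is an array-like of shape (n_samples, n_features)
model = DBSCAN(eps=0.5, min_samples=5)
model.fit(data)
labels = model.labels_  # one cluster label per point; -1 means noise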
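eps is the maximum distance between two points for one to count as being in the other's neighborhood (default 0.5).
min_samples is the minimum number of points, including the point itself, that must lie within eps of a point for it to be a core point, that is, part of a dense region (default 5).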
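To see how eps and min_samples interact, here is a small sketch on hand-made one-dimensional data (the coordinates are made up for illustration). It reads the fitted model's core_sample_indices_ attribute, which lists the points DBSCAN treated as core points.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical toy data: four points spaced 0.2 apart, plus one far-away point
X = np.array([[0.0], [0.2], [0.4], [0.6], [5.0]])

# With eps=0.25, points 1 and 2 each have 3 points (themselves included)
# within reach, so min_samples=3 makes them core points.
model = DBSCAN(eps=0.25, min_samples=3).fit(X)
print("core points:", model.core_sample_indices_)
print("labels:", model.labels_)

# Raising min_samples to 4 leaves no point with enough neighbors,
# so no cluster can start and every point becomes noise (-1).
strict_model = DBSCAN(eps=0.25, min_samples=4).fit(X)
print("labels with min_samples=4:", strict_model.labels_)
```

With min_samples=3, points 0 and 3 are border points: they are not core points themselves, but they lie within eps of a core point, so they join its cluster. Point 4 has no neighbors and is labeled noise.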
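Changing these parameters makes the density requirement stricter or looser:
model = DBSCAN(eps=0.3, min_samples=10)  # stricter: smaller neighborhoods, more neighbors required, usually more noise
model = DBSCAN(eps=0.7, min_samples=3)  # looser: larger neighborhoods, fewer neighbors required; nearby clusters may merge
model = DBSCAN(eps=0.5, min_samples=5, metric='euclidean')  # explicit distance metric ('euclidean' is the default)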
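The configurations above can be compared side by side on the same data. This sketch reuses the make_blobs data from this page's full example; the "strict" and "loose" names are illustrative labels, and the exact counts depend on the data. Two things hold regardless of the data: passing metric='euclidean' gives the same labels as the default, and the strict setting never produces fewer noise points than the loose one.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Sample data: 100 points around 3 centers
X, _ = make_blobs(n_samples=100, centers=3, cluster_std=0.5, random_state=42)

configs = {
    "strict": DBSCAN(eps=0.3, min_samples=10),
    "default": DBSCAN(eps=0.5, min_samples=5),
    "loose": DBSCAN(eps=0.7, min_samples=3),
    "explicit_metric": DBSCAN(eps=0.5, min_samples=5, metric="euclidean"),
}

results = {}
for name, model in configs.items():
    labels = model.fit_predict(X)  # fits the model and returns labels_
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = int(np.sum(labels == -1))
    results[name] = {"labels": labels, "clusters": n_clusters, "noise": n_noise}
    print(f"{name}: {n_clusters} clusters, {n_noise} noise points")
```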
This program creates simple data with 3 groups, then uses DBSCAN to find clusters. It prints how many clusters it found, how many points are noise, and shows the cluster labels for the first 10 points.
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
import numpy as np
# Create sample data with 3 centers
X, _ = make_blobs(n_samples=100, centers=3, cluster_std=0.5, random_state=42)
# Create DBSCAN model
model = DBSCAN(eps=0.6, min_samples=5)
# Fit model to data
model.fit(X)
# Get cluster labels
labels = model.labels_
# Count clusters (excluding noise labeled as -1)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
# Count noise points
n_noise = list(labels).count(-1)
print(f"Number of clusters found: {n_clusters}")
print(f"Number of noise points: {n_noise}")
print(f"Cluster labels for first 10 points: {labels[:10]}")
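Because noise is marked with -1, a boolean mask separates noise from clustered points, and np.unique gives the size of each cluster. A minimal sketch on the same sample data (fit_predict is used here as a shortcut for fit followed by labels_):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100, centers=3, cluster_std=0.5, random_state=42)
labels = DBSCAN(eps=0.6, min_samples=5).fit_predict(X)

# Boolean mask: True where DBSCAN labeled the point as noise
noise_mask = labels == -1
noise_points = X[noise_mask]
clustered_points = X[~noise_mask]

# Count how many points fall into each cluster (noise excluded)
cluster_ids, sizes = np.unique(labels[~noise_mask], return_counts=True)
for cid, size in zip(cluster_ids, sizes):
    print(f"Cluster {cid}: {size} points")
print(f"Noise points: {len(noise_points)}")
```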
DBSCAN labels noise points as -1.
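Choosing eps and min_samples well matters: if eps is too small, most points are labeled noise; if it is too large, separate clusters merge into one.
DBSCAN works well when clusters have similar density, but because it uses a single eps for the whole dataset, it may struggle when densities vary a lot: an eps that suits dense clusters can split sparse ones or discard them as noise.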
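One common heuristic for picking eps is the k-distance plot: for every point, compute the distance to its min_samples-th nearest neighbor (counting the point itself), sort these distances, and look for the "elbow" where they start rising sharply. The sketch below computes them with scikit-learn's NearestNeighbors; instead of plotting, it prints every tenth sorted value, and the 90th-percentile eps is an arbitrary illustrative choice, not a rule. It also checks a useful property: a point is a core point exactly when its k-distance is at most eps.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

X, _ = make_blobs(n_samples=100, centers=3, cluster_std=0.5, random_state=42)
min_samples = 5

# Querying the training points themselves returns each point as its own
# nearest neighbor (distance 0), so the last column is the distance that
# decides whether min_samples points (itself included) lie within eps.
nn = NearestNeighbors(n_neighbors=min_samples).fit(X)
distances, _ = nn.kneighbors(X)
k_dist = distances[:, -1]

# Sorted k-distances: the elbow of this curve is a candidate eps
sorted_k_dist = np.sort(k_dist)
print("every 10th sorted k-distance:", np.round(sorted_k_dist[::10], 3))

# Hypothetical choice for illustration only: the 90th percentile
eps = float(np.percentile(k_dist, 90))
model = DBSCAN(eps=eps, min_samples=min_samples).fit(X)

# Core points are exactly the points whose k-distance is <= eps
core_from_k_dist = np.flatnonzero(k_dist <= eps)
print(f"eps={eps:.3f}, core points: {len(core_from_k_dist)}")
```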
DBSCAN groups points based on how close they are and how many neighbors they have.
It finds clusters without needing to know the number of clusters beforehand.
It can identify noise points that don't belong to any cluster.
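The points above describe the whole algorithm, so it can be written out in a few lines. The sketch below is a simplified from-scratch version for learning purposes (dbscan_sketch is a hypothetical name, not part of scikit-learn): it computes all pairwise Euclidean distances, marks core points, and grows each cluster from an unlabeled core point with a breadth-first search. Border points join a cluster but do not expand it. It uses O(n²) memory, so it only suits small datasets; scikit-learn's DBSCAN can use tree-based neighbor searches instead.

```python
from collections import deque

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs


def dbscan_sketch(X, eps, min_samples):
    """Illustrative DBSCAN: returns one label per point, -1 for noise."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances (O(n^2) memory)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # eps-neighborhood of each point, including the point itself
    neighborhoods = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighborhoods])

    labels = np.full(n, -1)
    cluster_id = 0
    for i in range(n):
        # Only an unlabeled core point can start a new cluster
        if labels[i] != -1 or not is_core[i]:
            continue
        labels[i] = cluster_id
        queue = deque([i])
        while queue:
            p = queue.popleft()
            if not is_core[p]:
                continue  # border point: part of the cluster, but not expanded
            for q in neighborhoods[p]:
                if labels[q] == -1:
                    labels[q] = cluster_id
                    queue.append(q)
        cluster_id += 1
    return labels  # points never reached from a core point stay -1 (noise)


X, _ = make_blobs(n_samples=100, centers=3, cluster_std=0.5, random_state=42)
my_labels = dbscan_sketch(X, eps=0.6, min_samples=5)
sk_labels = DBSCAN(eps=0.6, min_samples=5).fit(X).labels_
print("matches scikit-learn:", np.array_equal(my_labels, sk_labels))
```

Comparing against scikit-learn on the same data is a quick sanity check. Both versions visit seed points in index order and fully expand one cluster before starting the next, so on data like this, where no distance falls exactly on eps, the labels should agree.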