K Means Clustering in Python: What It Is and How to Use It
K Means clustering is a method for grouping data points into clusters based on similarity. In Python it is commonly run with the scikit-learn (sklearn) library. It assigns each point to one of k clusters so that the total squared distance between points and their cluster centers is as small as possible.
How It Works
K Means clustering works like sorting a bunch of mixed colored balls into k boxes where each box holds balls of similar color. It starts by picking k random points as centers (called centroids). Then, it assigns each data point to the nearest center, forming groups.
Next, it recalculates the centers by averaging all points in each group. This process repeats: reassign points to the nearest center and update centers until the groups stop changing. The goal is to make points in the same group as similar as possible, like friends sitting close together.
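The assign-and-update loop described above can be sketched in a few lines of NumPy. This is a minimal illustration, not how sklearn implements it: the function name kmeans_sketch, its parameters, and the sample points are all hypothetical, and the result depends on which random starting centers are picked.

```python
import numpy as np


def kmeans_sketch(X, k, n_iters=100, seed=0):
    """Toy K Means loop: assign points, then update centers, until stable."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random as initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        # Assignment step: distance from every point to every centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points.
        # A cluster that ends up empty keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centers no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical sample data: two groups of three points each.
X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
              [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]])
centroids, labels = kmeans_sketch(X, k=2)
print(centroids)
print(labels)
```

Because the starting centers are random, a single run like this can settle on a poor grouping. Running it with several seeds and keeping the best result is one common way to reduce that risk.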
Example
This example shows how to use KMeans from sklearn.cluster to group simple 2D points into 3 clusters.
```python
from sklearn.cluster import KMeans
import numpy as np

# Sample data: 8 points with 2 features each
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0],
              [5, 5], [6, 6]])

# Create KMeans with 3 clusters
kmeans = KMeans(n_clusters=3, random_state=42)

# Fit model to data
kmeans.fit(X)

# Print cluster centers
print('Cluster centers:')
print(kmeans.cluster_centers_)

# Print cluster labels for each point
print('Labels:')
print(kmeans.labels_)
```
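Once the model is fitted, it can also assign points it has not seen to the nearest learned center. The sketch below refits the same sample data and calls predict on two made-up points. The new_points values are an assumption for illustration, and n_init=10 is set explicitly so the number of restarts does not depend on the installed scikit-learn version.

```python
import numpy as np
from sklearn.cluster import KMeans

# Same sample data as in the example above.
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0],
              [5, 5], [6, 6]])

# n_init=10 runs the algorithm from 10 different starts and keeps the best.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# Hypothetical new observations, not part of the training data.
new_points = np.array([[0, 1], [11, 3]])

# predict returns the index of the closest cluster center for each row.
predicted = kmeans.predict(new_points)
print('Predicted clusters for new points:')
print(predicted)
```

The label numbers themselves are arbitrary. What matters is which points share a label, so compare the predictions against kmeans.labels_ rather than expecting a particular number.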
When to Use
Use K Means clustering when you want to find natural groups in your data without knowing the groups beforehand. It works well for tasks like customer segmentation, grouping similar documents, or organizing images by features.
It is best for numeric data where clusters are roughly round and similar in size. Avoid it if clusters have complex shapes or very different sizes.
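To see why cluster shape matters, the sketch below runs K Means on two synthetic datasets from sklearn.datasets: round, well-separated blobs and two interleaved half-moons. The dataset parameters and the helper name kmeans_agreement are assumptions chosen for illustration. The adjusted Rand index compares the K Means labels with the known groups, where 1.0 means a perfect match.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs, make_moons
from sklearn.metrics import adjusted_rand_score

# Two round, well-separated blobs (synthetic data, chosen for illustration).
X_blobs, y_blobs = make_blobs(n_samples=300, centers=[[0, 0], [8, 8]],
                              cluster_std=1.0, random_state=0)
# Two interleaved half-moons: groups that are not round.
X_moons, y_moons = make_moons(n_samples=300, noise=0.05, random_state=0)


def kmeans_agreement(X, y_true):
    """Adjusted Rand index between K Means labels and the known groups."""
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    return adjusted_rand_score(y_true, labels)


ari_blobs = kmeans_agreement(X_blobs, y_blobs)
ari_moons = kmeans_agreement(X_moons, y_moons)
print(f"blobs ARI: {ari_blobs:.2f}")
print(f"moons ARI: {ari_moons:.2f}")
```

On data like this, the blobs score should come out close to 1.0 while the moons score is noticeably lower, because K Means splits the moons with a straight boundary instead of following their curves.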
Key Points
- K Means groups data into k clusters by minimizing distance to cluster centers.
- It repeats assigning points and updating centers until stable.
- Requires you to choose the number of clusters k in advance.
- Works best for simple, numeric, and well-separated clusters.
- Implemented in Python with sklearn.cluster.KMeans.
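Since k has to be chosen up front, one common heuristic is to fit the model for several values of k and compare the inertia_ attribute, which is the sum of squared distances from each point to its assigned center. The sketch below is an assumption about how one might do this on the small sample dataset, not a rule for picking k. Inertia generally keeps shrinking as k grows, so look for the point where the improvement levels off.

```python
import numpy as np
from sklearn.cluster import KMeans

# Same small sample dataset: 8 points with 2 features each.
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0],
              [5, 5], [6, 6]])

# Fit one model per candidate k and record its inertia.
inertias = {}
for k in range(1, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = model.inertia_

for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.1f}")
```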