Why clustering groups similar data in SciPy

Introduction

Clustering helps us find groups of things that are alike. It makes large datasets easier to understand by putting similar items together.

Common uses include:

Grouping customers with similar buying habits to offer better deals.
Organizing photos by similar features like colors or shapes.
Finding groups of similar documents or articles.
Detecting patterns in sensor data from machines to spot issues.
Segmenting users on a website based on their behavior.
Syntax
SciPy
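from scipy.cluster.vq import kmeans, vq
import numpy as np

# data is a 2D float array: one row per point, one column per feature
data = np.array([[1.0, 2.0], [1.0, 4.0], [10.0, 2.0], [10.0, 4.0]])
number_of_clusters = 2

centroids, distortion = kmeans(data, number_of_clusters)
cluster_labels, _ = vq(data, centroids)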

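kmeans(data, number_of_clusters) finds the center points (centroids) of the clusters. It returns the centroids and the distortion, which is the mean Euclidean distance from each point to its nearest centroid.

vq(data, centroids) assigns each data point to its nearest centroid. It returns the cluster index of every point plus the distance from that point to its centroid; the syntax above discards the distances with _.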
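To make the assignment step concrete, here is a minimal sketch that repeats what vq computes using plain NumPy. The two centroids are hand-picked for illustration (an assumption, not output from kmeans); each point gets the index of the centroid at the smallest Euclidean distance, and vq also reports that distance.

```python
import numpy as np
from scipy.cluster.vq import vq

# Six 2-D points and two hand-picked centroids (illustrative values)
data = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
                 [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]])
centroids = np.array([[1.0, 2.0], [10.0, 2.0]])

# SciPy: index of the nearest centroid and the distance to it, for every point
labels, dists = vq(data, centroids)

# The same assignment by hand: Euclidean distance from every point to every
# centroid (shape: n_points x n_centroids), then the closest centroid per row
all_dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
manual_labels = all_dists.argmin(axis=1)
manual_dists = all_dists.min(axis=1)

print(labels)                            # [0 0 0 1 1 1]
print(manual_labels)                     # [0 0 0 1 1 1]
print(np.allclose(dists, manual_dists))  # True
```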

Examples
This example groups 6 points into 2 clusters and prints which cluster each point belongs to.
SciPy
from scipy.cluster.vq import kmeans, vq
import numpy as np

data = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0], [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]])
centroids, _ = kmeans(data, 2)
labels, _ = vq(data, centroids)
print(labels)
This finds 3 cluster centers for 10 random 2D points.
SciPy
from scipy.cluster.vq import kmeans
import numpy as np

data = np.random.rand(10, 2)
centroids, distortion = kmeans(data, 3)
print('Centroids:', centroids)
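Because kmeans picks its starting centroids at random, the centroids it prints (and which group is called 0 or 1) can change from run to run. kmeans also accepts an array of starting centroids in place of a count, and in that case it ignores its iter restarts and refines only those points, so the result is repeatable. The starting points below are hand-picked for this sketch (an assumption about where the groups lie).

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

points = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
                   [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]])

# Hypothetical starting centroids: one near each group of points
initial_guess = np.array([[1.0, 1.0], [10.0, 1.0]])

# Passing an array instead of a count makes kmeans refine these exact points
centroids, distortion = kmeans(points, initial_guess)
labels, _ = vq(points, centroids)

print(centroids)   # centroids at (1, 2) and (10, 2)
print(distortion)  # about 1.333: mean distance from each point to its centroid
print(labels)      # [0 0 0 1 1 1]
```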
Sample Program

This program groups six points into two clusters using SciPy's k-means implementation. It prints the cluster centers and the cluster each point belongs to.

SciPy
from scipy.cluster.vq import kmeans, vq
import numpy as np

# Sample data: points in 2D space, stored as floats for kmeans
points = np.array([
    [1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
    [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]
])

# Find 2 clusters
centroids, distortion = kmeans(points, 2)

# Assign points to clusters
labels, _ = vq(points, centroids)

print('Centroids:')
print(centroids)
print('Cluster labels for each point:')
print(labels)
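Once every point has a label, NumPy boolean indexing collects the points of each cluster, which is usually the next step when you want to inspect or summarize a group. A minimal sketch on the same six points (the groups dictionary is just an illustrative name):

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

points = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
                   [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]])

centroids, _ = kmeans(points, 2)
labels, _ = vq(points, centroids)

# labels == k is a boolean mask that selects the rows assigned to cluster k
groups = {k: points[labels == k] for k in range(len(centroids))}
for k, members in groups.items():
    print(f"Cluster {k} (center {centroids[k]}):")
    print(members)
```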
Important Notes

Clustering groups data by measuring how close points are to each other. SciPy's kmeans and vq measure closeness with Euclidean (straight-line) distance.

kmeans needs the number of clusters up front, and the right choice depends on your data.
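One common way to choose the number of clusters is the "elbow" heuristic (a general technique, not a SciPy feature): run kmeans for several values of k and watch the distortion it returns. For the six sample points, the distortion typically falls sharply from about 4.78 at k=1 to about 1.33 at k=2 and then shrinks more slowly, which points to 2 clusters. This is a sketch; on real data the elbow is often less clear.

```python
import numpy as np
from scipy.cluster.vq import kmeans

points = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
                   [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]])

# Record the distortion (mean distance from each point to its nearest
# centroid) for k = 1..4; kmeans may return fewer than k centroids
# if a cluster ends up empty
distortions = {}
for k in range(1, 5):
    centroids, distortion = kmeans(points, k)
    distortions[k] = distortion
    print(f"k={k}: {len(centroids)} centroids, distortion={distortion:.3f}")
```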

SciPy's kmeans works well for simple clustering tasks where the groups are compact and clearly separated.
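SciPy's documentation for kmeans recommends whitening the features first. whiten divides each column by its standard deviation, so a feature measured in large units (dollars) does not drown out one measured in small units (visits). The customer numbers below are hypothetical. Because kmeans then returns centroids in whitened units, the sketch multiplies them by the original standard deviations to read them in dollars and visits.

```python
import numpy as np
from scipy.cluster.vq import whiten, kmeans, vq

# Hypothetical customers: [yearly spend in dollars, store visits per month]
customers = np.array([
    [1200.0, 2.0], [1500.0, 3.0], [1300.0, 2.0],
    [5200.0, 12.0], [4800.0, 10.0], [5000.0, 11.0],
])

# whiten divides each column by its standard deviation
scale = customers.std(axis=0)
whitened = whiten(customers)
print(whitened.std(axis=0))  # each column now has standard deviation 1

centroids_w, _ = kmeans(whitened, 2)
labels, _ = vq(whitened, centroids_w)

# Convert centroids back to the original units (dollars, visits)
centroids = centroids_w * scale
print(labels)
print(centroids)
```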

Summary

Clustering finds groups of similar data points.

SciPy's kmeans finds cluster centers, and vq assigns points to clusters.

This helps organize and understand data better.