Why clustering groups similar data in SciPy - Visual Breakdown

Concept Flow - Why clustering groups similar data

Start with data points

↓

Calculate distances between points

↓

Group points close to each other

↓

Form clusters of similar points

↓

Output clusters

↓

End

Clustering starts with data points, measures how close they are, groups close points, and forms clusters of similar data.
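To make "measures how close they are" concrete, here is a minimal sketch that computes every pairwise Euclidean distance for the six points used in this lesson. The `left` and `right` index lists are illustrative names for the two visible groups, not something SciPy produces.

```python
import numpy as np
from scipy.spatial.distance import cdist

# The six 2-D points used throughout this lesson
points = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], dtype=float)

# Euclidean distance between every pair of points (a 6 x 6 matrix)
dist = cdist(points, points)

# Illustrative index lists for the two visible groups (x = 1 and x = 10)
left, right = [0, 1, 2], [3, 4, 5]

# Largest distance inside the left group vs. smallest distance across groups
within_left = dist[np.ix_(left, left)].max()
between = dist[np.ix_(left, right)].min()

print(within_left)  # 4.0
print(between)      # 9.0
```

Every point is closer to the members of its own group than to any point in the other group, which is why a distance-based method separates them cleanly.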

Execution Sample

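import numpy as np
from scipy.cluster.vq import kmeans, vq

# Six 2-D points in two obvious groups (x = 1 and x = 10), stored as floats
points = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], dtype=float)
# Find 2 centroids; the second return value (the distortion) is not needed here
centroids, _ = kmeans(points, 2)
# Label each point with the index of its nearest centroid
cluster_labels, _ = vq(points, centroids)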

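This code groups 6 points into 2 clusters using k-means clustering. Because kmeans picks its starting centroids at random when given only the number of clusters, the two groups are the same on every run, but which one is numbered 0 and which is numbered 1 can vary.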
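The two values discarded as `_` in the sample carry useful information: `kmeans` also returns the distortion (the mean distance from each point to its nearest centroid), and `vq` also returns each point's distance to its assigned centroid. The sketch below keeps both. As an assumption for illustration, it passes a fixed starting guess (`initial_guess`, a name introduced here) instead of `2`, so the result is repeatable.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

points = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], dtype=float)

# Assumption for illustration: a fixed starting guess instead of k=2,
# so the run is repeatable and cluster 0 is always the left-hand group.
initial_guess = np.array([[1.0, 2.0], [10.0, 2.0]])

# kmeans returns the final centroids and the mean point-to-centroid distance
centroids, distortion = kmeans(points, initial_guess)

# vq returns a label per point and that point's distance to its centroid
cluster_labels, point_distances = vq(points, centroids)

print(cluster_labels)   # [0 0 0 1 1 1]
print(point_distances)  # [0. 2. 2. 0. 2. 2.]
print(distortion)       # about 1.33, the mean of the distances above
```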

Execution Table

Step	Action	Details	Result
1	Input data points	6 points in 2D space	[[1,2],[1,4],[1,0],[10,2],[10,4],[10,0]]
2	Choose initial centroids	Picked at random from the data points	For example [[1,2],[10,2]]
3	Assign points to nearest centroid	Distance measured	Points 0,1,2 -> cluster 0; Points 3,4,5 -> cluster 1
4	Recalculate centroids	Mean of points in each cluster	Centroid 0: [1,2]; Centroid 1: [10,2]
5	Assign points again	Check if clusters change	Same assignment as step 3
6	Converged	Clusters stable	Final clusters formed
7	Output cluster labels	Each point's cluster	[0,0,0,1,1,1]

💡 Clusters stable, no change in assignments
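The steps in the table can be sketched by hand in plain NumPy. This is a simplified illustration of the assign-and-update loop, not SciPy's actual implementation. The starting centroids are a hypothetical guess, chosen away from the final answer so that the update step visibly moves them, and the sketch does not handle a cluster that ends up with no points.

```python
import numpy as np

points = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], dtype=float)

# Hypothetical starting guess (step 2), deliberately not the final centroids
centroids = np.array([[1.0, 4.0], [10.0, 0.0]])
labels = None

for iteration in range(10):  # safety cap on the number of passes
    # Steps 3 and 5: distance from every point to every centroid, pick the nearest
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    new_labels = distances.argmin(axis=1)

    # Step 6: stop once the assignments no longer change
    if labels is not None and np.array_equal(new_labels, labels):
        break
    labels = new_labels

    # Step 4: move each centroid to the mean of the points assigned to it
    centroids = np.array([points[labels == k].mean(axis=0)
                          for k in range(len(centroids))])

print(labels)     # [0 0 0 1 1 1]
print(centroids)  # [[ 1.  2.] [10.  2.]]
```

The first pass assigns the points and moves the centroids to [1,2] and [10,2]; the second pass produces the same assignment, so the loop exits.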

Variable Tracker

Variable	Start	After Step 3	After Step 4	After Step 5	Final
points	[[1,2],[1,4],[1,0],[10,2],[10,4],[10,0]]	[[1,2],[1,4],[1,0],[10,2],[10,4],[10,0]]	[[1,2],[1,4],[1,0],[10,2],[10,4],[10,0]]	[[1,2],[1,4],[1,0],[10,2],[10,4],[10,0]]	[[1,2],[1,4],[1,0],[10,2],[10,4],[10,0]]
centroids	random or initial guess	[[1,2],[10,2]]	[[1,2],[10,2]]	[[1,2],[10,2]]	[[1,2],[10,2]]
cluster_labels	none	[0,0,0,1,1,1]	[0,0,0,1,1,1]	[0,0,0,1,1,1]	[0,0,0,1,1,1]

Key Moments - 3 Insights

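Why do points close to each other get the same cluster label?

Each point is assigned to its nearest centroid (step 3), so points that sit close together are nearest to the same centroid and receive the same label.

Why do centroids change during clustering?

After each assignment, every centroid is recalculated as the mean of the points in its cluster (step 4), so it moves whenever its cluster's membership changes.

When does the clustering process stop?

It stops when reassigning the points no longer changes any cluster (steps 5 and 6). At that point the clusters are stable.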

Visual Quiz - 3 Questions

Test your understanding

Look at the execution table, what cluster label does the point [10,4] get after step 3?

Concept Snapshot

Clustering groups data by similarity.
It measures distances between points.
Points close together form clusters.
Centroids represent cluster centers.
Clusters update until stable.
Output labels show group membership.

Full Transcript

Clustering is a way to group data points that are similar or close to each other. We start with data points and measure the distance from each point to each cluster center, called a centroid. Each point is then assigned to the cluster whose centroid is nearest. After assigning, we update each centroid to be the average of the points in its cluster. This process repeats until the clusters no longer change. The final output shows which cluster each point belongs to. This helps us understand patterns in data by grouping similar items together.
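Once every point has a label, the labels can be used to pull the similar items back out as groups. The sketch below is one way to do that with a boolean mask. As an assumption for illustration, it passes a fixed starting guess to `kmeans` so the label numbering is repeatable, and `groups` is a name introduced here.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

points = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], dtype=float)

# Assumption for illustration: a fixed starting guess keeps the labels repeatable
centroids, _ = kmeans(points, np.array([[1.0, 2.0], [10.0, 2.0]]))
labels, _ = vq(points, centroids)

# Collect the member points of each cluster with a boolean mask
groups = {int(k): points[labels == k] for k in np.unique(labels)}

for k, members in groups.items():
    print(k, members.tolist())
# 0 [[1.0, 2.0], [1.0, 4.0], [1.0, 0.0]]
# 1 [[10.0, 2.0], [10.0, 4.0], [10.0, 0.0]]
```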