K-means groups similar data points together. SciPy and scikit-learn offer two ways to do this in Python.
K-means via SciPy vs scikit-learn
Introduction
This comparison is useful if:
You want to find groups in customer data to offer personalized deals.
You want to organize photos by similar colors or features.
You want to simplify complex data by grouping similar items.
You want to compare how different tools perform the same task.
You want to learn how clustering works using popular Python libraries.
Syntax
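```python
from scipy.cluster.vq import kmeans, vq

# data: 2-D float array, one row per observation
# k: number of clusters to find
centroids, distortion = kmeans(data, k)    # find k cluster centers
cluster_labels, _ = vq(data, centroids)    # assign each row to its nearest center
```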
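In SciPy, kmeans computes the cluster centers and vq assigns each point to its nearest center.
In scikit-learn, the KMeans class learns the centers with fit (labels for the training data are stored in labels_) and assigns new points to clusters with predict.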
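The scikit-learn side of the syntax can be sketched the same way. This is a minimal sketch: data, new_points and k are placeholder names chosen for illustration, and n_init is set explicitly only so the behavior does not depend on the scikit-learn version.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical example inputs: one row per observation
data = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
                 [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]])
new_points = np.array([[0.0, 0.0], [12.0, 3.0]])
k = 2

model = KMeans(n_clusters=k, n_init=10, random_state=0)
model.fit(data)                          # learn the k cluster centers
train_labels = model.labels_             # labels for the training data
new_labels = model.predict(new_points)   # nearest-center labels for unseen points
centers = model.cluster_centers_         # shape (k, n_features)

print('Centers:', centers)
print('Training labels:', train_labels)
print('New-point labels:', new_labels)
```

Unlike SciPy, the fitted model object keeps its centers, so labeling new data is a single predict call.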
Examples
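Using SciPy to find two clusters and assign a label to each point.
```python
from scipy.cluster.vq import kmeans, vq
import numpy as np

# Use float data; integer arrays can raise an error in scipy's kmeans
data = np.array([[1, 2], [1, 4], [1, 0],
                 [10, 2], [10, 4], [10, 0]], dtype=float)

centroids, distortion = kmeans(data, 2)   # find 2 cluster centers
labels, _ = vq(data, centroids)           # label each point by its nearest center

print('Centroids:', centroids)
print('Labels:', labels)
```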
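The assignment step that vq performs is a nearest-centroid lookup. As a sketch of what it computes (not SciPy's actual implementation), the NumPy version below measures the Euclidean distance from every point to every centroid and picks the closest one; it should agree with vq's codes and distances. The centroid values are hand-picked for illustration rather than produced by kmeans.

```python
import numpy as np
from scipy.cluster.vq import vq

data = np.array([[1, 2], [1, 4], [1, 0],
                 [10, 2], [10, 4], [10, 0]], dtype=float)
# Hand-picked centroids for illustration (assumed, not computed by kmeans)
centroids = np.array([[1.0, 2.0], [10.0, 2.0]])

# Pairwise Euclidean distances, shape (n_points, n_centroids)
dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
manual_codes = dists.argmin(axis=1)   # index of the nearest centroid
manual_dist = dists.min(axis=1)       # distance to that centroid

codes, dist = vq(data, centroids)
print('vq codes:    ', codes)
print('manual codes:', manual_codes)
print('distances match:', np.allclose(dist, manual_dist))
```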
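Using scikit-learn for the same clustering takes less code, because fit finds the centers and labels the points in one call.
```python
from sklearn.cluster import KMeans
import numpy as np

data = np.array([[1, 2], [1, 4], [1, 0],
                 [10, 2], [10, 4], [10, 0]], dtype=float)

# n_init given explicitly: its default changed in scikit-learn 1.4 and older versions warn about it
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)

print('Centroids:', kmeans.cluster_centers_)
print('Labels:', kmeans.labels_)
```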
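Both examples start from random centers, so the two libraries may number the clusters differently. One way to compare them like for like is to give both the same starting centers: scipy's kmeans accepts an array of initial centroids in place of k, and scikit-learn's KMeans accepts one through its init parameter (whose default is 'k-means++', with 'random' also available). The starting centers below are hand-picked for illustration, and n_init=1 is used because a single explicit start is given.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq
from sklearn.cluster import KMeans

data = np.array([[1, 2], [1, 4], [1, 0],
                 [10, 2], [10, 4], [10, 0]], dtype=float)
# Hypothetical starting centers, one row per cluster
start = np.array([[1.0, 0.0], [10.0, 4.0]])

# SciPy: pass the array instead of k (the iter argument is then ignored)
centers_scipy, _ = kmeans(data, start)
labels_scipy, _ = vq(data, centers_scipy)

# scikit-learn: pass the same array through init; one run is enough
model = KMeans(n_clusters=2, init=start, n_init=1).fit(data)

print('SciPy centers:  ', centers_scipy)
print('sklearn centers:', model.cluster_centers_)
print('Labels match:', (labels_scipy == model.labels_).all())
```

With identical starting points, both run the same assign-then-average iterations on this data, so the centers and label numbering line up.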
Sample Program
This program runs K-means clustering on the same data with both SciPy and scikit-learn. It prints the cluster centers and labels for each library so you can compare them.
```python
from scipy.cluster.vq import kmeans, vq
from sklearn.cluster import KMeans
import numpy as np

# Sample data: points in 2-D space (float, since scipy's kmeans expects float data)
data = np.array([[1, 2], [1, 4], [1, 0],
                 [10, 2], [10, 4], [10, 0]], dtype=float)

# Using SciPy: find the centers, then assign labels
centroids_scipy, distortion = kmeans(data, 2)
labels_scipy, _ = vq(data, centroids_scipy)
print('SciPy K-means results:')
print('Centroids:', centroids_scipy)
print('Labels:', labels_scipy)
print('Distortion:', distortion)

# Using scikit-learn: fit finds the centers and labels together
kmeans_sklearn = KMeans(n_clusters=2, n_init=10, random_state=42).fit(data)
print('\nScikit-learn K-means results:')
print('Centroids:', kmeans_sklearn.cluster_centers_)
print('Labels:', kmeans_sklearn.labels_)
```
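Because each library numbers its clusters independently, comparing the printed labels directly can mislead: [0 0 0 1 1 1] and [1 1 1 0 0 0] describe the same grouping. A sketch of a numbering-independent comparison, using scikit-learn's adjusted_rand_score (a choice made here, not part of the program above), scores how well the two partitions agree; 1.0 means identical groupings.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

data = np.array([[1, 2], [1, 4], [1, 0],
                 [10, 2], [10, 4], [10, 0]], dtype=float)

centroids_scipy, _ = kmeans(data, 2)
labels_scipy, _ = vq(data, centroids_scipy)
labels_sklearn = KMeans(n_clusters=2, n_init=10, random_state=42).fit(data).labels_

# 1.0 means the two partitions are identical up to a renaming of the labels
agreement = adjusted_rand_score(labels_scipy, labels_sklearn)
print('Adjusted Rand index:', agreement)
```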
Important Notes
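SciPy's kmeans returns the centroids and the distortion: the mean (non-squared) Euclidean distance from each point to its nearest centroid. Lower distortion means tighter clusters.
Scikit-learn's KMeans class is easier to use and offers more options, such as the initialization method (init), the number of restarts (n_init), and the iteration limit (max_iter).
On simple, well-separated data like this example, both libraries find the same clusters, although the label numbers (which group is 0 and which is 1) can differ. For real projects, scikit-learn is usually preferred because KMeans works with the rest of the library, such as its preprocessing tools and pipelines.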
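SciPy's distortion and scikit-learn's inertia_ measure cluster tightness differently: distortion is the mean (non-squared) Euclidean distance from each point to its nearest centroid, while inertia_ is the sum of squared distances. The sketch below computes both from one fitted KMeans model on the example data; the variable names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

data = np.array([[1, 2], [1, 4], [1, 0],
                 [10, 2], [10, 4], [10, 0]], dtype=float)
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)

# Distance from each point to the center of the cluster it was assigned to
assigned_centers = model.cluster_centers_[model.labels_]
dists = np.linalg.norm(data - assigned_centers, axis=1)

distortion_like = dists.mean()      # SciPy-style: mean distance
inertia_like = (dists ** 2).sum()   # scikit-learn-style: sum of squared distances

print('Mean distance (SciPy-style distortion):', distortion_like)
print('Sum of squared distances:', inertia_like)
print('KMeans inertia_:', model.inertia_)
```

Because one is a mean of distances and the other a sum of squares, the two numbers are not directly comparable across libraries, even for identical clusters.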
Summary
K-means groups data points into k clusters by assigning each point to its nearest cluster center.
SciPy requires two steps: find the centroids with kmeans, then assign labels with vq.
Scikit-learn combines these steps in KMeans.fit and offers more configuration options.