K-means via SciPy vs scikit-learn - Hands-On Comparison

K-means Clustering with SciPy and scikit-learn

📖 Scenario: You work as a data analyst for a small retail company. You want to group customers based on their shopping habits to create better marketing strategies. You will use K-means clustering to find groups of similar customers.

🎯 Goal: Build a simple K-means clustering model with both SciPy and scikit-learn, and compare how each library sets up the data, runs the clustering, and returns the cluster centers.

📋 What You'll Learn

Create a dataset of customer shopping data as a list of lists.

Set the number of clusters to 2 using a variable.

Use scipy.cluster.vq.kmeans to find cluster centers.

Use sklearn.cluster.KMeans to fit the same data and get cluster centers.

Print the cluster centers from both methods.

💡 Why This Matters

🌍 Real World

K-means clustering helps businesses group customers or products based on features to target marketing or improve services.

💼 Career

Data scientists and analysts often use clustering to find patterns in data without labels, helping in customer segmentation and recommendation systems.

Create the customer data

Create a variable called data that holds this exact list of lists: [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]].

SciPy

# Create the variable data with the exact list of lists
# Your code here

Hint: Use a variable named data and assign the list exactly as shown.
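Both libraries treat each inner list as one observation (row) and each position in it as one feature (column). The sketch below is only an illustration of that layout: it converts the list to a NumPy array and inspects its shape. The name X is a hypothetical helper used here and is not part of the exercise.

```python
import numpy as np

data = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]]

# X is a hypothetical name used only in this sketch.
# Converting to a float array makes the row/column layout explicit.
X = np.asarray(data, dtype=float)

# One row per customer, one column per feature.
print("Shape (rows, columns):", X.shape)

# Per-feature mean across all customers.
print("Feature means:", X.mean(axis=0))
```

The exercise itself keeps data as a plain list of lists, which both libraries accept in the steps that follow.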

Set the number of clusters

Create a variable called num_clusters and set it to 2.

SciPy

data = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]]
# Create num_clusters and set it to 2
# Your code here

Hint: Use a variable named num_clusters and assign the value 2.

Run K-means with SciPy and scikit-learn

Import kmeans from scipy.cluster.vq and KMeans from sklearn.cluster. Call kmeans with data and num_clusters; it returns a pair (centroids, distortion), so unpack the first value into centroids_scipy. Then create a KMeans object with n_clusters=num_clusters (and, for reproducible results, random_state=0), fit it to data, and read centroids_sklearn from its cluster_centers_ attribute.

SciPy

data = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]]
num_clusters = 2
# Import kmeans and KMeans
# Run kmeans from scipy to get centroids_scipy
# Run KMeans from sklearn to get centroids_sklearn
# Your code here

Hint: Use the exact variable names and imports as shown. Remember to unpack the result of kmeans into centroids_scipy and a second value you can ignore.
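To make the "second value" concrete, here is a minimal sketch of what SciPy's kmeans hands back. It assumes the same data and num_clusters as above. The second value is the distortion, which SciPy defines as the mean (non-squared) Euclidean distance from each observation to its nearest centroid. Because kmeans does not return cluster labels, the sketch also calls vq from the same module to assign each point to a centroid. The names distortion, labels, and distances are chosen here for illustration.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

data = np.asarray(
    [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]]
)
num_clusters = 2

# kmeans returns (codebook, distortion): the centroids and the mean
# Euclidean distance from each observation to its nearest centroid.
centroids_scipy, distortion = kmeans(data, num_clusters)

# vq assigns every observation to its nearest centroid and also
# returns the distance to that centroid.
labels, distances = vq(data, centroids_scipy)

print("Centroids:", centroids_scipy)
print("Distortion:", distortion)
print("Labels:", labels)
```

SciPy picks its starting centroids at random, so which cluster ends up with label 0 or 1 can change from run to run even when the grouping itself is the same.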

Print the cluster centers

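Print the string "SciPy centroids:" followed by centroids_scipy. Then print the string "scikit-learn centroids:" followed by centroids_sklearn. The two libraries may list the same centroids in a different row order, because cluster numbering is arbitrary.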

SciPy

from scipy.cluster.vq import kmeans
from sklearn.cluster import KMeans


data = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]]
num_clusters = 2

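# SciPy: kmeans returns (centroids, distortion); the distortion is ignored here
centroids_scipy, _ = kmeans(data, num_clusters)

# scikit-learn: fit the model, then read the centers from cluster_centers_
kmeans_model = KMeans(n_clusters=num_clusters, random_state=0)
kmeans_model.fit(data)
centroids_sklearn = kmeans_model.cluster_centers_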

# Print the centroids from scipy and sklearn
# Your code here

Hint: Use two print statements exactly as described to show the centroids.
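Once both sets of centroids are printed, a natural follow-up is to check that the two libraries agree. The sketch below is one possible way to do that and is not part of the graded exercise. It sorts each result by its first coordinate before comparing, since row order is arbitrary. The helper sort_rows is a hypothetical name introduced here, and sorting on one column is a simplification that works because the two centers are far apart. n_init=10 is passed explicitly on the assumption that your scikit-learn version may use a different default.

```python
import numpy as np
from scipy.cluster.vq import kmeans
from sklearn.cluster import KMeans

data = np.asarray(
    [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]]
)
num_clusters = 2

# SciPy: keep the centroids, ignore the distortion.
centroids_scipy, _ = kmeans(data, num_clusters)

# scikit-learn: n_init is set explicitly because its default has
# changed between releases; random_state fixes the initialization.
kmeans_model = KMeans(n_clusters=num_clusters, n_init=10, random_state=0)
kmeans_model.fit(data)
centroids_sklearn = kmeans_model.cluster_centers_


def sort_rows(centroids):
    """Order centroids by their first coordinate (hypothetical helper)."""
    centroids = np.asarray(centroids)
    return centroids[np.argsort(centroids[:, 0])]


scipy_sorted = sort_rows(centroids_scipy)
sklearn_sorted = sort_rows(centroids_sklearn)

print("SciPy centroids:", scipy_sorted)
print("scikit-learn centroids:", sklearn_sorted)
print("Same centers:", np.allclose(scipy_sorted, sklearn_sorted))
```

For this dataset both libraries should settle on one center near the three small-valued customers and one near the three large-valued customers.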