Challenge - 5 Problems
Customer Segmentation Master
❓ Predict Output
Difficulty: intermediate · Time limit: 2:00
Output of KMeans clustering labels
Given the following code that performs KMeans clustering on customer data, what will be the output of print(labels)?
Data Analysis Python
from sklearn.cluster import KMeans
import numpy as np

X = np.array([[5, 200], [6, 220], [7, 210], [20, 800], [22, 850], [21, 830]])
kmeans = KMeans(n_clusters=2, random_state=42)
kmeans.fit(X)
labels = kmeans.labels_
print(labels)
💡 Hint
Think about how KMeans groups similar points based on their features.
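Explanation
The first three points lie close together in feature space and form one cluster; the last three form the other. The printed array is [0 0 0 1 1 1] when the first group receives label 0. KMeans label numbers are arbitrary identifiers, so the same grouping can also appear as [1 1 1 0 0 0]; which group gets 0 depends on the centroid initialization, which random_state=42 fixes for a given scikit-learn version.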
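Because cluster label numbers are arbitrary, two runs can describe the same grouping with swapped labels. The sketch below is one possible way to compare such results; `canonical_labels` is a hypothetical helper written for this illustration (it is not part of scikit-learn), and the two label lists are typed in by hand rather than produced by a real run.

```python
def canonical_labels(labels):
    """Relabel clusters by order of first appearance (0, 1, 2, ...)."""
    mapping = {}
    result = []
    for label in labels:
        if label not in mapping:
            # The first unseen label becomes 0, the next becomes 1, and so on.
            mapping[label] = len(mapping)
        result.append(mapping[label])
    return result


# Hand-written label arrays standing in for two KMeans runs (assumption).
run_a = [0, 0, 0, 1, 1, 1]
run_b = [1, 1, 1, 0, 0, 0]

# Both runs put the first three points together and the last three together.
same_partition = canonical_labels(run_a) == canonical_labels(run_b)
print(same_partition)
```

Comparing the canonical form rather than the raw labels avoids marking a correct grouping as wrong only because 0 and 1 are swapped.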
❓ Data Output
Difficulty: intermediate · Time limit: 2:00
Number of customers in each segment
After segmenting customers using KMeans with 3 clusters, what is the count of customers in each cluster?
Data Analysis Python
import pandas as pd
from sklearn.cluster import KMeans

customers = pd.DataFrame({
    'Age': [25, 45, 35, 23, 52, 40, 60, 48],
    'Annual_Spend': [500, 1500, 800, 450, 2000, 1200, 2200, 1600]
})
kmeans = KMeans(n_clusters=3, random_state=0)
customers['Segment'] = kmeans.fit_predict(customers)
counts = customers['Segment'].value_counts().sort_index()
print(counts)
💡 Hint
Look at how many customers fall into each cluster label after prediction.
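Explanation
KMeans assigns 3 customers to cluster 0, 2 to cluster 1, and 3 to cluster 2. Because the features are unscaled, Annual_Spend (range 450 to 2200) dominates the distance calculation over Age (range 23 to 60), so the segments mostly reflect spending level. The label numbers are arbitrary identifiers; only the grouping itself is meaningful.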
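To see what "based on their features" means for this unscaled data, it can help to split a squared Euclidean distance into its per-feature parts. The following is a small worked sketch using two customers from the quiz data; `squared_distance_parts` is a hypothetical helper introduced only for this illustration.

```python
def squared_distance_parts(a, b):
    """Return the (age, spend) contributions to the squared Euclidean distance."""
    age_part = (a[0] - b[0]) ** 2
    spend_part = (a[1] - b[1]) ** 2
    return age_part, spend_part


# Two customers from the quiz data, as (Age, Annual_Spend).
first = (25, 500)
second = (45, 1500)

age_part, spend_part = squared_distance_parts(first, second)

# Fraction of the squared distance that comes from Annual_Spend alone.
spend_share = spend_part / (age_part + spend_part)
print(age_part, spend_part, round(spend_share, 4))
```

With a 20-year age gap and a 1000-unit spending gap, nearly all of the distance comes from spending, which is why the clusters here follow spending level so closely.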
❓ Visualization
Difficulty: advanced · Time limit: 3:00
Identify the correct scatter plot of customer segments
Which scatter plot correctly shows customer segments after applying KMeans clustering on Age vs Annual Spend?
Data Analysis Python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans

customers = pd.DataFrame({
    'Age': [22, 25, 47, 52, 46, 56, 55, 60],
    'Annual_Spend': [400, 500, 1500, 1600, 1400, 1700, 1800, 1900]
})
kmeans = KMeans(n_clusters=2, random_state=1)
customers['Segment'] = kmeans.fit_predict(customers)

plt.figure(figsize=(6,4))
plt.scatter(customers['Age'], customers['Annual_Spend'], c=customers['Segment'], cmap='viridis')
plt.xlabel('Age')
plt.ylabel('Annual Spend')
plt.title('Customer Segments')
plt.show()
💡 Hint
KMeans groups customers by similarity, so expect clear clusters.
Explanation
The plot shows two clear clusters in different colors: the two younger, lower-spending customers in one and the six older, higher-spending customers in the other, matching the KMeans output.
🧠 Conceptual
Difficulty: advanced · Time limit: 1:30
Understanding silhouette score in customer segmentation
What does a silhouette score close to 1 indicate about the customer segments created by a clustering algorithm?
💡 Hint
Silhouette score measures how similar an object is to its own cluster compared to other clusters.
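Explanation
A silhouette score near 1 means the segments are well separated: each point is, on average, much closer to the other members of its own cluster than to the members of the nearest neighboring cluster. Scores near 0 indicate overlapping clusters, and negative scores suggest points assigned to the wrong cluster.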
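The score can be worked out by hand from its definition: for each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the silhouette value is (b - a) / max(a, b). The sketch below is a plain-Python illustration of that formula, not scikit-learn's implementation; it reuses the six points from the first challenge and assumes the labels [0, 0, 0, 1, 1, 1]. It needs at least two clusters and does not handle duplicate points where a and b are both zero.

```python
from math import dist


def silhouette_scores(points, labels):
    """Per-point silhouette values computed directly from the definition."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            # Usual convention: a point alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the distance from the point to itself adds 0 to the sum).
        a = sum(dist(point, other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(point, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        scores.append((b - a) / max(a, b))
    return scores


points = [(5, 200), (6, 220), (7, 210), (20, 800), (22, 850), (21, 830)]
labels = [0, 0, 0, 1, 1, 1]  # assumed grouping, typed in by hand

scores = silhouette_scores(points, labels)
mean_score = sum(scores) / len(scores)
print(round(mean_score, 3))
```

For this well-separated data the mean comes out above 0.9, which is the "close to 1" situation the question describes.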
🔧 Debug
Difficulty: expert · Time limit: 2:00
Identify the error in customer segmentation code
What error will this code raise when trying to segment customers using KMeans?
Data Analysis Python
from sklearn.cluster import KMeans
import pandas as pd

customers = pd.DataFrame({
    'Age': [30, 40, 50],
    'Annual_Spend': [1000, 1500, 2000]
})
kmeans = KMeans(n_clusters=4)
kmeans.fit(customers)
labels = kmeans.labels_
print(labels)
💡 Hint
Check if the number of clusters is valid given the data size.
Explanation
KMeans cannot form more clusters than there are samples. The DataFrame has 3 rows but n_clusters=4, so fit raises a ValueError stating that n_samples=3 should be >= n_clusters=4.
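One way to avoid this error is to check the requested cluster count against the number of rows before fitting. The sketch below shows such a guard in plain Python; `choose_n_clusters` is a hypothetical helper made up for this example, and capping the count (rather than raising an error) is only one possible design choice.

```python
def choose_n_clusters(requested, n_samples):
    """Hypothetical guard: cap the requested cluster count at the sample count."""
    if requested < 1:
        raise ValueError("requested must be at least 1")
    if n_samples < 1:
        raise ValueError("need at least one sample to cluster")
    return min(requested, n_samples)


# Same shape as the quiz data: 3 customers, 4 clusters requested.
ages = [30, 40, 50]
k = choose_n_clusters(4, len(ages))
print(k)  # a value that could then be passed as n_clusters
```

Note that with as many clusters as samples, every customer ends up in its own segment, so in practice a smaller value is usually more informative.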