
Customer segmentation pattern in Data Analysis Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of KMeans clustering labels
Given the following code that performs KMeans clustering on customer data, what will be the output of print(labels)?
from sklearn.cluster import KMeans
import numpy as np

X = np.array([[5, 200], [6, 220], [7, 210], [20, 800], [22, 850], [21, 830]])
kmeans = KMeans(n_clusters=2, random_state=42)
kmeans.fit(X)
labels = kmeans.labels_
print(labels)
A. [1 1 1 0 0 0]
B. [0 0 0 1 1 1]
C. [0 1 0 1 0 1]
D. [1 0 1 0 1 0]
💡 Hint
Think about how KMeans groups similar points based on their features.
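The hint can be checked without relying on which group ends up being called 0 or 1. The sketch below is illustrative rather than part of the challenge: it reuses the same six points and adds `n_init=10` (an assumption, set explicitly so the run behaves the same across scikit-learn versions), then tests only the grouping, since the integer assigned to each cluster is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[5, 200], [6, 220], [7, 210], [20, 800], [22, 850], [21, 830]])

# n_init=10 is an assumption added for this sketch; the challenge code omits it.
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
labels = kmeans.fit_predict(X)

# Check the partition, not the label numbers: the first three points should
# share one label and the last three should share the other.
same_low = len(set(labels[:3])) == 1
same_high = len(set(labels[3:])) == 1
groups_differ = labels[0] != labels[3]
print(same_low, same_high, groups_differ)
```

Comparing partitions this way is usually more robust than comparing raw label arrays, because two runs can produce the same clusters under swapped label numbers.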
Data Output
intermediate
Number of customers in each segment
After segmenting customers using KMeans with 3 clusters, what is the count of customers in each cluster?
import pandas as pd
from sklearn.cluster import KMeans

customers = pd.DataFrame({
    'Age': [25, 45, 35, 23, 52, 40, 60, 48],
    'Annual_Spend': [500, 1500, 800, 450, 2000, 1200, 2200, 1600]
})
kmeans = KMeans(n_clusters=3, random_state=0)
customers['Segment'] = kmeans.fit_predict(customers)
counts = customers['Segment'].value_counts().sort_index()
print(counts)
A.
0    3
1    2
2    3
Name: Segment, dtype: int64
B.
0    2
1    3
2    3
Name: Segment, dtype: int64
C.
0    3
1    3
2    2
Name: Segment, dtype: int64
D.
0    4
1    2
2    2
Name: Segment, dtype: int64
💡 Hint
Look at how many customers fall into each cluster label after prediction.
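One way to follow the hint is to count the labels and then compare the cluster sizes in sorted order, which does not depend on how the clusters happen to be numbered. This is a minimal sketch on the same eight customers; selecting the feature columns explicitly and passing `n_init=10` are assumptions added here for reproducibility, and the footer pandas prints under the counts (series name and dtype) can differ between pandas versions.

```python
import pandas as pd
from sklearn.cluster import KMeans

customers = pd.DataFrame({
    'Age': [25, 45, 35, 23, 52, 40, 60, 48],
    'Annual_Spend': [500, 1500, 800, 450, 2000, 1200, 2200, 1600]
})

# Select the feature columns explicitly so a later 'Segment' column
# can never leak into the clustering input.
features = customers[['Age', 'Annual_Spend']]

# n_init=10 is an assumption added for this sketch.
model = KMeans(n_clusters=3, random_state=0, n_init=10)
customers['Segment'] = model.fit_predict(features)

# Count customers per segment, ordered by segment label.
counts = customers['Segment'].value_counts().sort_index()
print(counts)

# Cluster sizes in sorted order are independent of label numbering.
print(sorted(counts.tolist()))
```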
Visualization
advanced
Identify the correct scatter plot of customer segments
Which scatter plot correctly shows customer segments after applying KMeans clustering on Age vs Annual Spend?
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans

customers = pd.DataFrame({
    'Age': [22, 25, 47, 52, 46, 56, 55, 60],
    'Annual_Spend': [400, 500, 1500, 1600, 1400, 1700, 1800, 1900]
})
kmeans = KMeans(n_clusters=2, random_state=1)
customers['Segment'] = kmeans.fit_predict(customers)

plt.figure(figsize=(6,4))
plt.scatter(customers['Age'], customers['Annual_Spend'], c=customers['Segment'], cmap='viridis')
plt.xlabel('Age')
plt.ylabel('Annual Spend')
plt.title('Customer Segments')
plt.show()
A. Scatter plot showing random color distribution with no clear groups.
B. Scatter plot with all points in one color, no segmentation visible.
C. Scatter plot with two distinct groups: younger customers with lower spend and older customers with higher spend.
D. Scatter plot with three overlapping clusters mixed in colors.
💡 Hint
KMeans groups customers by similarity, so expect clear clusters.
🧠 Conceptual
advanced
Understanding silhouette score in customer segmentation
What does a silhouette score close to 1 indicate about the customer segments created by a clustering algorithm?
A. The clustering algorithm failed to converge.
B. Clusters overlap heavily and customers are poorly matched to clusters.
C. There are too many clusters causing overfitting.
D. Clusters are well separated and customers are well matched to their own cluster.
💡 Hint
Silhouette score measures how similar an object is to its own cluster compared to other clusters.
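To make the hint concrete, the sketch below computes the silhouette score with scikit-learn's `silhouette_score` for two labelings of the same small dataset: the one found by KMeans and a deliberately poor alternating assignment. The data and the alternating labels are illustrative assumptions, not part of the challenge. For each point the score compares a, the mean distance to its own cluster, with b, the mean distance to the nearest other cluster, as (b - a) / max(a, b).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Illustrative data: two tight, well-separated groups of customers.
X = np.array([[5, 200], [6, 220], [7, 210], [20, 800], [22, 850], [21, 830]])

# Labels found by KMeans (n_init=10 is set explicitly for reproducibility).
good_labels = KMeans(n_clusters=2, random_state=42, n_init=10).fit_predict(X)
good_score = silhouette_score(X, good_labels)

# A deliberately poor assignment that mixes both groups into each cluster.
bad_labels = np.array([0, 1, 0, 1, 0, 1])
bad_score = silhouette_score(X, bad_labels)

print(f"KMeans labels:      {good_score:.3f}")
print(f"alternating labels: {bad_score:.3f}")
```

The score always lies between -1 and 1, so the two numbers can be compared directly: the labeling that keeps each tight group together should score much higher than the one that mixes them.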
🔧 Debug
expert
Identify the error in customer segmentation code
What error will this code raise when trying to segment customers using KMeans?
from sklearn.cluster import KMeans
import pandas as pd

customers = pd.DataFrame({
    'Age': [30, 40, 50],
    'Annual_Spend': [1000, 1500, 2000]
})

kmeans = KMeans(n_clusters=4)
kmeans.fit(customers)
labels = kmeans.labels_
print(labels)
A. ValueError: n_samples=3 should be >= n_clusters=4.
B. AttributeError: 'KMeans' object has no attribute 'labels_'
C. TypeError: fit() missing 1 required positional argument
D. No error, prints labels array of length 3.
💡 Hint
Check if the number of clusters is valid given the data size.
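The check suggested by the hint can be written as a small guard. The sketch below is one possible approach, not the only fix: it first shows what happens when the requested cluster count exceeds the number of rows, then caps the count at `len(customers)`. The variable `requested` and the capping strategy are assumptions introduced for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans

customers = pd.DataFrame({
    'Age': [30, 40, 50],
    'Annual_Spend': [1000, 1500, 2000]
})

requested = 4  # hypothetical cluster count, larger than the number of rows

# Fitting with more clusters than samples is rejected by scikit-learn.
try:
    KMeans(n_clusters=requested, n_init=10).fit(customers)
    raised = False
except ValueError as exc:
    raised = True
    print(f"ValueError: {exc}")

# Guard: never ask for more clusters than there are samples.
n_clusters = min(requested, len(customers))
labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(customers)
print(labels)
```

Capping at the sample count only avoids the error. With three customers and three clusters every customer becomes its own segment, so in practice a much smaller cluster count or more data would be needed for a useful segmentation.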