
Clustering and partitioning in dbt - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
What is the main goal of clustering in machine learning?
Clustering groups data points based on their similarities. What is the primary purpose of clustering?
A. To reduce the number of features in the dataset
B. To predict the output for new data points
C. To group similar data points without using labeled data
D. To split data into training and testing sets
💡 Hint
Think about whether clustering uses labels or not.
Predict Output
intermediate
What is the output of this K-means clustering code snippet?
Given the following Python code using scikit-learn, what is the predicted cluster label for the point [1, 2]?
Python
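from sklearn.cluster import KMeans
import numpy as np

# Six 2-D points: three with x = 1 and three with x = 10
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
# Fit two clusters; random_state fixes the centroid initialization
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
# Predict the cluster index of a single new point
pred = kmeans.predict([[1, 2]])
print(pred[0])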
A. 1
B. 0
C. 2
D. Error: n_clusters must be less than or equal to number of samples
💡 Hint
Look at how the data points are grouped and which cluster [1, 2] is closer to.
Model Choice
advanced
Which clustering algorithm is best for detecting clusters of varying shapes?
You have a dataset with clusters that are not spherical but have irregular shapes. Which algorithm is most suitable?
A. DBSCAN clustering
B. K-means clustering
C. Hierarchical clustering with single linkage
D. Linear regression
💡 Hint
Consider algorithms that do not assume cluster shape.
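To make the hint concrete, here is a minimal from-scratch sketch of a density-based method (DBSCAN) run on two concentric rings. It is an illustration, not the scikit-learn implementation: the helpers `ring` and `dbscan` are hypothetical names, the parameter names `eps` and `min_samples` are borrowed from scikit-learn's `DBSCAN`, and the ring sizes and `eps=0.8` are values picked for this toy data.

```python
import math

def ring(radius, count):
    """Return `count` evenly spaced points on a circle centred at the origin."""
    return [(radius * math.cos(2 * math.pi * k / count),
             radius * math.sin(2 * math.pi * k / count)) for k in range(count)]

def dbscan(points, eps, min_samples):
    """Minimal DBSCAN sketch. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # not a core point; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # only core points keep expanding the cluster
    return labels

# Assumed toy data: a small ring inside a large ring, both centred at (0, 0).
inner = ring(1.0, 12)
outer = ring(4.0, 40)
labels = dbscan(inner + outer, eps=0.8, min_samples=3)
print(set(labels[:12]), set(labels[12:]))  # {0} {1}
```

Each ring ends up as its own cluster because points are chained together through nearby neighbours. A two-centroid method such as K-means splits the plane with a straight boundary, so it cannot separate an inner ring from an outer ring that shares the same centre.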
Hyperparameter
advanced
What effect does increasing the number of clusters (k) have in K-means?
In K-means clustering, what happens if you increase the number of clusters k too much?
A. Clusters become too general and lose detail
B. The algorithm runs faster
C. The model automatically finds the best k
D. Clusters become smaller and may overfit the data
💡 Hint
Think about what happens when you split data into many small groups.
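One way to see the effect of many small groups is to compute the within-cluster sum of squares (often called inertia) for finer and finer groupings. The sketch below uses the six points from the K-means question above. The partitions are chosen by hand for illustration and are not the result of running K-means, and `inertia` is a hypothetical helper name.

```python
def inertia(clusters):
    """Sum of squared distances from each point to the mean of its cluster."""
    total = 0.0
    for points in clusters:
        cx = sum(x for x, _ in points) / len(points)
        cy = sum(y for _, y in points) / len(points)
        total += sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)
    return total

pts = [(1, 2), (1, 4), (1, 0), (10, 2), (10, 4), (10, 0)]

# Hand-picked partitions (an assumption for this example), keyed by k.
partitions = {
    1: [pts],                          # everything in one cluster
    2: [pts[:3], pts[3:]],             # left group and right group
    3: [pts[:3], pts[3:5], pts[5:]],   # right group split further
    6: [[p] for p in pts],             # every point is its own cluster
}

scores = {k: inertia(clusters) for k, clusters in partitions.items()}
for k, score in scores.items():
    print(k, score)
# 1 137.5
# 2 16.0
# 3 10.0
# 6 0.0
```

Inertia keeps falling as k grows and reaches zero when every point has its own cluster, so a lower value alone does not mean a better clustering. The large drop from k = 1 to k = 2 followed by small gains is the pattern the elbow heuristic looks for.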
Metrics
expert
Which metric evaluates clustering quality without true labels?
You want to measure how well your clustering algorithm performed but you do not have true labels. Which metric can you use?
A. Silhouette score
B. Accuracy
C. Mean squared error
D. Confusion matrix
💡 Hint
Look for a metric that works without knowing the correct groups.
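As an example of a label-free metric, the silhouette score compares, for each point, the mean distance to its own cluster (a) with the mean distance to the nearest other cluster (b), using (b - a) / max(a, b). The sketch below is a plain-Python version written for illustration, not scikit-learn's `silhouette_score`. The helper name `silhouette` and both labelings are assumptions, and it expects at least two clusters and no duplicate points.

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient over all points (needs at least 2 clusters)."""
    scores = []
    for i, p in enumerate(points):
        # Collect distances from p to every other point, grouped by cluster label.
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if own is None:
            scores.append(0.0)  # a singleton cluster scores 0 by convention
            continue
        a = sum(own) / len(own)                                # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())  # separation
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

pts = [(1, 2), (1, 4), (1, 0), (10, 2), (10, 4), (10, 0)]

good = silhouette(pts, [0, 0, 0, 1, 1, 1])  # left group vs right group
bad = silhouette(pts, [0, 1, 0, 1, 0, 1])   # groups mixed across both sides

print(round(good, 2))  # about 0.71
print(good > bad)      # True
```

Values near 1 indicate compact, well-separated clusters, while values near 0 or below suggest overlapping or misassigned points. No ground-truth labels are needed, only the points and the cluster assignments being evaluated.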