
Why clustering groups similar data in SciPy - Challenge Your Understanding

🧠 Conceptual
intermediate
Why does clustering group similar data points?

Which of the following best explains why clustering algorithms group similar data points together?

A. Clustering algorithms group data points based on their proximity in feature space, so points close to each other are grouped together.
B. Clustering algorithms randomly assign data points to groups without considering their features.
C. Clustering algorithms group data points by sorting them alphabetically based on their labels.
D. Clustering algorithms group data points by their order of appearance in the dataset.
💡 Hint

Think about how distance or similarity between points affects grouping.
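To make the idea of distance concrete, here is a minimal sketch that computes pairwise Euclidean distances and finds each point's nearest neighbour. The array `points` is hypothetical toy data chosen for illustration, not the data used in the later problems.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical toy data: two points near x=0 and two points near x=8.
points = np.array([[0.0, 0.0], [0.0, 1.0], [8.0, 0.0], [8.0, 1.0]])

# pdist returns the condensed vector of pairwise Euclidean distances;
# squareform expands it into a symmetric matrix that is easier to read.
dist = squareform(pdist(points, metric="euclidean"))

# Put infinity on the diagonal so a point is never its own nearest neighbour.
np.fill_diagonal(dist, np.inf)
nearest = dist.argmin(axis=1)
print(nearest)  # [1 0 3 2]
```

Each point's nearest neighbour lies on its own side of the gap, which is the kind of structure a distance-based clustering method picks up.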

Predict Output
intermediate
Output of clustering labels using SciPy

What labels array does this clustering code print?

from scipy.cluster.hierarchy import fcluster, linkage
import numpy as np

# Sample data points
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

# Perform hierarchical clustering
Z = linkage(X, method='single')

# Form flat clusters with max distance 3
labels = fcluster(Z, t=3, criterion='distance')
print(labels)
A. [1 1 2 2 3 3]
B. [1 2 3 4 5 6]
C. [2 2 2 1 1 1]
D. [1 1 1 2 2 2]
💡 Hint

Look at how points close in space are grouped with a distance threshold of 3.
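One way to reason about a distance threshold is to read the merge distances straight from the linkage matrix. The sketch below does this on a separate, hypothetical one-dimensional dataset (`toy`), so it illustrates the technique without reproducing the problem above.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical toy data (not the quiz data): two tight pairs on a line.
toy = np.array([[0.0], [1.0], [10.0], [11.0]])
Z_toy = linkage(toy, method="single")

# Each row of the linkage matrix describes one merge:
# [cluster index a, cluster index b, merge distance, size of the new cluster].
merge_distances = Z_toy[:, 2]
print(merge_distances)  # [1. 1. 9.]

# With criterion='distance', points end up together only if they were
# merged at a distance no greater than t.
toy_labels = fcluster(Z_toy, t=5, criterion="distance")
same_left = toy_labels[0] == toy_labels[1]
same_right = toy_labels[2] == toy_labels[3]
separate = toy_labels[0] != toy_labels[2]
print(same_left, same_right, separate)  # True True True
```

Comparing memberships rather than raw label values avoids depending on which cluster happens to be numbered first.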

Data Output
advanced
Number of clusters formed with different distance thresholds

Using the same data and linkage matrix as in the previous problem, how many clusters does each distance threshold produce? The code prints the count for t=2 followed by the count for t=5.

from scipy.cluster.hierarchy import fcluster, linkage
import numpy as np

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
Z = linkage(X, method='single')

clusters_t2 = fcluster(Z, t=2, criterion='distance')
clusters_t5 = fcluster(Z, t=5, criterion='distance')

num_clusters_t2 = len(set(clusters_t2))
num_clusters_t5 = len(set(clusters_t5))
print(num_clusters_t2, num_clusters_t5)
A. 2 2
B. 3 2
C. 3 1
D. 1 3
💡 Hint

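Lowering the distance threshold can only keep or increase the number of clusters; raising it can only keep or reduce it. Compare each threshold with the distances at which merges actually happen.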
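The effect of the threshold can be seen by sweeping `t` over a few values and counting the resulting clusters. This is a sketch on hypothetical toy data with gaps of 1, 3, and 6 between neighbouring points; the thresholds are deliberately chosen to fall between merge distances rather than exactly on them.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical toy data (not the quiz data): gaps of 1, 3, and 6 on a line.
toy = np.array([[0.0], [1.0], [4.0], [10.0]])
Z_toy = linkage(toy, method="single")

# Count the flat clusters produced by each threshold.
counts = {}
for t in (0.5, 2.0, 4.0, 7.0):
    labels_t = fcluster(Z_toy, t=t, criterion="distance")
    counts[t] = len(np.unique(labels_t))
print(counts)  # {0.5: 4, 2.0: 3, 4.0: 2, 7.0: 1}
```

The count drops by one each time the threshold passes a merge distance, and it stays flat between them.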

Visualization
advanced
Interpreting a dendrogram for clustering

Which statement correctly describes the dendrogram shown below for hierarchical clustering?

(Imagine a dendrogram with two main branches that are still separate at height 3 and join only above it.)

A. The dendrogram shows two main clusters formed when cutting at height 3, grouping similar points together.
B. The dendrogram shows clusters formed by sorting points alphabetically.
C. The dendrogram indicates that no clusters can be formed because all points are too far apart.
D. The dendrogram shows that all points are identical and form one cluster at any height.
💡 Hint

Look at where the branches join and the height to decide cluster groups.
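The join heights in a dendrogram can also be inspected numerically. The sketch below builds a dendrogram layout for hypothetical toy data with `no_plot=True`, which returns the plotting data as a dictionary without drawing anything, and then relates a horizontal cut to the number of flat clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Hypothetical toy data (not the quiz data): two pairs far apart on a line.
toy = np.array([[0.0], [2.0], [20.0], [23.0]])
Z_toy = linkage(toy, method="single")

# no_plot=True skips rendering and returns only the layout dictionary.
tree = dendrogram(Z_toy, no_plot=True)

# Each entry of 'dcoord' holds the four y-coordinates of one U-shaped link;
# the largest of them is the height at which the two branches join.
link_heights = sorted(max(d) for d in tree["dcoord"])
print(link_heights)  # [2.0, 3.0, 18.0]

# A horizontal cut at height 10 lies above the two low joins and below the top one.
cut = 10.0
n_clusters = len(np.unique(fcluster(Z_toy, t=cut, criterion="distance")))
print(n_clusters)  # 2
```

A cut placed in the tall gap between the low joins and the top join leaves each main branch as one cluster.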

🔧 Debug
expert
Identify the error in clustering code

What error will this code raise when run?

from scipy.cluster.hierarchy import linkage, fcluster
import numpy as np

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

Z = linkage(X, method='single')

# Incorrect use of fcluster with invalid criterion
labels = fcluster(Z, t=3, criterion='invalid')
print(labels)
A. No error, prints cluster labels
B. IndexError: list index out of range
C. ValueError: Invalid cluster formation criterion: invalid
D. TypeError: linkage() missing required positional argument
💡 Hint

Check the valid options for the 'criterion' parameter in fcluster.
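For reference, the criteria documented for `fcluster` are `'inconsistent'` (the default), `'distance'`, `'maxclust'`, `'monocrit'`, and `'maxclust_monocrit'`. The sketch below uses two of them on hypothetical toy data; the variable names are illustrative only.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical toy data (not the quiz data): two tight pairs on a line.
toy = np.array([[0.0], [1.0], [10.0], [11.0]])
Z_toy = linkage(toy, method="single")

# 'distance': cut the tree at a distance threshold t.
by_distance = fcluster(Z_toy, t=5, criterion="distance")

# 'maxclust': ask for at most t flat clusters.
by_maxclust = fcluster(Z_toy, t=2, criterion="maxclust")

print(len(set(by_distance)), len(set(by_maxclust)))  # 2 2
```

Note that `t` changes meaning with the criterion: it is a distance in the first call and a cluster count in the second.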