
Dendrogram visualization in SciPy - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of dendrogram leaf order
What is the order of leaf labels in the dendrogram produced by this code?
SciPy
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

np.random.seed(0)
data = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
Z = linkage(data, method='single')
D = dendrogram(Z, no_plot=True)
leaf_order = D['leaves']
print(leaf_order)
A. [2, 0, 1, 5, 3, 4]
B. [0, 1, 2, 3, 4, 5]
C. [5, 3, 4, 2, 0, 1]
D. [1, 0, 2, 4, 3, 5]
💡 Hint
Look at how the linkage method 'single' clusters points based on minimum distance.
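To reason about the leaf order, it can help to print the individual merge steps rather than only the final result. The following is a minimal sketch that reuses the data from the problem; the names `heights` and `leaves` are illustrative and not part of the original code.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

data = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
Z = linkage(data, method='single')

# Each row of Z describes one merge:
# [cluster_i, cluster_j, merge_distance, size_of_new_cluster]
for step, (i, j, dist, size) in enumerate(Z):
    print(f"merge {step}: {int(i)} + {int(j)} at distance {dist:.3f} (size {int(size)})")

# Merge heights for single linkage never decrease from one row to the next.
heights = Z[:, 2]

# 'leaves' lists the original observation indices from left to right.
# no_plot=True computes the layout without drawing anything.
leaves = dendrogram(Z, no_plot=True)['leaves']
print(leaves)
```

Several pairs of points in this data set are equally far apart, so the order in which tied pairs merge is an implementation detail; running the snippet is the reliable way to see it.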
Data Output
intermediate
Number of clusters from dendrogram cut
Given the linkage matrix computed below, how many clusters remain when the dendrogram is cut at distance 3?
SciPy
import numpy as np
from scipy.cluster.hierarchy import linkage

data = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
Z = linkage(data, method='complete')
clusters = (Z[:, 2] > 3).sum() + 1
print(clusters)
A. 4
B. 3
C. 2
D. 5
💡 Hint
Count how many merges have distance greater than 3.
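The counting rule in the hint can be cross-checked against `fcluster`, which forms flat clusters directly from a distance threshold. This is a sketch under the assumption that the linkage is monotonic (true for complete linkage); `count_from_heights` and `count_from_labels` are illustrative names.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

data = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
Z = linkage(data, method='complete')

t = 3  # cut height

# Every merge above the cut is "undone", and each undone merge adds one cluster.
count_from_heights = int((Z[:, 2] > t).sum()) + 1

# criterion='distance' keeps points together when their cophenetic distance is <= t.
labels = fcluster(Z, t=t, criterion='distance')
count_from_labels = len(np.unique(labels))

print(count_from_heights, count_from_labels)
```

Note that the comparison is strict (`>`): a merge at exactly the cut height stays merged, which matches how `fcluster` treats the threshold.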
Visualization
advanced
Identify dendrogram linkage method from plot shape
Which statement correctly matches a dendrogram shape to its linkage method when the same data is clustered with each method?
SciPy
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
import numpy as np

np.random.seed(1)
data = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])

methods = ['single', 'complete', 'average', 'ward']
fig, axs = plt.subplots(2, 2, figsize=(10, 8))
for ax, method in zip(axs.flatten(), methods):
    Z = linkage(data, method=method)
    dendrogram(Z, ax=ax)
    ax.set_title(method)
plt.tight_layout()
plt.show()
A. The dendrogram with irregular cluster heights is 'average' linkage.
B. The dendrogram with the longest horizontal lines at the bottom is 'single' linkage.
C. The dendrogram with the shortest maximum height is 'complete' linkage.
D. The dendrogram with balanced cluster heights is 'ward' linkage.
💡 Hint
Ward linkage merges the pair of clusters that gives the smallest increase in within-cluster variance, which tends to produce balanced heights.
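Claims about dendrogram heights can be checked numerically without plotting, since the third column of the linkage matrix holds the merge heights. The sketch below compares the height of the final merge for each method on the same data; `top_heights` is an illustrative name.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

data = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
methods = ['single', 'complete', 'average', 'ward']

# Z[:, 2] holds the merge heights; the last row is the final (tallest) merge.
top_heights = {m: float(linkage(data, method=m)[-1, 2]) for m in methods}

for m, h in top_heights.items():
    print(f"{m:>8}: top merge height = {h:.3f}")
```

Printing the full `Z[:, 2]` column for each method, instead of only the last entry, shows how evenly the merge heights are spaced.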
🔧 Debug
advanced
Error in dendrogram plotting code
What error does this code raise when run?
SciPy
from scipy.cluster.hierarchy import dendrogram
import matplotlib.pyplot as plt
import numpy as np

Z = np.array([[0, 1, 0.5, 2], [2, 3, 0.7, 2], [4, 5, 1.2, 4]])
dendrogram(Z)
plt.show()
A. TypeError: 'list' object is not a valid linkage matrix
B. IndexError: list index out of range
C. No error, dendrogram plots successfully
D. ValueError: Linkage matrix must be a 2D numpy array
💡 Hint
Check the type and shape of the linkage matrix input.
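One way to follow the hint is to inspect a hand-written matrix before passing it to `dendrogram`. The sketch below uses `is_valid_linkage` from `scipy.cluster.hierarchy`; the names `n_obs` and `valid` are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, is_valid_linkage

Z = np.array([[0, 1, 0.5, 2], [2, 3, 0.7, 2], [4, 5, 1.2, 4]])

# A linkage matrix for n observations has shape (n - 1, 4).
n_obs = Z.shape[0] + 1
print(type(Z).__name__, Z.dtype, Z.shape, n_obs)

# is_valid_linkage returns True or False; pass throw=True to raise instead.
valid = is_valid_linkage(Z)
print(valid)

# no_plot=True computes the dendrogram layout without drawing a figure.
if valid:
    leaves = dendrogram(Z, no_plot=True)['leaves']
    print(leaves)
```

Cluster indices of `n_obs` or higher refer to clusters formed in earlier rows, so row `i` creates the cluster with index `n_obs + i`.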
🚀 Application
expert
Extract cluster labels from dendrogram at specific height
Which code snippet correctly assigns cluster labels to data points by cutting the dendrogram at height 1.5?
SciPy
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

data = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
Z = linkage(data, method='average')
A.
labels = fcluster(Z, t=1.5, criterion='distance')
print(labels)
B.
labels = fcluster(Z, t=1.5, criterion='maxclust')
print(labels)
C.
labels = fcluster(Z, t=1.5, criterion='inconsistent')
print(labels)
D.
labels = fcluster(Z, t=1.5, criterion='monocrit')
print(labels)
💡 Hint
Use 'distance' criterion to cut dendrogram at a height threshold.
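To see how a height threshold translates into labels, the cut can be tried at several heights. This is a minimal sketch using the data from the problem; the thresholds 2.5 and 10.0 and the name `counts` are illustrative additions, not part of the original question.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

data = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
Z = linkage(data, method='average')

# Cut the tree at a few heights and count the flat clusters at each one.
counts = {}
for t in (1.5, 2.5, 10.0):
    # labels[k] is the 1-based cluster id assigned to observation k
    labels = fcluster(Z, t=t, criterion='distance')
    counts[t] = len(np.unique(labels))
    print(f"t={t}: labels={labels}, clusters={counts[t]}")
```

Because the closest pair of points in this data set is 2 units apart, a cut below that height leaves every point in its own cluster, while a cut above the final merge height puts all points in one.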