Challenge - 5 Problems
Dendrogram Master
❓ Predict Output
intermediate
Output of dendrogram leaf order
What is the order of leaf labels in the dendrogram produced by this code?
SciPy
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

np.random.seed(0)  # has no effect here: the data below is fixed
data = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])

Z = linkage(data, method='single')
D = dendrogram(Z, no_plot=True)  # compute the layout without drawing
leaf_order = D['leaves']         # indices of the points, left to right
print(leaf_order)
💡 Hint
Look at how the linkage method 'single' clusters points based on minimum distance.
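Explanation
Single linkage merges the two clusters whose closest members are nearest to each other. Within each column of points (x = 1 and x = 4) neighbours are 2 apart, while the columns are 3 apart, so each column forms its own subtree and the two subtrees join last. The resulting leaf order is [2, 0, 1, 5, 3, 4]. Because several pairs are tied at distance 2, the order inside each subtree depends on how SciPy breaks ties.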
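To see why the two columns end up in separate subtrees, it can help to look at the merge heights directly. The sketch below is illustrative only; the variable names `merge_heights`, `left`, and `right` are not part of the original challenge.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

data = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
Z = linkage(data, method='single')

# Column 2 of the linkage matrix holds the height of each merge.
merge_heights = Z[:, 2]
print(merge_heights)

# The leaf list reads left to right; the final merge splits it in two halves.
leaves = dendrogram(Z, no_plot=True)['leaves']
left, right = set(leaves[:3]), set(leaves[3:])
print(leaves, left, right)
```

Four merges happen at height 2 (inside the columns) and one at height 3 (between them), so each half of the leaf list contains exactly one column.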
❓ Data Output
intermediate
Number of clusters from dendrogram cut
Given this linkage matrix, how many clusters remain if we cut the dendrogram at distance 3?
SciPy
import numpy as np
from scipy.cluster.hierarchy import linkage

data = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
Z = linkage(data, method='complete')

# Z[:, 2] holds the merge heights; every merge above the cut
# leaves one more cluster separate.
clusters = (Z[:, 2] > 3).sum() + 1
print(clusters)
💡 Hint
Count how many merges have distance greater than 3.
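Explanation
Cutting at distance 3 leaves 3 clusters. With SciPy's tie-breaking the complete-linkage merge heights come out as 2, 2, 3, √13 ≈ 3.61, and 5. The comparison is strict, so the merge at exactly 3 is not counted: two merges lie above the cut, and 2 + 1 = 3.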
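One way to cross-check the count is to let `fcluster` perform the cut and count the distinct labels. This is a minimal sketch, assuming the `'distance'` criterion groups points whose merge height is at most `t`; the names `count_from_heights` and `count_from_labels` are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

data = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
Z = linkage(data, method='complete')

# Count from the merge heights, as in the challenge code.
count_from_heights = int((Z[:, 2] > 3).sum() + 1)

# Count from the flat cluster labels produced by cutting at t=3.
labels = fcluster(Z, t=3, criterion='distance')
count_from_labels = len(np.unique(labels))

print(count_from_heights, count_from_labels)
```

Both counts should agree, because every merge above the threshold keeps two groups apart.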
❓ Visualization
advanced
Identify dendrogram linkage method from plot shape
Which linkage method produces this dendrogram shape when clustering the same data?
SciPy
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

np.random.seed(1)  # has no effect here: the data below is fixed
data = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
methods = ['single', 'complete', 'average', 'ward']

# Draw one dendrogram per linkage method on a 2x2 grid.
fig, axs = plt.subplots(2, 2, figsize=(10, 8))
for ax, method in zip(axs.flatten(), methods):
    Z = linkage(data, method=method)
    dendrogram(Z, ax=ax)
    ax.set_title(method)

plt.tight_layout()
plt.show()
💡 Hint
Ward linkage merges the pair of clusters whose union gives the smallest increase in within-cluster variance.
Explanation
Ward linkage chooses each merge to minimize the increase in within-cluster variance, which tends to produce compact, similarly sized clusters and a balanced-looking dendrogram.
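The methods can also be compared without a plot by looking at the height of the final merge, which sets the overall height of each dendrogram. The following is a rough sketch; `final_heights` is an illustrative name, not part of the challenge.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

data = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
methods = ['single', 'complete', 'average', 'ward']

# The last row of each linkage matrix is the final (top) merge.
final_heights = {m: float(linkage(data, method=m)[-1, 2]) for m in methods}

for method, height in final_heights.items():
    print(f"{method:>8}: top merge at {height:.3f}")
```

Single linkage tops out at the smallest gap between the two columns (3), while complete linkage tops out at the largest pairwise distance in the data (5).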
🔧 Debug
advanced
Error in dendrogram plotting code
What error does this code raise when run?
SciPy
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram

# Hand-written linkage matrix: each row is [cluster_a, cluster_b, height, size].
Z = np.array([[0, 1, 0.5, 2],
              [2, 3, 0.7, 2],
              [4, 5, 1.2, 4]])

dendrogram(Z)
plt.show()
💡 Hint
Check the type and shape of the linkage matrix input.
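Explanation
As written, Z is already a 2D NumPy array of shape (3, 4), not a list. It describes a valid hierarchy over four observations: rows 0 and 1 pair the original points, and row 2 joins the resulting clusters 4 and 5. The call dendrogram(Z) is therefore expected to plot without raising. An error arises only when the matrix fails validation, for example a ValueError when it does not have exactly four columns.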
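A linkage matrix can be checked before plotting with `is_valid_linkage`. The sketch below is illustrative; the three-column matrix `bad` is a hypothetical malformed input, not part of the challenge.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, is_valid_linkage

Z = np.array([[0, 1, 0.5, 2],
              [2, 3, 0.7, 2],
              [4, 5, 1.2, 4]])

# Returns True or False instead of raising (throw defaults to False).
ok = is_valid_linkage(Z)

# Hypothetical malformed input: dropping the size column leaves 3 columns.
bad = np.ascontiguousarray(Z[:, :3])
bad_ok = is_valid_linkage(bad)

# no_plot=True computes the layout without needing a display.
leaves = dendrogram(Z, no_plot=True)['leaves']
print(ok, bad_ok, leaves)
```

Passing `throw=True` to `is_valid_linkage` makes it raise on the malformed matrix instead of returning False, which is useful for finding out which check failed.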
🚀 Application
expert
Extract cluster labels from dendrogram at specific height
Which code snippet correctly assigns cluster labels to data points by cutting the dendrogram at height 1.5?
SciPy
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

data = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
Z = linkage(data, method='average')
💡 Hint
Use 'distance' criterion to cut dendrogram at a height threshold.
Explanation
Calling fcluster with criterion='distance' cuts the dendrogram at the given height t: points that are merged at or below that height share a cluster label.
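A minimal sketch of such a cut, continuing from the setup code of this challenge, is shown here. The names `labels` and `n_clusters` are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

data = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
Z = linkage(data, method='average')

# Cut the tree at height 1.5: one integer label per data point.
labels = fcluster(Z, t=1.5, criterion='distance')
n_clusters = len(np.unique(labels))

print(labels, n_clusters)
```

The closest points in this data set are 2 apart, so no merge happens at or below 1.5 and every point keeps its own label. Raising `t` above 2 would start grouping the points within each column.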