
Hierarchical clustering (linkage) in SciPy - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is hierarchical clustering?
Hierarchical clustering groups similar data points by building a tree (hierarchy) of nested clusters. It does not require you to fix the number of clusters beforehand; you choose it afterwards by cutting the tree at some level.
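A minimal sketch of what that tree looks like in SciPy, using scipy.cluster.hierarchy.dendrogram to lay it out. The four points and the leaf names "a" to "d" are made up for illustration. Passing no_plot=True returns the tree coordinates without drawing anything, so no plotting library is needed.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Four illustrative points on a line: a and b are close, c and d are a bit farther apart
X = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0], [6.0, 0.0]])
Z = linkage(X, method="single")

# no_plot=True computes the dendrogram layout without drawing it;
# call it without no_plot inside a matplotlib figure to get the actual plot.
tree = dendrogram(Z, no_plot=True, labels=["a", "b", "c", "d"])

print(tree["ivl"])     # leaf labels in left-to-right order
print(tree["dcoord"])  # one [y0, h, h, y1] list per link; h is the merge distance
```

Each U-shaped link in the tree sits at the height of the distance at which its two children were merged.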
beginner
What does the 'linkage' function in scipy do?
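The 'linkage' function (scipy.cluster.hierarchy.linkage) performs agglomerative clustering: it starts with every point as its own cluster and repeatedly merges the two closest clusters, using the chosen linkage method to measure the distance between clusters, until a single cluster remains. It returns a linkage matrix that records every merge.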
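A minimal sketch of calling linkage on a small, made-up data set. It shows that linkage accepts either the raw observations or a condensed distance matrix from scipy.spatial.distance.pdist; with the default Euclidean metric both give the same result.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Five illustrative 2-D points: two tight pairs plus one outlier
X = np.array([
    [1.0, 1.0],
    [1.2, 1.1],
    [5.0, 5.0],
    [5.0, 5.3],
    [9.0, 1.0],
])

# Option 1: pass raw observations; Euclidean distances are computed internally
Z = linkage(X, method="single")

# Option 2: pass a condensed pairwise distance matrix computed beforehand
Z_from_pdist = linkage(pdist(X), method="single")

print(Z)  # one row per merge: 5 points -> 4 merges
```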
intermediate
Name three common linkage methods used in hierarchical clustering.
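Three common linkage methods are: 1) Single linkage - the distance between the closest pair of points, one from each cluster; 2) Complete linkage - the distance between the farthest such pair; 3) Average linkage - the mean distance over all pairs of points drawn from the two clusters. SciPy also supports 'weighted', 'centroid', 'median', and 'ward'.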
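To make the three definitions concrete, the sketch below computes each one by hand for two made-up clusters A and B using scipy.spatial.distance.cdist, then cross-checks against the distance of the final merge (the A-B merge) that linkage reports for the same points.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import cdist

# Two small illustrative clusters on a line
A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[4.0, 0.0], [6.0, 0.0]])

# All pairwise Euclidean distances between a point in A and a point in B
D = cdist(A, B)

by_hand = {
    "single": D.min(),    # closest pair
    "complete": D.max(),  # farthest pair
    "average": D.mean(),  # mean over all len(A) * len(B) pairs
}

# Cross-check: for these points the last row of the linkage matrix is the
# A-B merge, and its column 2 holds the inter-cluster distance for that method.
X = np.vstack([A, B])
from_scipy = {m: linkage(X, method=m)[-1, 2] for m in by_hand}

for m in by_hand:
    print(m, by_hand[m], from_scipy[m])
```

Single linkage tends to chain clusters together through nearby points, while complete linkage favors compact clusters; average linkage sits between the two.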
beginner
What is the output of the scipy linkage function?
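The output is a linkage matrix Z, a 2D array of shape (n - 1, 4) for n original points. Row i describes the i-th merge: columns 0 and 1 hold the indices of the two clusters merged, column 2 holds the distance between them, and column 3 holds the number of original points in the new cluster. An index below n refers to an original point; an index n + i refers to the cluster created in row i.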
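A small sketch that walks the rows of a linkage matrix and spells out each merge in words. The data and the helper name describe are made up for illustration; the decoding rule (index below n is an original point, index n + i is the cluster from row i) follows the description above.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Four illustrative points
X = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0], [6.0, 0.0]])
n = len(X)
Z = linkage(X, method="single")
print(Z.shape)  # (n - 1, 4)

def describe(idx):
    """Hypothetical helper: turn a cluster index from Z into readable text."""
    idx = int(idx)
    if idx < n:
        return f"point {idx}"
    return f"cluster {idx} (from row {idx - n})"

for i, (a, b, dist, count) in enumerate(Z):
    print(f"row {i}: merge {describe(a)} + {describe(b)} "
          f"at distance {dist:.2f} -> cluster {n + i} with {int(count)} points")
```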
intermediate
Why is hierarchical clustering useful compared to k-means?
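Hierarchical clustering does not require you to choose the number of clusters in advance: you build the tree once and can then cut it at whatever level suits the data. It also exposes the full merge history (for example as a dendrogram), which helps reveal the structure of the data.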
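A minimal sketch of "build once, cut later" using scipy.cluster.hierarchy.fcluster on made-up data (three well-separated pairs of points). The same linkage matrix is cut into 2 and 3 clusters, and also at a distance threshold, without re-running the clustering.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Three well-separated pairs of points (illustrative data)
X = np.array([[0, 0], [0, 1], [10, 0], [10, 1], [25, 0], [25, 1]], dtype=float)

# Build the full tree once...
Z = linkage(X, method="average")

# ...then cut it into at most k flat clusters for several k, reusing the same Z
labels_by_k = {k: fcluster(Z, t=k, criterion="maxclust") for k in (2, 3)}
for k, labels in labels_by_k.items():
    print(k, labels)

# Alternatively, cut at a distance threshold instead of a cluster count
labels_at_5 = fcluster(Z, t=5.0, criterion="distance")
print(labels_at_5)
```

With k-means, by contrast, each new number of clusters would mean fitting the model again.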
What does the 'linkage' function in scipy.cluster.hierarchy return?
A) A linkage matrix showing cluster merges
B) A list of cluster labels for each data point
C) A distance matrix between all points
D) A dendrogram plot
Which linkage method uses the shortest distance between points in clusters?
A) Complete linkage
B) Ward linkage
C) Single linkage
D) Average linkage
In hierarchical clustering, what does the dendrogram represent?
A) A plot showing cluster centers
B) A tree showing cluster merges and distances
C) A heatmap of data values
D) A scatter plot of data points
Which of these is NOT a valid linkage method in scipy?
A) Single
B) Complete
C) Median
D) K-means
What is the main advantage of hierarchical clustering over k-means?
A) It does not require specifying the number of clusters
B) It is faster for large datasets
C) It always produces spherical clusters
D) It uses random initialization
Explain how the linkage matrix represents cluster merges in hierarchical clustering.
Think about what information you need to reconstruct, step by step, how clusters combine.
Describe the difference between single, complete, and average linkage methods.
Focus on how the distance between two clusters is measured.