
Dendrogram visualization in SciPy - Step-by-Step Execution

Concept Flow - Dendrogram visualization
Start with data points
Calculate distances
Perform hierarchical clustering
Create linkage matrix
Plot dendrogram
Visualize cluster merges
End
This flow shows how data points are clustered step-by-step and visualized as a dendrogram.
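The first steps of the flow can be made explicit. The sketch below assumes Euclidean distance (the default metric of linkage()) and uses scipy.spatial.distance.pdist, which the sample on this page does not call directly, to compute the pairwise distances before handing them to linkage(). The variable names are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])

# "Calculate distances": condensed vector holding the n*(n-1)/2 pairwise distances
distances = pdist(X, metric="euclidean")
print(distances.shape)  # (6,)

# "Perform hierarchical clustering": linkage() accepts the condensed vector
Z_from_distances = linkage(distances, method="single")

# Passing the raw observations yields the same merge distances
Z_from_points = linkage(X, method="single")
print(np.allclose(Z_from_distances[:, 2], Z_from_points[:, 2]))  # True
```

Computing the distances separately is mainly useful when you want a different metric or already have a distance matrix; for the simple case, passing X straight to linkage() is enough.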
Execution Sample
SciPy
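from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt

# Four 2-D data points
X = [[1, 2], [3, 4], [5, 6], [7, 8]]
# Single-linkage hierarchical clustering; Z is the linkage matrix
Z = linkage(X, method='single')
# Draw the merge tree on the current Matplotlib axes
dendrogram(Z)
plt.show()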
This code clusters 4 points and plots their dendrogram showing merges.
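To see what dendrogram() computes before anything is drawn, the sketch below passes no_plot=True, which returns the layout data as a dictionary and does not need a display. The variable name tree is only illustrative.

```python
from scipy.cluster.hierarchy import dendrogram, linkage

X = [[1, 2], [3, 4], [5, 6], [7, 8]]
Z = linkage(X, method="single")

# no_plot=True skips drawing and only returns the layout data
tree = dendrogram(Z, no_plot=True)

print(tree["leaves"])        # leaf order along the x-axis (indices into X)
print(tree["ivl"])           # the same leaves as tick-label strings
print(len(tree["dcoord"]))   # one U-shaped link per merge
for heights in tree["dcoord"]:
    # each link is [left base, top, top, right base] on the distance axis
    print(heights)
```

The height of each link's top is the merge distance from the corresponding row of Z, which is why the tree and the linkage matrix carry the same information.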
Execution Table
Step | Action | Data/Variables | Result/Output
1 | Input data points | X = [[1,2],[3,4],[5,6],[7,8]] | Data ready for clustering
2 | Calculate linkage matrix | linkage(X, 'single') | Z = [[0,1,2.8284271247461903,2],[2,3,2.8284271247461903,2],[4,5,2.8284271247461903,4]]
3 | Plot dendrogram | dendrogram(Z) | Dendrogram plot shows cluster merges
4 | Show plot | plt.show() | Visual dendrogram displayed
5 | End | | Process complete
💡 All data points clustered and dendrogram displayed
Variable Tracker
Variable | Start | After Step 2 | After Step 3 | Final
X | [[1,2],[3,4],[5,6],[7,8]] | [[1,2],[3,4],[5,6],[7,8]] | [[1,2],[3,4],[5,6],[7,8]] | [[1,2],[3,4],[5,6],[7,8]]
Z | None | [[0,1,2.8284271247461903,2],[2,3,2.8284271247461903,2],[4,5,2.8284271247461903,4]] | [[0,1,2.8284271247461903,2],[2,3,2.8284271247461903,2],[4,5,2.8284271247461903,4]] | [[0,1,2.8284271247461903,2],[2,3,2.8284271247461903,2],[4,5,2.8284271247461903,4]]
Key Moments - 3 Insights
Why does the linkage matrix Z have 3 rows for 4 data points?
Agglomerative hierarchical clustering merges two clusters at each step, so n points need n-1 merges before everything sits in one cluster. Z therefore has n-1 rows (3 rows for 4 points). See step 2 of the Execution Table.
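The n-1 rule can be checked for other sizes. This is a small sketch using randomly generated 2-D points (made-up data, not from the example above) with a fixed seed so the run is repeatable.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

for n in (4, 10, 25):
    points = rng.random((n, 2))           # n random 2-D points (illustrative data)
    Z = linkage(points, method="single")
    # n points need n - 1 merges; each row has 4 columns
    print(n, Z.shape)
    assert Z.shape == (n - 1, 4)
    # the last merge contains every original point
    assert Z[-1, 3] == n
```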
What does each row in the linkage matrix represent?
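Each row describes one merge: the indices of the two clusters merged, the distance between them, and the number of original points in the new cluster. Indices 0 to n-1 refer to the original points, and index n+i refers to the cluster created in row i, so the 4 and 5 in the last row of the Execution Table (step 2) are the clusters formed in rows 0 and 1. Because all three merges in this example tie at the same distance (√8 ≈ 2.8284), which clusters pair up after the first merge depends on how ties are broken, so the Z printed by your SciPy version may pair them differently.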
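Reading the rows is easier with a small decoder. The helper below, describe_merges, is a hypothetical function written for this illustration (it is not part of SciPy); it turns each row of a linkage matrix into a sentence, using the rule that index n + i names the cluster created in row i.

```python
from scipy.cluster.hierarchy import linkage

def describe_merges(Z, n):
    """Return one readable sentence per row of a linkage matrix for n points."""
    def name(idx):
        idx = int(idx)
        # indices below n are original points; index n + i is the cluster made in row i
        return f"point {idx}" if idx < n else f"cluster from row {idx - n}"

    lines = []
    for i, (a, b, dist, count) in enumerate(Z):
        lines.append(
            f"row {i}: {name(a)} + {name(b)} at distance {dist:.4f} "
            f"-> new cluster {n + i} with {int(count)} points"
        )
    return lines

X = [[1, 2], [3, 4], [5, 6], [7, 8]]
Z = linkage(X, method="single")
for line in describe_merges(Z, n=len(X)):
    print(line)
```

Every line for this data reports a distance of 2.8284, and the last line ends with a cluster of 4 points; the pairings in between can vary with tie-breaking.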
Why do we use 'single' linkage in linkage()?
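'single' linkage defines the distance between two clusters as the smallest distance between any point in one cluster and any point in the other, and each step merges the closest pair of clusters. The choice of method changes the merge heights and therefore the shape of the dendrogram, as seen in step 2 of the Execution Table.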
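To see how the method changes the merge heights, the sketch below runs the same four points through 'single' and 'complete' linkage. The 'complete' method is not used elsewhere on this page; it is included only for contrast.

```python
from scipy.cluster.hierarchy import linkage

X = [[1, 2], [3, 4], [5, 6], [7, 8]]

Z_single = linkage(X, method="single")      # nearest pair of points between clusters
Z_complete = linkage(X, method="complete")  # farthest pair of points between clusters

# Column 2 holds the merge distances, i.e. the heights drawn in the dendrogram
print("single  :", Z_single[:, 2])
print("complete:", Z_complete[:, 2])
```

With single linkage the final merge happens at √8 ≈ 2.83, while with complete linkage it happens at √72 ≈ 8.49, the distance between the two farthest points, so the complete-linkage tree is much taller.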
Visual Quiz - 3 Questions
Test your understanding
Look at step 2 of the Execution Table. What does the number 2.828 represent in the linkage matrix?
A. The index of a data point
B. The number of clusters
C. The distance between two clusters merged
D. The height of the dendrogram
💡 Hint
Check the 'Result/Output' column in step 2 of the Execution Table, where the linkage matrix is shown.
At which step is the dendrogram plot created, according to the Execution Table?
A. Step 3
B. Step 2
C. Step 1
D. Step 4
💡 Hint
Look at the 'Action' column of the Execution Table for the step that plots the dendrogram.
If we add more data points, how will the linkage matrix Z change?
A. It will have fewer rows
B. It will have more rows
C. It will stay the same size
D. It will become empty
💡 Hint
Recall from Key Moments that the linkage matrix has n-1 rows for n points.
Concept Snapshot
Dendrogram visualization:
- Use scipy.cluster.hierarchy linkage() to cluster data
- linkage() returns a matrix showing cluster merges
- dendrogram() plots this matrix as a tree
- Each merge shows which clusters joined and at what distance
- Visualizes hierarchical clustering step-by-step
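The points above can be pulled together into one labeled plot. This is a sketch under a few assumptions: the labels A to D and the output file name are invented for illustration, and the non-interactive Agg backend is chosen so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

X = [[1, 2], [3, 4], [5, 6], [7, 8]]
labels = ["A", "B", "C", "D"]  # hypothetical names for the four points

Z = linkage(X, method="single")

fig, ax = plt.subplots(figsize=(5, 3))
result = dendrogram(Z, labels=labels, ax=ax)  # draw onto the given axes
ax.set_xlabel("Data point")
ax.set_ylabel("Merge distance")
ax.set_title("Single-linkage dendrogram")
fig.tight_layout()
fig.savefig("dendrogram.png")  # output file name is an arbitrary choice
```

Replace fig.savefig(...) with plt.show() (and drop the Agg line) to open the plot in a window instead.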
Full Transcript
We start with data points and calculate distances between them. Then we perform hierarchical clustering using linkage() which returns a matrix showing how clusters merge step-by-step. This matrix is passed to dendrogram() to create a tree plot. The dendrogram visually shows the order and distance of merges. The process ends when all points are clustered. This helps us understand cluster relationships visually.