How to Plot a Dendrogram in Python with sklearn and scipy
To plot a dendrogram in Python, use scipy.cluster.hierarchy.dendrogram with a linkage matrix computed by scipy.cluster.hierarchy.linkage. First compute the linkage matrix from your data, then pass it to dendrogram to visualize the hierarchical clustering.

Syntax
The main functions to plot a dendrogram are:
- linkage(data, method='ward'): Computes a hierarchical clustering encoded as a linkage matrix.
- dendrogram(linkage_matrix): Plots the dendrogram from the linkage matrix.
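Here, data is either a 2D array of observations (one row per sample) or a condensed distance matrix, and method selects how the distance between clusters is computed (e.g., 'ward', 'single', 'complete'; SciPy's default is 'single').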
```python
from scipy.cluster.hierarchy import linkage, dendrogram

# Sample data
# data = ...  (your 2D dataset here)

# Compute linkage matrix
Z = linkage(data, method='ward')

# Plot dendrogram
dendrogram(Z)
```
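To see what linkage actually hands to dendrogram, the sketch below prints the linkage matrix for four made-up points (the values are illustrative only). Per the SciPy documentation, for n observations Z has n - 1 rows and four columns: the two cluster indices merged at that step, the merge distance, and the number of original observations in the new cluster. Indices below n refer to original points; index n + i refers to the cluster formed in row i. The sketch also shows that passing a condensed distance matrix from scipy.spatial.distance.pdist gives the same result as passing the raw observations.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Illustrative data: two tight pairs of points, far apart from each other
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])

# Linkage from raw observations (Euclidean distances are computed internally)
Z = linkage(X, method="average")

# Linkage from a precomputed condensed distance matrix
Z_from_dist = linkage(pdist(X), method="average")

print(Z.shape)  # (3, 4): n - 1 rows, 4 columns
for left, right, dist, count in Z:
    # Each row records one merge: two cluster ids, their distance, new cluster size
    print(f"merge {int(left)} + {int(right)} at distance {dist:.3f} -> {int(count)} points")
```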
Example
This example clusters a small dataset of five 2D points with Ward's method and plots the resulting dendrogram using SciPy and Matplotlib.
```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
import numpy as np

# Sample data: 5 points in 2D
X = np.array([[1, 2], [3, 4], [5, 6], [8, 8], [1, 0]])

# Compute linkage matrix using Ward's method
Z = linkage(X, method='ward')

# Plot dendrogram
plt.figure(figsize=(8, 4))
dendrogram(Z)
plt.title('Dendrogram Example')
plt.xlabel('Sample index')
plt.ylabel('Distance')
plt.show()
```
Output
A plot window opens showing the dendrogram for the five points: the x-axis lists the sample indices, and the height of each link shows the Ward distance at which two clusters merge.
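Building on the example above, dendrogram takes keyword arguments that help with labeling. The sketch below assumes made-up sample names: labels names the leaves, color_threshold colors the clusters that merge below the given height (the value 5 is arbitrary), and ax draws onto a specific Matplotlib axes. dendrogram also returns a dictionary; its 'ivl' entry holds the leaf labels in plotted order and 'leaves' holds the corresponding row indices of X.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

X = np.array([[1, 2], [3, 4], [5, 6], [8, 8], [1, 0]])
# Hypothetical sample names, one per row of X
names = ["A", "B", "C", "D", "E"]

Z = linkage(X, method="ward")

fig, ax = plt.subplots(figsize=(8, 4))
# Leaves are labeled by name; links below height 5 get per-cluster colors
result = dendrogram(Z, labels=names, color_threshold=5, ax=ax)
ax.set_title("Dendrogram with labeled leaves")
ax.set_xlabel("Sample")
ax.set_ylabel("Ward distance")

# Leaf labels in the left-to-right order they were drawn
print(result["ivl"])
plt.show()
```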
Common Pitfalls
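- Passing raw data directly to dendrogram instead of a linkage matrix raises an error.
- Linkage is distance-based, so features on very different scales can dominate the result and produce misleading clusters; standardize features first when their units differ. The 'ward', 'centroid', and 'median' methods are also only correctly defined for Euclidean distances.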
- Not labeling axes or samples makes the dendrogram hard to interpret.
Always compute the linkage matrix first, then pass it to dendrogram. Choose the linkage method based on your data and clustering goal.
```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

X = np.array([[1, 2], [3, 4]])

# Wrong: passing raw data directly
# dendrogram(X)  # This raises an error

# Right: compute linkage first
Z = linkage(X, method='ward')
dendrogram(Z)
plt.show()
```
Output
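The wrong usage raises an exception because X is not a valid linkage matrix, which must be a float array with four columns. For this integer array SciPy raises a TypeError saying the linkage matrix must contain doubles; a float array with the wrong shape raises a ValueError instead.
The correct usage shows the dendrogram plot.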
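The scale and method pitfalls can be checked directly. The sketch below uses synthetic random data (an assumption for illustration, with no real cluster structure) in which the second feature is measured in much larger units than the first. It standardizes each column to zero mean and unit standard deviation with plain NumPy (the helper standardize is defined here, not a library function), then draws one labeled dendrogram per linkage method side by side so the trees can be compared.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

rng = np.random.default_rng(0)
# Synthetic data: column 0 in small units, column 1 in much larger units
X = np.column_stack([rng.normal(0, 1, 12), rng.normal(0, 1000, 12)])


def standardize(data):
    """Scale each column to zero mean and unit standard deviation."""
    return (data - data.mean(axis=0)) / data.std(axis=0)


X_scaled = standardize(X)
methods = ["single", "complete", "average", "ward"]

fig, axes = plt.subplots(1, len(methods), figsize=(16, 4))
for ax, method in zip(axes, methods):
    # One linkage matrix and one dendrogram per method, on the scaled data
    Z = linkage(X_scaled, method=method)
    dendrogram(Z, ax=ax)
    ax.set_title(f"method='{method}'")
    ax.set_xlabel("Sample index")
    ax.set_ylabel("Distance")
fig.tight_layout()
plt.show()
```

Without the standardize step, distances here would be driven almost entirely by the second column.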
Quick Reference
Key points to remember when plotting dendrograms:
- Use linkage() to create the linkage matrix from data.
- Use dendrogram() to plot the dendrogram from the linkage matrix.
- Common linkage methods: 'ward', 'single', 'complete', 'average'.
- Label your plot axes for clarity.
- Use matplotlib.pyplot.show() to display the plot.
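The steps above let SciPy do the clustering. If you cluster with scikit-learn's AgglomerativeClustering instead, the fitted model has no plotting method of its own. One approach, following the pattern in scikit-learn's documentation example for plotting a hierarchical clustering dendrogram, is to rebuild a SciPy-style linkage matrix from the model's children_ and distances_ attributes and pass it to dendrogram. Setting distance_threshold=0 with n_clusters=None makes the model build the full tree and record merge distances. The helper name sklearn_to_linkage is hypothetical, made up for this sketch.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram
from sklearn.cluster import AgglomerativeClustering


def sklearn_to_linkage(model):
    """Hypothetical helper: build a SciPy linkage matrix from a fitted
    AgglomerativeClustering model. Requires model.distances_, which exists
    when the model was fit with distance_threshold set or compute_distances=True."""
    n_samples = len(model.labels_)
    counts = np.zeros(model.children_.shape[0])
    for i, (left, right) in enumerate(model.children_):
        # Child indices below n_samples are original points (count 1);
        # larger indices refer to the cluster formed at row (index - n_samples).
        counts[i] = sum(
            1 if child < n_samples else counts[child - n_samples]
            for child in (left, right)
        )
    # Columns: child ids, merge distance, cluster size (same layout as scipy's linkage)
    return np.column_stack([model.children_, model.distances_, counts]).astype(float)


X = np.array([[1, 2], [3, 4], [5, 6], [8, 8], [1, 0]])

# distance_threshold=0 with n_clusters=None builds the full tree and stores distances_
model = AgglomerativeClustering(distance_threshold=0, n_clusters=None, linkage="ward")
model.fit(X)

Z = sklearn_to_linkage(model)
dendrogram(Z)
plt.title("Dendrogram from AgglomerativeClustering")
plt.xlabel("Sample index")
plt.ylabel("Distance")
plt.show()
```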
Key Takeaways
Always compute a linkage matrix with scipy.cluster.hierarchy.linkage before plotting.
Use scipy.cluster.hierarchy.dendrogram to visualize hierarchical clustering.
Choose the linkage method based on your data and clustering needs.
Label your dendrogram plot axes for better understanding.
Passing raw data directly to dendrogram raises an error; a linkage matrix is required.