
How to Plot Dendrogram in Python with sklearn and scipy

To plot a dendrogram in Python, first compute a linkage matrix from your data with scipy.cluster.hierarchy.linkage, then pass that matrix to scipy.cluster.hierarchy.dendrogram, which draws the hierarchical clustering tree with matplotlib.
📐

Syntax

The main functions to plot a dendrogram are:

  • linkage(data, method='ward'): Computes hierarchical clustering encoded as a linkage matrix.
  • dendrogram(linkage_matrix): Plots the dendrogram from the linkage matrix.

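Here, data is your input dataset (a 2D array of shape n_samples × n_features, or a condensed 1D distance matrix such as the output of scipy.spatial.distance.pdist), and method defines the linkage criterion (e.g., 'ward', 'single', 'complete'). If you omit method, scipy uses 'single'.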

python
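import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Sample data: replace with your own 2D array (n_samples x n_features)
data = np.array([[1, 2], [3, 4], [5, 6], [8, 8], [1, 0]])

# Compute linkage matrix
Z = linkage(data, method='ward')

# Plot dendrogram (drawn with matplotlib; call matplotlib.pyplot.show() to display it)
dendrogram(Z)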
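To make the linkage matrix less opaque, the sketch below prints its shape and rows for a small made-up dataset (the `points` array is an illustrative assumption, not data you need to use). For n samples, `linkage` returns an (n-1) × 4 array; each row records the two clusters that were merged, the distance between them, and the size of the new cluster.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical sample data: 5 points in 2D (illustration only)
points = np.array([[1, 2], [3, 4], [5, 6], [8, 8], [1, 0]])

Z = linkage(points, method='ward')

# One row per merge: n samples -> n - 1 merges, 4 columns each
print(Z.shape)

# Columns: cluster index A, cluster index B, merge distance, new cluster size
for a, b, dist, size in Z:
    print(f"merge {int(a)} + {int(b)} at distance {dist:.2f} -> {int(size)} samples")
```

Indices below n refer to the original samples; indices n and above refer to clusters formed in earlier rows. The size in the last row equals n, because the final merge joins everything into one cluster.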
💻

Example

This example shows how to create and plot a dendrogram for a small dataset using scipy. It demonstrates hierarchical clustering and visualization.

python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
import numpy as np

# Sample data: 5 points in 2D
X = np.array([[1, 2], [3, 4], [5, 6], [8, 8], [1, 0]])

# Compute linkage matrix using Ward's method
Z = linkage(X, method='ward')

# Plot dendrogram
plt.figure(figsize=(8, 4))
dendrogram(Z)
plt.title('Dendrogram Example')
plt.xlabel('Sample index')
plt.ylabel('Distance')
plt.show()
Output
A dendrogram plot window appears showing hierarchical clustering of the 5 points with branches and distances.
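The example above uses scipy only. scikit-learn does not provide its own dendrogram plotting function, so one common approach is to fit `AgglomerativeClustering` with the full tree computed and convert its `children_` and `distances_` attributes into a scipy-style linkage matrix. The sketch below assumes scikit-learn is installed; the helper name `linkage_from_sklearn` is hypothetical, not a library function.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram
from sklearn.cluster import AgglomerativeClustering


def linkage_from_sklearn(model):
    """Build a scipy-style linkage matrix from a fitted model (hypothetical helper)."""
    n_samples = len(model.labels_)
    counts = np.zeros(model.children_.shape[0])
    for i, merge in enumerate(model.children_):
        count = 0
        for child in merge:
            if child < n_samples:
                count += 1  # child is an original sample
            else:
                count += counts[child - n_samples]  # child is an earlier merge
        counts[i] = count
    return np.column_stack([model.children_, model.distances_, counts]).astype(float)


# Same 5 points as in the scipy example
X = np.array([[1, 2], [3, 4], [5, 6], [8, 8], [1, 0]])

# distance_threshold=0 with n_clusters=None builds the full tree
# and makes model.distances_ available
model = AgglomerativeClustering(n_clusters=None, distance_threshold=0, linkage='ward')
model.fit(X)

Z_sk = linkage_from_sklearn(model)

# no_plot=True only computes the layout; remove it and call
# matplotlib.pyplot.show() to draw the tree
info = dendrogram(Z_sk, no_plot=True)
print(info['leaves'])
```

If you do not need anything else from scikit-learn, calling `linkage` directly as in the scipy example is shorter.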
⚠️

Common Pitfalls

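  • Passing raw data directly to dendrogram instead of a linkage matrix usually raises an error, because dendrogram validates its input and expects an (n-1) × 4 linkage matrix.
  • The 'ward', 'centroid', and 'median' methods are only correctly defined for Euclidean distances, and features on very different scales can dominate the distance calculation and produce misleading clusters.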
  • Not labeling axes or samples makes the dendrogram hard to interpret.

Always compute the linkage matrix first, then pass it to dendrogram. Choose linkage method based on your data and clustering goal.

python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

X = np.array([[1, 2], [3, 4]])

# Wrong: passing raw data directly
# dendrogram(X)  # This will raise an error

# Right: compute linkage first
Z = linkage(X, method='ward')
dendrogram(Z)
Output
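Wrong usage: dendrogram(X) fails while scipy validates the linkage matrix (a TypeError or ValueError, depending on the input's dtype and shape and on the scipy version). Correct usage shows the dendrogram plot.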
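The data-scale pitfall can be made concrete. Ward linkage works on Euclidean distances, so a feature with a much larger numeric range tends to dominate the merges. The sketch below uses made-up data (an assumption for illustration) and standardizes each column with plain NumPy before clustering; whether standardizing is appropriate depends on your data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical data: column 0 is small-scale, column 1 is in the thousands
raw = np.array([[1.0, 1000.0],
                [2.0, 1300.0],
                [9.0, 1100.0],
                [10.0, 1450.0]])

# Standardize each feature to zero mean and unit variance
scaled = (raw - raw.mean(axis=0)) / raw.std(axis=0)

Z_raw = linkage(raw, method='ward')
Z_scaled = linkage(scaled, method='ward')

# The first row of each linkage matrix holds the first pair merged
print(Z_raw[0, :2].astype(int))     # -> [0 2], closest in the large-scale column
print(Z_scaled[0, :2].astype(int))  # -> [0 1], closest once both columns count equally
```

The two trees start with different merges, so their dendrograms look different even though the underlying samples are the same.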
📊

Quick Reference

Key points to remember when plotting dendrograms:

  • Use linkage() to create the linkage matrix from data.
  • Use dendrogram() to plot the dendrogram from the linkage matrix.
  • Common linkage methods: 'ward', 'single', 'complete', 'average'.
  • Label your plot axes for clarity.
  • Use matplotlib.pyplot.show() to display the plot.
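The labeling advice above can be applied through the `labels`, `orientation`, and `ax` parameters of `dendrogram`. This is a minimal sketch with made-up sample names (an assumption for illustration); it saves the figure to a hypothetical file name instead of opening a window.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Hypothetical sample data and names (illustration only)
X = np.array([[1, 2], [3, 4], [5, 6], [8, 8], [1, 0]])
names = ['a', 'b', 'c', 'd', 'e']

Z = linkage(X, method='average')

fig, ax = plt.subplots(figsize=(6, 4))

# labels replaces the default sample indices on the leaf axis;
# orientation='right' puts the distance on the x-axis
result = dendrogram(Z, labels=names, orientation='right', ax=ax)

ax.set_xlabel('Distance')
ax.set_ylabel('Sample')
ax.set_title('Labeled dendrogram (average linkage)')
fig.tight_layout()
fig.savefig('labeled_dendrogram.png')  # or plt.show() in an interactive session
```

The returned dictionary also exposes the leaf order: `result['ivl']` lists the labels in the order they appear along the leaf axis.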

Key Takeaways

Always compute a linkage matrix with scipy.cluster.hierarchy.linkage before plotting.
Use scipy.cluster.hierarchy.dendrogram to visualize hierarchical clustering.
Choose the linkage method based on your data and clustering needs.
Label your dendrogram plot axes for better understanding.
Passing raw data directly to dendrogram causes errors; a linkage matrix is required.