Dendrogram visualization in SciPy - Time & Space Complexity
When we create a dendrogram using scipy, we want to know how the time it takes grows as we add more data points.
We ask: How does the work needed to draw the dendrogram change with more samples?
Analyze the time complexity of the following code snippet.
from scipy.cluster.hierarchy import linkage, dendrogram
import numpy as np
# Create random data points
X = np.random.rand(50, 2)
# Compute linkage matrix
Z = linkage(X, method='ward')
# Plot dendrogram
dendrogram(Z)
This code generates 50 random points, computes their hierarchical clustering, and then draws the dendrogram.
Identify the loops, recursion, array traversals that repeat.
- Primary operation: Computing the linkage matrix involves repeatedly merging clusters and updating distances.
- How many times: This merging happens roughly once for each pair of clusters until all points are joined, about n-1 times for n points.
As the number of points increases, the number of cluster merges grows quickly because each merge considers distances between clusters.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | ~100 |
| 100 | ~10,000 |
| 1000 | ~1,000,000 |
Pattern observation: The operations grow roughly with the square of the number of points, so doubling points makes the work about four times bigger.
Time Complexity: O(n²)
This means the time to create the dendrogram grows roughly with the square of the number of data points.
[X] Wrong: "The dendrogram computation time grows linearly with the number of points because we just add one point at a time."
[OK] Correct: Each step merges clusters and recalculates distances between many pairs, so the work grows faster than just adding points one by one.
Understanding how dendrogram creation scales helps you explain clustering performance clearly and shows you can think about algorithm costs in real tasks.
"What if we used a different linkage method that approximates distances? How would the time complexity change?"