
Flat clustering (fcluster) in SciPy - Step-by-Step Execution

Concept Flow - Flat clustering (fcluster)
Start with hierarchical clustering result
Choose a threshold or criterion
Call fcluster to assign cluster labels
Get flat clusters as output
Use clusters for analysis or visualization
Flat clustering cuts a hierarchical clustering tree at a chosen level to assign cluster labels to data points.
Execution Sample
SciPy
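from scipy.cluster.hierarchy import linkage, fcluster
import numpy as np

# Four 2-D points: two near (1, 2) and two near (8, 8)
X = np.array([[1, 2], [2, 2], [8, 7], [8, 8]])
# Build the hierarchical clustering tree with single linkage
Z = linkage(X, method='single')
# Cut the tree at distance 3 to get one flat cluster label per point
clusters = fcluster(Z, t=3, criterion='distance')
print(clusters)  # [1 1 2 2]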
This code clusters 4 points hierarchically and then cuts the tree at distance 3 to assign flat cluster labels.
Execution Table
| Step | Action | Input/Condition | Output/Result |
|---|---|---|---|
| 1 | Create data points | X = [[1,2],[2,2],[8,7],[8,8]] | X array created |
| 2 | Compute linkage | method='single' | Z linkage matrix with merges and distances |
| 3 | Call fcluster | Z linkage, t=3, criterion='distance' | Assign cluster labels based on distance threshold |
| 4 | Output clusters | clusters array | [1, 1, 2, 2] |
| 5 | End | All points assigned clusters | Flat clustering complete |
💡 All points assigned to clusters based on distance threshold 3
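To make step 2 of the table more concrete, the sketch below prints the merges recorded in the linkage matrix for the same four points. It assumes the row layout used by SciPy's `linkage` (two cluster indices, the merge distance, and the size of the new cluster). The printed wording is illustrative only.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.array([[1, 2], [2, 2], [8, 7], [8, 8]])
Z = linkage(X, method='single')

# Each row of Z describes one merge: [cluster_a, cluster_b, distance, size].
# Indices 0..3 are the original points; indices 4 and up are clusters
# created by earlier merges.
for a, b, dist, size in Z:
    print(f"merge {int(a)} + {int(b)} at distance {dist:.2f} "
          f"-> cluster of {int(size)} points")
```

The two close pairs merge at distance 1, and the final merge joins them at roughly 7.81, well above the threshold t=3 used in step 3. That gap is why the cut yields two flat clusters.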
Variable Tracker
| Variable | Start | After linkage | After fcluster | Final |
|---|---|---|---|---|
| X | [[1,2],[2,2],[8,7],[8,8]] | [[1,2],[2,2],[8,7],[8,8]] | [[1,2],[2,2],[8,7],[8,8]] | [[1,2],[2,2],[8,7],[8,8]] |
| Z | None | Linkage matrix with shape (3,4) | Same | Same |
| clusters | None | None | [1,1,2,2] | [1,1,2,2] |
Key Moments - 3 Insights
Why do some points get the same cluster label?
Because the distance at which they are merged in the linkage tree is no greater than the threshold t=3, so fcluster groups them together (see execution_table step 4).
What does the 't' parameter control in fcluster?
With criterion='distance', it sets the distance at which the hierarchical tree is cut; points merged at or below this distance share the same cluster label (execution_table step 3).
Why do we need the linkage matrix Z before calling fcluster?
Z contains the hierarchical clustering info (merges and distances) that fcluster uses to assign flat clusters (execution_table step 2 and 3).
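One way to check these three points is to compare the cophenetic distances stored in the tree with the labels fcluster returns. This is a minimal sketch, assuming criterion='distance' and the same four points. `cophenet` and `squareform` are existing SciPy functions, while the variable names are only illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, cophenet
from scipy.spatial.distance import squareform

X = np.array([[1, 2], [2, 2], [8, 7], [8, 8]])
Z = linkage(X, method='single')
t = 3

# Cophenetic distance between two points = height of the merge
# that first puts them in the same cluster.
coph = squareform(cophenet(Z))

# Pairs whose cophenetic distance is at most t ...
within_threshold = coph <= t

# ... should be exactly the pairs that receive the same flat label.
clusters = fcluster(Z, t=t, criterion='distance')
same_label = clusters[:, None] == clusters[None, :]

print(np.array_equal(within_threshold, same_label))
```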
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table at step 4, what cluster label is assigned to the point [8,8]?
A. 3
B. 1
C. 2
D. 4
💡 Hint
Check the clusters array in execution_table step 4; the last point corresponds to label 2.
At which step does the hierarchical linkage matrix get created?
A. Step 2
B. Step 3
C. Step 1
D. Step 4
💡 Hint
Look at execution_table step 2 where linkage is computed.
If we change t to 1 in fcluster, what happens to the clusters output?
A. All points get the same cluster label
B. Each point gets a unique cluster label
C. Clusters remain the same as with t=3
D. fcluster raises an error
💡 Hint
With criterion='distance', fcluster groups points whose merge distance is less than or equal to t. In this data both close pairs merge at a distance of exactly 1, so check whether t=1 still keeps each pair together.
Concept Snapshot
Flat clustering with fcluster:
- Input: linkage matrix from hierarchical clustering
- Parameter t: distance threshold at which to cut the tree (with criterion='distance')
- Output: cluster labels array
- Points merged at or below distance t share a cluster label
- Use for simple cluster assignment from hierarchy
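The effect of the threshold can be seen by cutting the same tree at several values of t. The sketch below is only an illustration on the four sample points. The chosen thresholds and the `n_clusters` dictionary are assumptions made for the example, not part of the original walkthrough.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1, 2], [2, 2], [8, 7], [8, 8]])
Z = linkage(X, method='single')

# Count how many flat clusters each threshold produces.
n_clusters = {}
for t in (0.5, 1, 3, 8):
    labels = fcluster(Z, t=t, criterion='distance')
    n_clusters[t] = len(np.unique(labels))
    print(f"t={t}: labels={labels}, clusters={n_clusters[t]}")
```

A threshold below the smallest merge distance leaves every point in its own cluster, while a threshold above the largest merge distance puts all points in a single cluster.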
Full Transcript
Flat clustering with fcluster takes a hierarchical clustering result and cuts it at a chosen distance threshold to assign cluster labels. First, data points are clustered hierarchically using linkage. Then, fcluster uses the linkage matrix and a threshold t to assign flat cluster labels. Points merged at or below the threshold get the same label. This method converts a hierarchical tree into simple clusters for analysis or visualization.
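As a small example of using the labels for analysis, the following sketch groups the sample points by flat cluster label and computes each group's centroid. The `centroids` dictionary is a hypothetical helper structure introduced here for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1, 2], [2, 2], [8, 7], [8, 8]])
Z = linkage(X, method='single')
clusters = fcluster(Z, t=3, criterion='distance')

# Map each cluster label to the mean of the points that carry it.
centroids = {}
for label in np.unique(clusters):
    members = X[clusters == label]      # boolean mask selects the group
    centroids[int(label)] = members.mean(axis=0)
    print(f"cluster {label}: {len(members)} points, centroid {centroids[int(label)]}")
```

The same boolean-mask pattern could be used to color points by label in a scatter plot or to compute other per-cluster statistics.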