0
0
SciPydata~5 mins

Flat clustering (fcluster) in SciPy - Time & Space Complexity

Choose your learning style9 modes available
Time Complexity: Flat clustering (fcluster)
O(n^3)
Understanding Time Complexity

When using flat clustering with scipy's fcluster, we want to know how the time to assign clusters grows as we add more data points.

We ask: How does the clustering step scale with the number of points?

Scenario Under Consideration

Analyze the time complexity of this flat clustering code snippet.


from scipy.cluster.hierarchy import linkage, fcluster

# data: array of n points
Z = linkage(data, method='ward')
clusters = fcluster(Z, t=3, criterion='maxclust')
    

This code creates a hierarchical clustering and then cuts it to form flat clusters.

Identify Repeating Operations

Look at the main repeated steps:

  • Primary operation: The linkage function computes distances and merges clusters repeatedly.
  • How many times: It performs about n-1 merges for n points.
  • The fcluster step just traverses the linkage once to assign cluster labels.
How Execution Grows With Input

As the number of points grows, the linkage step does more work combining pairs.

Input Size (n)Approx. Operations
10~100
100~10,000
1000~1,000,000

Pattern observation: The work grows roughly with the square of the number of points.

Final Time Complexity

Time Complexity: O(n^3)

This means if you double the number of points, the time to cluster roughly increases by a factor of eight.

Common Mistake

[X] Wrong: "fcluster runs in linear time because it just assigns clusters."

[OK] Correct: While fcluster itself is fast, the main cost is in linkage, which grows much faster as data grows.

Interview Connect

Understanding how clustering scales helps you explain choices in data analysis and handle larger datasets confidently.

Self-Check

What if we used a different linkage method that approximates distances? How would the time complexity change?