Overview - Disjoint set (Union-Find)

What is it?

A disjoint set, also known as Union-Find, is a data structure that keeps track of a collection of separate groups or sets. It supports two main operations: finding which group an element belongs to, and merging two groups into one. This structure helps quickly answer questions about whether two elements are in the same group or not.

Why it matters

Disjoint sets solve the problem of efficiently managing and querying groups that change over time, such as social networks, computer networks, or clustering data. Without this, checking group membership or merging groups would be slow and complicated, making many algorithms inefficient or impractical.

Where it fits

Before learning disjoint sets, you should understand basic data structures like arrays and trees. After mastering disjoint sets, you can explore graph algorithms like Kruskal's minimum spanning tree or network connectivity problems.

Mental Model

Core Idea

Disjoint set keeps track of which elements belong together by linking them in trees and quickly merging these trees when sets combine.

Think of it like...

Imagine a group of friends at a party where each friend holds a colored balloon representing their group. When two friends from different groups meet and decide to join, they tie their balloons together, forming a bigger group. To find out if two friends are in the same group, you just follow the strings of balloons to see if they end up connected.

Sets represented as trees:

  Element: 1   2   3   4   5
  Group:   ┌─┐ ┌─┐ ┌─────┐
           │1│ │2│ │3 4 5│
           └─┘ └─┘ └─────┘

Operations:
Find(3) -> root of 3's tree
Union(2,3) -> link root of 2's tree to root of 3's tree

After Union(2,3):

  1    2
       |
       3
      / \
     4   5

Build-Up - 7 Steps

1

FoundationUnderstanding sets and partitions

Concept: Introduce the idea of grouping elements into separate, non-overlapping sets.

A set is a collection of distinct elements. When we divide a collection into groups where no element belongs to more than one group, we call this a partition. For example, dividing students into different classrooms forms a partition of the whole school.

Result

You understand that elements can be grouped without overlap, and these groups cover all elements.

Understanding partitions is essential because disjoint sets manage these groups efficiently.

2

FoundationBasic operations: Find and Union

3

IntermediateRepresenting sets as trees

4

IntermediatePath compression optimization

5

IntermediateUnion by rank or size

6

AdvancedAmortized analysis of operations

7

ExpertApplications in graph algorithms

Under the Hood

Internally, each element stores a pointer to its parent, forming a forest of trees representing sets. The find operation traverses parent pointers to find the root, which identifies the set. Path compression updates these pointers during find to point directly to the root, flattening the tree. Union merges two trees by linking one root to another, guided by rank or size to keep trees balanced. This structure minimizes the height of trees, ensuring fast operations.

Why designed this way?

Disjoint sets were designed to efficiently manage dynamic connectivity problems where groups merge over time. Early methods used simple lists or arrays but were slow for large data. Representing sets as trees with path compression and union by rank was a breakthrough, balancing simplicity and speed. Alternatives like linked lists or naive arrays were rejected due to poor performance on repeated queries.

Disjoint Set Forest Structure:

  [Element] -> [Parent]

  Example:

  1 -> 1 (root)
  2 -> 1
  3 -> 3 (root)
  4 -> 3

  Find(2): 2 -> 1 (root)
  Union(1,3): attach root 3 to root 1

  After Union:
  3 -> 1
  4 -> 3

  Tree looks like:

    1
   / \
  2   3
       \
        4

Myth Busters - 4 Common Misconceptions

Quick: Does union operation always merge the smaller set into the larger one? Commit yes or no.

Common Belief:Union always attaches the smaller set to the larger set to keep trees balanced.

Tap to reveal reality

Quick: Does path compression change the set membership of elements? Commit yes or no.

Common Belief:Path compression might change which set an element belongs to because it changes parent pointers.

Tap to reveal reality

Quick: Is it necessary to perform union before find to get correct results? Commit yes or no.

Common Belief:You must union sets before calling find to get the correct representative.

Tap to reveal reality

Quick: Can disjoint sets be used to store overlapping groups? Commit yes or no.

Common Belief:Disjoint sets can manage overlapping groups where elements belong to multiple sets.

Tap to reveal reality

Expert Zone

1

The inverse Ackermann function governs the amortized time complexity, which is practically constant but theoretically subtle.

2

Path compression can be implemented recursively or iteratively, with subtle differences in performance and stack usage.

3

Union by rank and union by size are similar but not identical; choosing one affects subtle performance trade-offs.

When NOT to use

Disjoint sets are not suitable when elements can belong to multiple overlapping groups; in such cases, other data structures like graphs or hash maps are better. Also, if you need to track detailed group contents or support deletions efficiently, disjoint sets are limited.

Production Patterns

In real-world systems, disjoint sets are used in network connectivity checks, clustering algorithms, image processing for connected components, and dynamic equivalence problems. They are often combined with other data structures for complex queries and optimized with low-level memory management for speed.

Connections

Graph Theory

Disjoint sets are used to manage connected components and detect cycles in graphs.

Understanding disjoint sets clarifies how algorithms like Kruskal's efficiently build minimum spanning trees by managing connectivity.

Equivalence Relations (Mathematics)

Disjoint sets represent equivalence classes formed by equivalence relations.

Knowing this connection helps understand the mathematical foundation of grouping elements by shared properties.

Social Networks (Sociology)

Disjoint sets model communities or friend groups that merge over time.

Recognizing this helps apply data structure concepts to real-world social dynamics and clustering.

Common Pitfalls

#1Not using path compression, causing slow find operations.

Wrong approach:def find(x): while parent[x] != x: x = parent[x] return x

Correct approach:def find(x): if parent[x] != x: parent[x] = find(parent[x]) return parent[x]

Root cause:Ignoring path compression misses the opportunity to flatten trees, leading to tall trees and slow queries.

#2Always attaching the second tree to the first during union without considering size or rank.

Wrong approach:def union(x, y): parent[find(y)] = find(x)

Correct approach:def union(x, y): rootX = find(x) rootY = find(y) if rank[rootX] < rank[rootY]: parent[rootX] = rootY else: parent[rootY] = rootX if rank[rootX] == rank[rootY]: rank[rootX] += 1

Root cause:Ignoring tree size or rank can cause unbalanced trees and degrade performance.

#3Assuming disjoint sets support element removal from sets.

Wrong approach:# Trying to remove element parent[element] = None

Correct approach:# Disjoint sets do not support removal; use other data structures if needed

Root cause:Misunderstanding the static nature of disjoint sets leads to incorrect assumptions about element removal.

Key Takeaways

Disjoint sets efficiently manage groups of elements that merge over time using two main operations: find and union.

Representing sets as trees with path compression and union by rank keeps operations fast, almost constant time on average.

Disjoint sets only handle non-overlapping groups where each element belongs to exactly one set.

They are fundamental in graph algorithms, especially for connectivity and cycle detection.

Understanding their internal mechanics and optimizations reveals why they are widely used in computer science and beyond.