Data Structures Theoryknowledge~6 mins

Disjoint set (Union-Find) in Data Structures Theory - Full Explanation

Choose your learning style9 modes available

Learn Why Deep Visual Practice Challenge Project Recall Time

Introduction

Imagine you have many groups of friends, and you want to quickly find out if two people are in the same group or not. Managing these groups efficiently is tricky when they keep merging. Disjoint set, also called Union-Find, solves this problem by keeping track of which elements belong together and allowing fast merging and checking.

Explanation

Sets and Elements

The disjoint set structure manages multiple groups, called sets, where each element belongs to exactly one set. Initially, every element is in its own set, separate from others. This setup helps in tracking which elements are connected or grouped together.

Each element starts in its own unique set, ensuring no overlaps.

Find Operation

The find operation helps identify which set an element belongs to by returning a representative or leader of that set. This representative acts like a name tag for the whole group. It allows us to quickly check if two elements share the same set by comparing their representatives.

Find returns the set's representative to identify group membership.

Union Operation

Union merges two different sets into one, combining their elements under a single representative. This operation is used when we discover a connection between elements from separate sets. Efficient union keeps the structure balanced to speed up future operations.

Union combines two sets into one, updating their representative.

Path Compression

Path compression is a technique used during the find operation to speed up future searches. It flattens the structure by making elements point directly to the set representative. This reduces the time needed to find the representative in later operations.

Path compression speeds up find by flattening the set structure.

Union by Rank/Size

To keep the sets balanced, union by rank or size attaches the smaller set under the larger one. This prevents the structure from becoming too tall, which would slow down find operations. It helps maintain efficient performance even after many unions.

Union by rank or size keeps the sets balanced for efficiency.

Real World Analogy

Imagine a school where each student starts in their own club. When two clubs decide to join forces, they pick one leader to represent the new combined club. To find out which club a student belongs to, you ask who their leader is. Over time, students remember their leader directly, making it faster to find their club.

Sets and Elements → Each student starting in their own club

Find Operation → Asking a student who their club leader is

Union Operation → Two clubs merging and choosing one leader

Path Compression → Students remembering their leader directly to avoid asking others

Union by Rank/Size → Smaller club joining the bigger club to keep things simple

Diagram

Initial sets:
 1   2   3   4
  |   |   |   |
[1] [2] [3] [4]

After union(1,2):
 1       3   4
 |       |   |
[1]---->[2] [3] [4]

After union(3,4):
 1       3
 |       |
[1]---->[2] [3]---->[4]

After union(2,3):
 1
 |
[1]---->[2]---->[3]---->[4]

Find with path compression flattens:
 1
 |
[1]---->[2]
  |      |
  +---->[3]
         |
        [4]

This diagram shows sets as chains of elements pointing to their representative, illustrating unions and path compression flattening.

Key Facts

Disjoint Set → A data structure that keeps track of elements partitioned into non-overlapping sets.

Find Operation → Returns the representative element of the set containing a given element.

Union Operation → Merges two distinct sets into a single set.

Path Compression → An optimization that flattens the structure during find to speed up future queries.

Union by Rank/Size → An optimization that attaches the smaller set under the larger to keep the tree shallow.

Code Example

Data Structures Theory

class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # Path compression
        return self.parent[x]

    def union(self, x, y):
        rootX = self.find(x)
        rootY = self.find(y)
        if rootX != rootY:
            if self.rank[rootX] < self.rank[rootY]:
                self.parent[rootX] = rootY
            elif self.rank[rootX] > self.rank[rootY]:
                self.parent[rootY] = rootX
            else:
                self.parent[rootY] = rootX
                self.rank[rootX] += 1

# Example usage
ds = DisjointSet(5)  # Elements 0 to 4

# Initially, all are separate
print(ds.find(0))  # 0
print(ds.find(1))  # 1

# Union sets containing 0 and 1
 ds.union(0, 1)
print(ds.find(0))  # Same as find(1)
print(ds.find(1))  # Same as find(0)

# Union sets containing 3 and 4
 ds.union(3, 4)
print(ds.find(3))  # Same as find(4)

# Union sets containing 1 and 3 (merging two groups)
 ds.union(1, 3)
print(ds.find(0))  # Same representative for 0,1,3,4
print(ds.find(4))  # Same representative for 0,1,3,4

OutputSuccess

Common Confusions

Thinking that find returns the actual element instead of the set representative.

Thinking that find returns the actual element instead of the set representative. Find returns the leader or representative of the set, which may be different from the element itself.

Believing union always merges sets by simply linking any element.

Believing union always merges sets by simply linking any element. Union carefully merges sets by rank or size to keep the structure efficient, not just by linking any elements.

Assuming path compression changes the set membership.

Assuming path compression changes the set membership. Path compression only changes pointers to speed up find; it does not alter which elements belong to which sets.

Summary

Disjoint set (Union-Find) helps manage groups of elements that merge over time and lets you quickly check if two elements are in the same group.

The find operation returns a group's representative, while union merges two groups efficiently using rank or size to keep the structure balanced.

Path compression speeds up repeated find operations by flattening the structure, making future queries faster.