0
0
Data Structures Theoryknowledge~15 mins

Disjoint set (Union-Find) in Data Structures Theory - Deep Dive

Choose your learning style9 modes available
Overview - Disjoint set (Union-Find)
What is it?
A disjoint set, also known as Union-Find, is a data structure that keeps track of a collection of separate groups or sets. It supports two main operations: finding which group an element belongs to, and merging two groups into one. This structure helps quickly answer questions about whether two elements are in the same group or not.
Why it matters
Disjoint sets solve the problem of efficiently managing and querying groups that change over time, such as social networks, computer networks, or clustering data. Without this, checking group membership or merging groups would be slow and complicated, making many algorithms inefficient or impractical.
Where it fits
Before learning disjoint sets, you should understand basic data structures like arrays and trees. After mastering disjoint sets, you can explore graph algorithms like Kruskal's minimum spanning tree or network connectivity problems.
Mental Model
Core Idea
Disjoint set keeps track of which elements belong together by linking them in trees and quickly merging these trees when sets combine.
Think of it like...
Imagine a group of friends at a party where each friend holds a colored balloon representing their group. When two friends from different groups meet and decide to join, they tie their balloons together, forming a bigger group. To find out if two friends are in the same group, you just follow the strings of balloons to see if they end up connected.
Sets represented as trees:

  Element: 1   2   3   4   5
  Group:   ┌─┐ ┌─┐ ┌─────┐
           │1│ │2│ │3 4 5│
           └─┘ └─┘ └─────┘

Operations:
Find(3) -> root of 3's tree
Union(2,3) -> link root of 2's tree to root of 3's tree

After Union(2,3):

  1    2
       |
       3
      / \
     4   5
Build-Up - 7 Steps
1
FoundationUnderstanding sets and partitions
🤔
Concept: Introduce the idea of grouping elements into separate, non-overlapping sets.
A set is a collection of distinct elements. When we divide a collection into groups where no element belongs to more than one group, we call this a partition. For example, dividing students into different classrooms forms a partition of the whole school.
Result
You understand that elements can be grouped without overlap, and these groups cover all elements.
Understanding partitions is essential because disjoint sets manage these groups efficiently.
2
FoundationBasic operations: Find and Union
🤔
Concept: Learn the two main operations: finding which set an element belongs to and merging two sets.
Find(element) tells which group the element is in by returning a representative or 'root' of that group. Union(set1, set2) merges two groups into one by linking their representatives. Initially, each element is in its own group.
Result
You can identify groups and combine them, forming larger groups step by step.
These two operations form the core interface of the disjoint set structure.
3
IntermediateRepresenting sets as trees
🤔Before reading on: do you think sets are best represented as lists or trees? Commit to your answer.
Concept: Use trees to represent each set, where each element points to a parent, and the root is the set's representative.
Each element stores a pointer to its parent. The root points to itself. To find the set of an element, follow parent pointers up to the root. This structure allows quick merging by attaching one root to another.
Result
Sets become trees, enabling efficient navigation and merging.
Representing sets as trees allows fast union and find operations by exploiting tree structure.
4
IntermediatePath compression optimization
🤔Before reading on: does updating parent pointers during find help speed up future finds? Yes or no? Commit to your answer.
Concept: Path compression flattens the tree during find operations by making every node on the path point directly to the root.
When you find the root of an element, update all nodes along the path to point directly to the root. This reduces the tree height, speeding up future finds.
Result
Find operations become almost constant time after repeated use.
Path compression dramatically improves performance by keeping trees shallow.
5
IntermediateUnion by rank or size
🤔Before reading on: should you always attach the smaller tree to the larger one during union? Commit to your answer.
Concept: Union by rank or size attaches the smaller tree under the larger tree to keep trees balanced.
Each root stores a rank (approximate tree height) or size (number of elements). When merging, attach the smaller tree's root to the larger tree's root. This prevents trees from becoming tall and slow.
Result
Trees remain balanced, ensuring efficient operations.
Balancing trees during union prevents performance degradation over time.
6
AdvancedAmortized analysis of operations
🤔Before reading on: do you think each find or union operation always takes the same time? Commit to your answer.
Concept: The combination of path compression and union by rank leads to very efficient operations on average, even if some individual operations take longer.
Though some find operations may traverse multiple nodes, the average time per operation over many operations is nearly constant, described by the inverse Ackermann function, which grows extremely slowly.
Result
Disjoint set operations are practically constant time in real use.
Understanding amortized time explains why disjoint sets are so efficient despite occasional longer steps.
7
ExpertApplications in graph algorithms
🤔Before reading on: do you think disjoint sets can help detect cycles in graphs? Commit to your answer.
Concept: Disjoint sets are used in algorithms like Kruskal's to build minimum spanning trees and detect cycles by checking if two vertices belong to the same set.
When adding edges, use find to check if vertices are in the same set. If yes, adding the edge creates a cycle. Otherwise, union their sets. This approach efficiently builds spanning trees and detects connectivity.
Result
Disjoint sets enable fast, scalable graph processing.
Knowing practical applications reveals the power and importance of disjoint sets beyond theory.
Under the Hood
Internally, each element stores a pointer to its parent, forming a forest of trees representing sets. The find operation traverses parent pointers to find the root, which identifies the set. Path compression updates these pointers during find to point directly to the root, flattening the tree. Union merges two trees by linking one root to another, guided by rank or size to keep trees balanced. This structure minimizes the height of trees, ensuring fast operations.
Why designed this way?
Disjoint sets were designed to efficiently manage dynamic connectivity problems where groups merge over time. Early methods used simple lists or arrays but were slow for large data. Representing sets as trees with path compression and union by rank was a breakthrough, balancing simplicity and speed. Alternatives like linked lists or naive arrays were rejected due to poor performance on repeated queries.
Disjoint Set Forest Structure:

  [Element] -> [Parent]

  Example:

  1 -> 1 (root)
  2 -> 1
  3 -> 3 (root)
  4 -> 3

  Find(2): 2 -> 1 (root)
  Union(1,3): attach root 3 to root 1

  After Union:
  3 -> 1
  4 -> 3

  Tree looks like:

    1
   / \
  2   3
       \
        4
Myth Busters - 4 Common Misconceptions
Quick: Does union operation always merge the smaller set into the larger one? Commit yes or no.
Common Belief:Union always attaches the smaller set to the larger set to keep trees balanced.
Tap to reveal reality
Reality:Union by rank attaches the tree with smaller rank (height estimate), not necessarily the smaller set by element count.
Why it matters:Confusing rank with size can lead to incorrect implementations that degrade performance.
Quick: Does path compression change the set membership of elements? Commit yes or no.
Common Belief:Path compression might change which set an element belongs to because it changes parent pointers.
Tap to reveal reality
Reality:Path compression only changes internal pointers to speed up find; it does not change set membership.
Why it matters:Misunderstanding this can cause incorrect assumptions about data integrity during operations.
Quick: Is it necessary to perform union before find to get correct results? Commit yes or no.
Common Belief:You must union sets before calling find to get the correct representative.
Tap to reveal reality
Reality:Find works independently to identify the set of any element; union merges sets but is not required before find.
Why it matters:Believing this can cause confusion about operation order and misuse of the data structure.
Quick: Can disjoint sets be used to store overlapping groups? Commit yes or no.
Common Belief:Disjoint sets can manage overlapping groups where elements belong to multiple sets.
Tap to reveal reality
Reality:Disjoint sets only manage non-overlapping groups; each element belongs to exactly one set.
Why it matters:Using disjoint sets for overlapping groups leads to incorrect results and logic errors.
Expert Zone
1
The inverse Ackermann function governs the amortized time complexity, which is practically constant but theoretically subtle.
2
Path compression can be implemented recursively or iteratively, with subtle differences in performance and stack usage.
3
Union by rank and union by size are similar but not identical; choosing one affects subtle performance trade-offs.
When NOT to use
Disjoint sets are not suitable when elements can belong to multiple overlapping groups; in such cases, other data structures like graphs or hash maps are better. Also, if you need to track detailed group contents or support deletions efficiently, disjoint sets are limited.
Production Patterns
In real-world systems, disjoint sets are used in network connectivity checks, clustering algorithms, image processing for connected components, and dynamic equivalence problems. They are often combined with other data structures for complex queries and optimized with low-level memory management for speed.
Connections
Graph Theory
Disjoint sets are used to manage connected components and detect cycles in graphs.
Understanding disjoint sets clarifies how algorithms like Kruskal's efficiently build minimum spanning trees by managing connectivity.
Equivalence Relations (Mathematics)
Disjoint sets represent equivalence classes formed by equivalence relations.
Knowing this connection helps understand the mathematical foundation of grouping elements by shared properties.
Social Networks (Sociology)
Disjoint sets model communities or friend groups that merge over time.
Recognizing this helps apply data structure concepts to real-world social dynamics and clustering.
Common Pitfalls
#1Not using path compression, causing slow find operations.
Wrong approach:def find(x): while parent[x] != x: x = parent[x] return x
Correct approach:def find(x): if parent[x] != x: parent[x] = find(parent[x]) return parent[x]
Root cause:Ignoring path compression misses the opportunity to flatten trees, leading to tall trees and slow queries.
#2Always attaching the second tree to the first during union without considering size or rank.
Wrong approach:def union(x, y): parent[find(y)] = find(x)
Correct approach:def union(x, y): rootX = find(x) rootY = find(y) if rank[rootX] < rank[rootY]: parent[rootX] = rootY else: parent[rootY] = rootX if rank[rootX] == rank[rootY]: rank[rootX] += 1
Root cause:Ignoring tree size or rank can cause unbalanced trees and degrade performance.
#3Assuming disjoint sets support element removal from sets.
Wrong approach:# Trying to remove element parent[element] = None
Correct approach:# Disjoint sets do not support removal; use other data structures if needed
Root cause:Misunderstanding the static nature of disjoint sets leads to incorrect assumptions about element removal.
Key Takeaways
Disjoint sets efficiently manage groups of elements that merge over time using two main operations: find and union.
Representing sets as trees with path compression and union by rank keeps operations fast, almost constant time on average.
Disjoint sets only handle non-overlapping groups where each element belongs to exactly one set.
They are fundamental in graph algorithms, especially for connectivity and cycle detection.
Understanding their internal mechanics and optimizations reveals why they are widely used in computer science and beyond.