0
0
Data Structures Theoryknowledge~6 mins

Minimum spanning tree (Kruskal's) in Data Structures Theory - Full Explanation

Choose your learning style9 modes available
Introduction
Imagine you want to connect several cities with roads so that every city is reachable from any other city, but you want to spend the least amount of money building these roads. Finding the cheapest way to connect all points without any loops is the problem that minimum spanning trees solve. Kruskal's algorithm is one way to find this cheapest connection.
Explanation
Graph and edges
A graph is a collection of points called vertices connected by lines called edges. Each edge has a cost or weight that represents how expensive it is to use or build that connection. Kruskal's algorithm works by looking at all these edges and deciding which ones to include.
Kruskal's algorithm starts by considering all edges sorted by their cost.
Sorting edges by weight
The first step in Kruskal's algorithm is to sort all edges from the smallest cost to the largest. This way, the algorithm can pick the cheapest connections first to build the minimum spanning tree. Sorting helps ensure the total cost stays as low as possible.
Sorting edges by weight ensures the algorithm always picks the cheapest available connection.
Building the tree without cycles
Kruskal's algorithm adds edges one by one from the sorted list, but only if adding that edge does not create a loop or cycle. This keeps the structure a tree, meaning all points are connected with no closed loops. To check this, the algorithm uses a method to track which points are already connected.
Edges are added only if they connect two separate groups, avoiding cycles.
Union-Find data structure
To efficiently check if adding an edge creates a cycle, Kruskal's algorithm uses a Union-Find structure. This keeps track of which vertices are connected in groups. When an edge connects two different groups, they are merged. If they are already in the same group, adding that edge would create a cycle and is skipped.
Union-Find helps quickly decide if two vertices belong to the same connected group.
Resulting minimum spanning tree
After processing all edges, the selected edges form a tree that connects all vertices with the smallest total cost. This tree is called the minimum spanning tree. It ensures every point is reachable without any unnecessary or expensive connections.
The final set of edges connects all points with the minimum total cost and no cycles.
Real World Analogy

Imagine you are building a network of roads to connect several villages. You want to use the least amount of road material possible, so you start by building the shortest roads first. But you avoid building roads that would create loops, because loops waste materials. You keep adding roads until all villages are connected.

Graph and edges → Villages as points and possible roads as connections between them
Sorting edges by weight → Choosing to build the shortest roads first to save materials
Building the tree without cycles → Avoiding building roads that create loops to prevent waste
Union-Find data structure → Keeping track of which villages are already connected by roads
Resulting minimum spanning tree → A network of roads connecting all villages with minimum total length
Diagram
Diagram
Vertices: A, B, C, D, E

Edges sorted by weight:
  A-B (1)
  B-C (2)
  A-C (3)
  C-D (4)
  D-E (5)

Process:
  Start with empty tree
  ┌─────────────┐
  │ Add A-B (1) │
  └─────────────┘
        ↓
  ┌─────────────┐
  │ Add B-C (2) │
  └─────────────┘
        ↓
  ┌─────────────┐
  │ Skip A-C (3)│ (creates cycle)
  └─────────────┘
        ↓
  ┌─────────────┐
  │ Add C-D (4) │
  └─────────────┘
        ↓
  ┌─────────────┐
  │ Add D-E (5) │
  └─────────────┘

Final MST edges: A-B, B-C, C-D, D-E
This diagram shows the step-by-step selection of edges by Kruskal's algorithm to build the minimum spanning tree without cycles.
Key Facts
Minimum Spanning TreeA tree connecting all vertices with the smallest possible total edge weight and no cycles.
Kruskal's AlgorithmAn algorithm that builds a minimum spanning tree by adding edges in order of increasing weight, skipping those that create cycles.
Union-FindA data structure used to track and merge connected components efficiently to avoid cycles.
CycleA path in a graph that starts and ends at the same vertex without repeating edges.
Edge WeightA value representing the cost or distance associated with an edge in a graph.
Code Example
Data Structures Theory
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        rootX = self.find(x)
        rootY = self.find(y)
        if rootX == rootY:
            return False
        if self.rank[rootX] < self.rank[rootY]:
            self.parent[rootX] = rootY
        elif self.rank[rootX] > self.rank[rootY]:
            self.parent[rootY] = rootX
        else:
            self.parent[rootY] = rootX
            self.rank[rootX] += 1
        return True

def kruskal(n, edges):
    uf = UnionFind(n)
    mst = []
    edges.sort(key=lambda x: x[2])  # sort by weight
    for u, v, w in edges:
        if uf.union(u, v):
            mst.append((u, v, w))
    return mst

# Example graph with 5 vertices (0 to 4)
edges = [
    (0, 1, 1),
    (1, 2, 2),
    (0, 2, 3),
    (2, 3, 4),
    (3, 4, 5)
]
result = kruskal(5, edges)
for edge in result:
    print(edge)
OutputSuccess
Common Confusions
Believing Kruskal's algorithm always picks edges connected to a single starting vertex.
Believing Kruskal's algorithm always picks edges connected to a single starting vertex. Kruskal's algorithm considers all edges globally sorted by weight and connects any two separate groups, not just edges from one vertex.
Thinking cycles are allowed in the minimum spanning tree.
Thinking cycles are allowed in the minimum spanning tree. A minimum spanning tree must never contain cycles because it must be a tree connecting all vertices with no loops.
Assuming the algorithm stops after connecting some vertices.
Assuming the algorithm stops after connecting some vertices. Kruskal's algorithm continues until all vertices are connected in one spanning tree.
Summary
Kruskal's algorithm finds the cheapest way to connect all points by adding edges from smallest to largest weight without creating loops.
It uses a Union-Find structure to efficiently check if adding an edge would create a cycle.
The result is a minimum spanning tree that connects all vertices with the lowest total cost.