
Minimum Spanning Tree Kruskal's Algorithm in DSA C - Deep Dive

Overview - Minimum Spanning Tree Kruskal's Algorithm
What is it?
Kruskal's algorithm finds the Minimum Spanning Tree (MST) of a weighted undirected graph — a subset of edges that connects all vertices with minimum total weight and no cycles. It works by sorting all edges by weight, then greedily adding edges that don't create cycles using Union-Find (Disjoint Set Union).
Why it matters
Kruskal's demonstrates two powerful concepts together: greedy algorithm design and the Union-Find data structure. The greedy proof (always take the minimum safe edge) is elegant and non-obvious. Union-Find's path compression and union by rank achieve near-O(1) per operation, making Kruskal's O(E log E) — dominated by the sort.
Where it fits
You need comfort with C structs, arrays, qsort, and the concept of a spanning tree before tackling this. After mastering it, Prim's algorithm, Dijkstra's algorithm, and network flow problems follow naturally.
Mental Model
Core Idea
Sort all edges by weight. Add each edge to the MST if it connects two different components (doesn't create a cycle). Stop when all vertices are connected.
Think of it like...
Building a road network between cities at minimum cost. Sort all possible roads by construction cost. Build the cheapest road first if it connects two previously unconnected city groups. Skip a road if both its cities are already reachable from each other (cycle). Stop when all cities are connected.
Graph: 4 vertices, 5 edges
Edges sorted: (1,2,w=1), (3,4,w=2), (1,3,w=3), (2,4,w=4), (2,3,w=5)

Step 1: Add (1,2,1). Components: {1,2},{3},{4}. MST weight=1.
Step 2: Add (3,4,2). Components: {1,2},{3,4}. MST weight=3.
Step 3: Add (1,3,3). Connects {1,2} and {3,4} → {1,2,3,4}. MST weight=6.
Step 4: (2,4,4) — both in same component → SKIP (cycle).
Step 5: (2,3,5) — both in same component → SKIP (cycle).

MST: edges (1,2), (3,4), (1,3). Total weight = 6.
Build-Up - 5 Steps
1
Foundation: What is a Spanning Tree
🤔
Concept: A spanning tree connects all V vertices with exactly V-1 edges and no cycles. MST is the spanning tree with minimum total edge weight.
A connected graph with V vertices has spanning trees with exactly V-1 edges. Too few edges → disconnected. Too many → cycle exists. MST is the spanning tree minimizing sum of edge weights. For a graph with V vertices and E edges, there can be multiple spanning trees — we want the one with minimum weight. Kruskal's greedily builds this by always picking the minimum-weight edge that doesn't create a cycle.
Result
MST has V-1 edges, connects all vertices, no cycles, minimum total weight.
V-1 edges is exact: with fewer, the graph is disconnected, and a connected graph on V vertices with V or more edges must contain a cycle.
2
Foundation: Union-Find — The Cycle Detector
🤔
Concept: Union-Find tracks which vertices are in the same connected component. Adding an edge between two vertices in the same component creates a cycle.
Union-Find data structure: int parent[V], rank[V]. find(x) returns the root of x's component, with path compression. unite(x, y) merges x's and y's components by rank. In C:

int find(int* parent, int x) {
    if (parent[x] != x)
        parent[x] = find(parent, parent[x]);  /* path compression */
    return parent[x];
}

void unite(int* parent, int* rank, int x, int y) {
    int px = find(parent, x), py = find(parent, y);
    if (px == py) return;                     /* already same component */
    if (rank[px] < rank[py]) { int tmp = px; px = py; py = tmp; }
    parent[py] = px;                          /* attach lower-rank root under higher */
    if (rank[px] == rank[py]) rank[px]++;
}
Result
Union-Find with path compression + union by rank runs in near-O(1) per operation.
Path compression flattens the tree during find() — future finds on the same element are O(1). Union by rank keeps trees shallow, bounding find() depth.
3
Intermediate: Edge Sorting in C
🤔 Before reading on: how do you represent and sort edges by weight in C? Commit your approach.
Concept: Represent each edge as a struct with u, v, weight. Sort with qsort.
Represent each edge as:

struct Edge { int u, v, weight; };

Sort with a comparator:

int cmpEdge(const void* a, const void* b) {
    return ((struct Edge*)a)->weight - ((struct Edge*)b)->weight;
}

qsort(edges, E, sizeof(struct Edge), cmpEdge);

This gives O(E log E) sort — the bottleneck of Kruskal's. After sorting, iterate edges in order, using Union-Find to check and add each edge.
Result
All E edges sorted by weight in O(E log E).
qsort with a custom comparator is the idiomatic C approach. The comparison function returns negative/zero/positive; (a->weight - b->weight) matches this convention, though the subtraction can overflow for extreme weights, in which case explicit comparisons are safer.
4
Intermediate: Main Kruskal Loop
🤔 Before reading on: how do you know when to stop adding edges? Commit your answer.
Concept: Iterate sorted edges. For each: if endpoints in different components, add to MST. Stop when MST has V-1 edges.
int mstEdges = 0, mstWeight = 0;

for (int i = 0; i < E && mstEdges < V - 1; i++) {
    int pu = find(parent, edges[i].u);
    int pv = find(parent, edges[i].v);
    if (pu != pv) {
        unite(parent, rank, edges[i].u, edges[i].v);
        mstWeight += edges[i].weight;
        mstEdges++;
    }
}

The early termination (mstEdges < V-1) is important — once we have V-1 edges the MST is complete. If the loop finishes without V-1 edges, the graph is disconnected.
Result
MST constructed greedily in O(E log E) total.
The early termination saves significant time for dense graphs where many high-weight edges would otherwise be checked after the MST is complete.
5
Advanced: Kruskal vs Prim — When to Choose
🤔 Before reading on: for a dense graph (E ≈ V²), which is faster — Kruskal or Prim? Commit your answer.
Concept: Kruskal: O(E log E) — better for sparse graphs. Prim with adjacency matrix: O(V²) — better for dense graphs. Prim with a binary heap runs in O((V+E) log V).
Kruskal sorts all E edges: O(E log E). For sparse graphs (E ≈ V), this is O(V log V). For dense graphs (E ≈ V²), this is O(V² log V). Prim with adjacency matrix: O(V²) — better for dense graphs. Prim with binary heap: O((V+E) log V). Rule: sparse graph → Kruskal. Dense graph → Prim with matrix. Both produce a valid MST — the choice is purely performance.
Result
Kruskal wins for sparse, Prim wins for dense graph representations.
In competitive programming, Kruskal is preferred because edge list representation is natural for sparse graphs and the implementation is straightforward.
Under the Hood
Kruskal's correctness follows from the Cut Property: for any cut (partition) of vertices into two sets, the minimum weight edge crossing the cut must be in some MST. By always taking the minimum-weight edge that connects two different components (a crossing edge for that component boundary cut), Kruskal's greedily satisfies the Cut Property at every step. Union-Find with path compression and union by rank achieves inverse-Ackermann O(α(n)) amortized per operation — practically O(1).
Why designed this way?
The greedy choice works because MST has an optimal substructure: once an edge is determined safe (doesn't create a cycle and connects two components), it belongs to some MST. Sorting ensures we always pick the globally minimum safe edge. Union-Find makes cycle detection nearly free — without it, cycle detection would require O(V) DFS per edge = O(VE) total.
Kruskal trace on: V=4, edges sorted by weight:
Edge (1-2, w=1): find(1)=1, find(2)=2 → different → ADD. Unite. mst=1.
Edge (3-4, w=2): find(3)=3, find(4)=4 → different → ADD. Unite. mst=2.
Edge (1-3, w=3): find(1)=1, find(3)=3 → different → ADD. Unite. mst=3=V-1. DONE.
Edge (2-4, w=4): not reached (early stop).

MST edges: (1,2), (3,4), (1,3). Total weight: 1+2+3=6.
Myth Busters - 2 Common Misconceptions
Quick: if an edge connects two vertices already in the same component, does adding it create a cycle? Commit yes or no.
Common Belief: Adding an edge between same-component vertices is harmless.
Reality: Adding such an edge always creates a cycle — there's already a path between those vertices within the component. This is exactly what Union-Find's find() detects: same root = same component = skip.
Why it matters: This is the core correctness condition of Kruskal's. Missing this check produces a graph with cycles — not a spanning tree.
Quick: does Kruskal's always produce the unique MST? Commit yes or no.
Common Belief: Kruskal's always produces the unique MST.
Reality: The MST is unique only if all edge weights are distinct. When equal-weight edges exist, different tie-breaking orders can produce different MSTs, all with the same total weight.
Why it matters: Problems that ask to verify a specific MST need all edge weights to be distinct, or need a defined tie-breaking rule.
Expert Zone
1
Path compression modifies the parent array during find() — this is a side effect but doesn't affect correctness, only subsequent find() speeds.
2
Union by rank without path compression gives O(log n) per operation. Path compression alone gives O(log n) amortized. Both together give O(α(n)) — inverse Ackermann, essentially constant.
3
For competitive programming in C, initialize parent[i] = i and rank[i] = 0 for all vertices 0..V-1 before running Kruskal's.
When NOT to use
For dense graphs represented as adjacency matrices, use Prim's O(V²) instead — Kruskal's requires converting the matrix to an edge list first, adding O(V²) overhead. For directed graphs, MST is undefined — use Minimum Cost Arborescence (Edmonds' algorithm) instead.
Production Patterns
Kruskal's MST appears in: network cable layout (minimum wire to connect offices), cluster analysis (single-linkage hierarchical clustering), image segmentation, and VLSI circuit design. Union-Find itself appears in connected components, percolation simulation, and offline LCA computation.
Connections
Union-Find Data Structure
Kruskal's is the primary application of Union-Find — the MST algorithm and the data structure are inseparable in practice.
Union-Find appears independently in connected components, cycle detection, and network connectivity problems. Mastering it through Kruskal's unlocks all these applications.
Prim's Algorithm
Both find MST but use different strategies — Kruskal grows a forest from edges, Prim grows a single tree from a vertex.
For competitive programming, know both: Kruskal for edge-list sparse graphs, Prim with adjacency matrix for dense graphs.
Common Pitfalls
#1: Forgetting to initialize Union-Find before Kruskal's loop.
Wrong approach:

// parent[] uninitialized — find() returns garbage
for (int i = 0; i < E; i++) {
    if (find(parent, edges[i].u) != find(parent, edges[i].v)) { ... }
}

Correct approach:

// Always initialize Union-Find first
for (int i = 0; i < V; i++) { parent[i] = i; rank[i] = 0; }
for (int i = 0; i < E; i++) { ... }

Root cause: An uninitialized parent array causes find() to follow garbage indices, possibly reading out of bounds. Always initialize parent[i] = i (each vertex is its own component).
#2: Not sorting edges before the Kruskal loop.
Wrong approach:

// Missing sort — processes edges in arbitrary order
for (int i = 0; i < E; i++) {
    if (find(parent, edges[i].u) != find(parent, edges[i].v)) { ... }
}

Correct approach:

qsort(edges, E, sizeof(struct Edge), cmpEdge); // Sort by weight first
for (int i = 0; i < E; i++) { ... }

Root cause: Without sorting, the greedy property fails — we might add a heavier edge when a lighter one would also be safe. The result is a spanning tree but not the MINIMUM one.
Key Takeaways
Sort all edges by weight — this is the O(E log E) step that dominates Kruskal's total complexity.
Use Union-Find to detect cycles in O(α(n)) ≈ O(1) per check — find() same root means same component means cycle.
Add edge to MST only when endpoints are in different components (different find() roots).
Stop when MST has V-1 edges — the spanning tree is complete.
Kruskal beats Prim for sparse graphs; Prim with adjacency matrix beats Kruskal for dense graphs.