
Minimum Spanning Tree Kruskal's Algorithm in DSA C - Deep Dive

Overview - Minimum Spanning Tree Kruskal's Algorithm
What is it?
Kruskal's algorithm finds the Minimum Spanning Tree (MST) of a weighted undirected graph — a subset of edges that connects all vertices with minimum total weight and no cycles. It works by sorting all edges by weight, then greedily adding edges that don't create cycles using Union-Find (Disjoint Set Union).
Why it matters
Kruskal's demonstrates two powerful concepts together: greedy algorithm design and the Union-Find data structure. The greedy proof (always take the minimum safe edge) is elegant and non-obvious. Union-Find's path compression and union by rank achieve near-O(1) per operation, making Kruskal's O(E log E) — dominated by the sort.
Where it fits
You need comfort with C structs, arrays, qsort, and the concept of a spanning tree before tackling this. After mastering it, Prim's algorithm, Dijkstra's algorithm, and network flow problems follow naturally.
Mental Model
Core Idea
Sort all edges by weight. Add each edge to the MST if it connects two different components (doesn't create a cycle). Stop when all vertices are connected.
Think of it like...
Building a road network between cities at minimum cost. Sort all possible roads by construction cost. Build the cheapest road first if it connects two previously unconnected city groups. Skip a road if both its cities are already reachable from each other (cycle). Stop when all cities are connected.
Graph: 4 vertices, 5 edges
Edges sorted: (1,2,w=1), (3,4,w=2), (1,3,w=3), (2,4,w=4), (2,3,w=5)

Step 1: Add (1,2,1). Components: {1,2},{3},{4}. MST weight=1.
Step 2: Add (3,4,2). Components: {1,2},{3,4}. MST weight=3.
Step 3: Add (1,3,3). Connects {1,2} and {3,4} → {1,2,3,4}. MST weight=6.
Step 4: (2,4,4) — both in same component → SKIP (cycle).
Step 5: (2,3,5) — both in same component → SKIP (cycle).

MST: edges (1,2), (3,4), (1,3). Total weight = 6.
Build-Up - 5 Steps
1
Foundation: What is a Spanning Tree
🤔
Concept: A spanning tree connects all V vertices with exactly V-1 edges and no cycles. MST is the spanning tree with minimum total edge weight.
A connected graph with V vertices has spanning trees with exactly V-1 edges. Too few edges → disconnected. Too many → cycle exists. MST is the spanning tree minimizing sum of edge weights. For a graph with V vertices and E edges, there can be multiple spanning trees — we want the one with minimum weight. Kruskal's greedily builds this by always picking the minimum-weight edge that doesn't create a cycle.
Result
MST has V-1 edges, connects all vertices, no cycles, minimum total weight.
V-1 edges is exact: with fewer, the graph is disconnected, and a connected graph on V vertices with V or more edges must contain a cycle.
2
Foundation: Union-Find — The Cycle Detector
🤔
Concept: Union-Find tracks which vertices are in the same connected component. Adding an edge between two vertices in the same component creates a cycle.
Union-Find data structure: int parent[V], rank[V]. find(x) returns the root of x's component, with path compression. unite(x, y) merges x's and y's components by rank. In C:

int find(int* parent, int x) {
    if (parent[x] != x)
        parent[x] = find(parent, parent[x]);  /* path compression */
    return parent[x];
}

void unite(int* parent, int* rank, int x, int y) {
    int px = find(parent, x), py = find(parent, y);
    if (px == py) return;                     /* already same component */
    if (rank[px] < rank[py]) { int tmp = px; px = py; py = tmp; }
    parent[py] = px;                          /* attach lower-rank root under higher */
    if (rank[px] == rank[py]) rank[px]++;
}
Result
Union-Find with path compression + union by rank runs in near-O(1) per operation.
Path compression flattens the tree during find() — future finds on the same element are O(1). Union by rank keeps trees shallow, bounding find() depth.
3
Intermediate: Edge Sorting in C
🤔 Before reading on: how do you represent and sort edges by weight in C? Commit your approach.
Concept: Represent each edge as a struct with u, v, weight. Sort with qsort.
Represent each edge as:

struct Edge { int u, v, weight; };

Sort with a comparator:

int cmpEdge(const void* a, const void* b) {
    return ((struct Edge*)a)->weight - ((struct Edge*)b)->weight;
}

qsort(edges, E, sizeof(struct Edge), cmpEdge);

This gives O(E log E) sort — the bottleneck of Kruskal's. After sorting, iterate edges in order, using Union-Find to check and add each edge.
Result
All E edges sorted by weight in O(E log E).
qsort with a custom comparator is the idiomatic C approach. The comparison function returns negative/zero/positive; (a->weight - b->weight) matches this convention, though the subtraction can overflow for extreme weights, in which case explicit comparisons are safer.
4
Intermediate: Main Kruskal Loop
🤔 Before reading on: how do you know when to stop adding edges? Commit your answer.
Concept: Iterate sorted edges. For each: if endpoints in different components, add to MST. Stop when MST has V-1 edges.
int mstEdges = 0, mstWeight = 0;

for (int i = 0; i < E && mstEdges < V - 1; i++) {
    int pu = find(parent, edges[i].u);
    int pv = find(parent, edges[i].v);
    if (pu != pv) {
        unite(parent, rank, edges[i].u, edges[i].v);
        mstWeight += edges[i].weight;
        mstEdges++;
    }
}

The early termination (mstEdges < V-1) is important — once we have V-1 edges the MST is complete. If the loop finishes without V-1 edges, the graph is disconnected.
Result
MST constructed greedily in O(E log E) total.
The early termination saves significant time for dense graphs where many high-weight edges would otherwise be checked after the MST is complete.
5
Advanced: Kruskal vs Prim — When to Choose
🤔 Before reading on: for a dense graph (E ≈ V²), which is faster — Kruskal or Prim? Commit your answer.
Concept: Kruskal: O(E log E) — better for sparse graphs. Prim with adjacency matrix: O(V²) — better for dense graphs. Prim with a binary heap runs in O((V+E) log V).
Kruskal sorts all E edges: O(E log E). For sparse graphs (E ≈ V), this is O(V log V). For dense graphs (E ≈ V²), this is O(V² log V). Prim with adjacency matrix: O(V²) — better for dense graphs. Prim with binary heap: O((V+E) log V). Rule: sparse graph → Kruskal. Dense graph → Prim with matrix. Both produce a valid MST — the choice is purely performance.
Result
Kruskal wins for sparse, Prim wins for dense graph representations.
In competitive programming, Kruskal is preferred because edge list representation is natural for sparse graphs and the implementation is straightforward.
Under the Hood
Kruskal's correctness follows from the Cut Property: for any cut (partition) of vertices into two sets, the minimum weight edge crossing the cut must be in some MST. By always taking the minimum-weight edge that connects two different components (a crossing edge for that component boundary cut), Kruskal's greedily satisfies the Cut Property at every step. Union-Find with path compression and union by rank achieves inverse-Ackermann O(α(n)) amortized per operation — practically O(1).
Why designed this way?
The greedy choice works because MST has an optimal substructure: once an edge is determined safe (doesn't create a cycle and connects two components), it belongs to some MST. Sorting ensures we always pick the globally minimum safe edge. Union-Find makes cycle detection nearly free — without it, cycle detection would require O(V) DFS per edge = O(VE) total.
Kruskal trace on: V=4, edges sorted by weight:
Edge (1-2, w=1): find(1)=1, find(2)=2 → different → ADD. Unite. mst=1.
Edge (3-4, w=2): find(3)=3, find(4)=4 → different → ADD. Unite. mst=2.
Edge (1-3, w=3): find(1)=1, find(3)=3 → different → ADD. Unite. mst=3=V-1. DONE.
Edge (2-4, w=4): not reached (early stop).

MST edges: (1,2), (3,4), (1,3). Total weight: 1+2+3=6.
Myth Busters - 2 Common Misconceptions
Quick: if an edge connects two vertices already in the same component, does adding it create a cycle? Commit yes or no.
Common Belief: Adding an edge between same-component vertices is harmless.
Reality: Adding such an edge always creates a cycle — there's already a path between those vertices within the component. This is exactly what Union-Find's find() detects: same root = same component = skip.
Why it matters: This is the core correctness condition of Kruskal's. Missing this check produces a graph with cycles — not a spanning tree.
Quick: does Kruskal's always produce the unique MST? Commit yes or no.
Common Belief: Kruskal's always produces the unique MST.
Reality: The MST is unique only if all edge weights are distinct. When equal-weight edges exist, different tie-breaking orders can produce different MSTs, all with the same total weight.
Why it matters: Problems that ask to verify a specific MST need all edge weights to be distinct, or need a defined tie-breaking rule.
Expert Zone
1
Path compression modifies the parent array during find() — this is a side effect but doesn't affect correctness, only subsequent find() speeds.
2
Union by rank without path compression gives O(log n) per operation. Path compression alone gives O(log n) amortized. Both together give O(α(n)) — inverse Ackermann, essentially constant.
3
For competitive programming in C, initialize parent[i] = i and rank[i] = 0 for all vertices 0..V-1 before running Kruskal's.
When NOT to use
For dense graphs represented as adjacency matrices, use Prim's O(V²) instead — Kruskal's requires converting the matrix to an edge list first, adding O(V²) overhead. For directed graphs, MST is undefined — use Minimum Cost Arborescence (Edmonds' algorithm) instead.
Production Patterns
Kruskal's MST appears in: network cable layout (minimum wire to connect offices), cluster analysis (single-linkage hierarchical clustering), image segmentation, and VLSI circuit design. Union-Find itself appears in connected components, percolation simulation, and offline LCA computation.
Connections
Union-Find Data Structure
Kruskal's is the primary application of Union-Find — the MST algorithm and the data structure are inseparable in practice.
Union-Find appears independently in connected components, cycle detection, and network connectivity problems. Mastering it through Kruskal's unlocks all these applications.
Prim's Algorithm
Both find MST but use different strategies — Kruskal grows a forest from edges, Prim grows a single tree from a vertex.
For competitive programming, know both: Kruskal for edge-list sparse graphs, Prim with adjacency matrix for dense graphs.
Common Pitfalls
#1: Forgetting to initialize Union-Find before Kruskal's loop.
Wrong approach:

// parent[] uninitialized — find() returns garbage
for (int i = 0; i < E; i++) {
    if (find(parent, edges[i].u) != find(parent, edges[i].v)) { ... }
}

Correct approach:

// Always initialize Union-Find first
for (int i = 0; i < V; i++) { parent[i] = i; rank[i] = 0; }
for (int i = 0; i < E; i++) { ... }

Root cause: An uninitialized parent array causes find() to follow garbage indices, possibly reading out of bounds. Always initialize parent[i] = i (each vertex is its own component).
#2: Not sorting edges before the Kruskal loop.
Wrong approach:

// Missing sort — processes edges in arbitrary order
for (int i = 0; i < E; i++) {
    if (find(parent, edges[i].u) != find(parent, edges[i].v)) { ... }
}

Correct approach:

qsort(edges, E, sizeof(struct Edge), cmpEdge); // Sort by weight first
for (int i = 0; i < E; i++) { ... }

Root cause: Without sorting, the greedy property fails — we might add a heavier edge when a lighter one would also be safe. The result is a spanning tree but not the MINIMUM one.
Key Takeaways
Sort all edges by weight — this is the O(E log E) step that dominates Kruskal's total complexity.
Use Union-Find to detect cycles in O(α(n)) ≈ O(1) per check — find() same root means same component means cycle.
Add edge to MST only when endpoints are in different components (different find() roots).
Stop when MST has V-1 edges — the spanning tree is complete.
Kruskal beats Prim for sparse graphs; Prim with adjacency matrix beats Kruskal for dense graphs.