Overview - Recursive CTE for graph traversal

What is it?

A recursive Common Table Expression (CTE) is a named query that refers to itself, which lets it explore data connected in a chain or network, such as a family tree or a road map. Starting from one or more rows, it repeatedly follows links to find every related item. This is especially useful for graphs, where nodes connect to other nodes in arbitrary ways. Recursive CTEs let you express these searches in a single statement inside the database.

Why it matters

Without recursive CTEs, finding everything connected to a node requires either one query per hop or traversal code outside the database, which adds round trips and is harder to maintain. Recursive CTEs let the database perform the whole traversal in one query. This makes tasks like finding all friends of a friend, or all parts connected to a machine, simpler to write and usually faster to run.

Where it fits

Before learning recursive CTEs, you should understand basic SQL queries, joins, and simple CTEs (non-recursive). After mastering recursive CTEs, you can explore advanced graph algorithms, window functions, and performance tuning for recursive queries.

Mental Model

Core Idea

A recursive CTE repeatedly expands a set of results by joining new connected rows until no more connections are found.

Think of it like...

Imagine exploring a maze by starting at the entrance and walking down every path you find, marking each new room you enter, until you have visited every reachable room.

Start with initial nodes (anchor) ──► Find connected nodes (recursive step) ──► Repeat until no new nodes

┌───────────────┐
│ Anchor Query  │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ Recursive Step│
│ (join edges)  │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ Final Result  │
└───────────────┘
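The flow in the diagram can be sketched as a small query. This is a minimal illustration, assuming PostgreSQL; the sample edges are made up and inlined with VALUES so the statement runs on its own, and the names reachable and node are hypothetical.

```sql
-- Hypothetical sample graph: 1 -> 2 -> 3 -> 4, plus a separate edge 5 -> 6.
WITH RECURSIVE edges(from_node, to_node) AS (
    VALUES (1, 2), (2, 3), (3, 4), (5, 6)
),
reachable(node) AS (
    -- Anchor: the starting node.
    SELECT 1
    UNION
    -- Recursive step: follow every edge that leaves a node found so far.
    SELECT e.to_node
    FROM edges e
    JOIN reachable r ON e.from_node = r.node
)
SELECT node FROM reachable ORDER BY node;
-- Returns 1, 2, 3, 4; nodes 5 and 6 are not reachable from node 1.
```

The recursion ends on its own here because node 4 has no outgoing edges, so the last pass adds no rows.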

Build-Up - 7 Steps

1

Foundation: Understanding Basic CTEs

Concept: Learn what a Common Table Expression (CTE) is and how it simplifies queries by naming temporary result sets.

A CTE is a named result set that you define inside a query and that exists only for that statement, much like a temporary table. It helps break complex queries into smaller parts. For example:

WITH temp AS (
    SELECT id, name
    FROM users
    WHERE active = true
)
SELECT * FROM temp WHERE name LIKE 'A%';

Result

The query returns active users whose names start with 'A'.

Understanding CTEs is essential because a recursive CTE builds on this idea: its definition refers to its own name.

2

Foundation: Graph Data in Tables

3

Intermediate: Writing a Recursive CTE

4

Intermediate: Avoiding Infinite Loops

5

Intermediate: Collecting Full Paths

6

Advanced: Performance Considerations

7

Expert: Recursive CTEs vs Graph Extensions

Under the Hood

Recursive CTEs work by first running the anchor query to get the initial rows. Then the recursive query runs repeatedly; in each pass the self-reference sees only the rows produced by the previous pass, and the join finds the rows connected to them. The loop ends when a pass produces no new rows. PostgreSQL manages this iteration and accumulates the results with UNION or UNION ALL. With UNION, rows that duplicate any earlier result row are discarded; with UNION ALL, every row is kept. Conceptually, the recursion is a fixpoint computation that stops when the result set no longer grows.
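To make the passes visible, the sketch below tags each row with the pass that produced it. It assumes PostgreSQL; the sample edges and the names walk and iteration are hypothetical.

```sql
-- Hypothetical sample edges, inlined so the statement is self-contained.
WITH RECURSIVE edges(from_node, to_node) AS (
    VALUES (1, 2), (1, 3), (2, 4), (4, 5)
),
walk(node, iteration) AS (
    -- Pass 0: the anchor query runs once.
    SELECT 1, 0
    UNION ALL
    -- Later passes: "walk" refers only to the rows from the previous pass.
    SELECT e.to_node, w.iteration + 1
    FROM edges e
    JOIN walk w ON e.from_node = w.node
)
SELECT iteration, node FROM walk ORDER BY iteration, node;
-- iteration | node
--         0 |    1
--         1 |    2
--         1 |    3
--         2 |    4
--         3 |    5
```

A fourth pass finds no edges leaving node 5, produces no rows, and the loop stops.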

Why designed this way?

Recursive CTEs were designed to extend SQL's declarative power to hierarchical and graph data without requiring procedural code. The two-part structure (anchor + recursive) mirrors mathematical recursion and allows expressing complex traversals in a single query. Alternatives like procedural loops or external code were less efficient and harder to maintain. The design balances expressiveness with performance and integrates well with SQL's set-based model.

┌───────────────┐
│ Anchor Query  │
│ (initial rows)│
└──────┬────────┘
       │
       ▼
┌─────────────────────────────┐
│ Recursive Query (joins with │
│ previous results to find new│
│ rows)                       │
└──────┬──────────────────────┘
       │
       ▼
┌─────────────────────────────┐
│ Combine results with UNION  │
│ Check if new rows added     │
└──────┬──────────────────────┘
       │
       ▼
┌───────────────┐
│ Stop if no new│
│ rows found    │
└───────────────┘

Myth Busters - 4 Common Misconceptions

Quick: Do recursive CTEs automatically prevent infinite loops in cyclic graphs? Commit to yes or no.

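Common Belief: Recursive CTEs automatically stop when cycles exist, so no special handling is needed.

Reality: No. With UNION ALL, a cycle sends the query around the same nodes indefinitely, so you need to track visited nodes or cap the recursion depth. UNION can end the recursion only when the revisited rows are exact duplicates of earlier ones.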

Quick: Do you think recursive CTEs always perform well on very large graphs? Commit to yes or no.

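Common Belief: Recursive CTEs are efficient and suitable for all graph sizes.

Reality: No. On very large or highly connected graphs the number of intermediate rows can grow quickly, and a graph database or a specialized extension may be a better fit.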

Quick: Do you think recursive CTEs can only find paths from one node to another? Commit to yes or no.

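Common Belief: Recursive CTEs are only useful for finding paths between two specific nodes.

Reality: No. Starting from the anchor rows, a recursive CTE returns every row it can reach, so it also handles tasks such as listing all descendants in a hierarchy or all parts of an assembly.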

Quick: Do you think UNION and UNION ALL behave the same in recursive CTEs? Commit to yes or no.

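Common Belief: UNION and UNION ALL are interchangeable in recursive CTEs without affecting results.

Reality: No. UNION removes duplicate rows and UNION ALL keeps them, so the two can return different row counts and can differ in whether the recursion terminates on cyclic data.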

Expert Zone

1

Recursive CTEs can be combined with window functions to rank or filter paths dynamically during traversal.

2

The order of rows returned by recursive CTEs is not guaranteed; explicit ORDER BY is needed for consistent results.

3

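PostgreSQL's planner may choose different join strategies for the recursive step depending on table statistics and available indexes, so performance can vary between data sets; EXPLAIN ANALYZE shows which plan was actually used.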
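As one conservative illustration of the first two points, the sketch below applies a window function in the outer query to keep the shallowest occurrence of each node, and uses an explicit ORDER BY for a stable result. It assumes PostgreSQL; the sample edges and the names walk, ranked, and rn are hypothetical.

```sql
-- Hypothetical sample edges: node 4 is reachable directly and via 2 or 3.
WITH RECURSIVE edges(from_node, to_node) AS (
    VALUES (1, 2), (1, 3), (2, 4), (3, 4), (1, 4)
),
walk(node, depth) AS (
    SELECT 1, 0
    UNION ALL
    SELECT e.to_node, w.depth + 1
    FROM edges e
    JOIN walk w ON e.from_node = w.node
),
ranked AS (
    -- Number the occurrences of each node, shallowest first.
    SELECT node, depth,
           ROW_NUMBER() OVER (PARTITION BY node ORDER BY depth) AS rn
    FROM walk
)
SELECT node, depth AS shortest_depth
FROM ranked
WHERE rn = 1
ORDER BY shortest_depth, node;
-- node | shortest_depth
--    1 |              0
--    2 |              1
--    3 |              1
--    4 |              1
```

Without the final ORDER BY, the same rows could come back in any order.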

When NOT to use

Avoid recursive CTEs for very large or highly connected graphs where performance is critical; instead, use graph database systems like Neo4j or PostgreSQL extensions like pgRouting. For simple hierarchical data, consider using adjacency lists with indexed parent references or materialized path columns.
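For comparison, a materialized path column lets a subtree be read with a plain prefix match and no recursion. This is only a sketch, assuming PostgreSQL; the categories data and the slash-separated path format are hypothetical choices for illustration.

```sql
-- Hypothetical hierarchy: each row stores the ids of its ancestors plus itself.
WITH categories(id, name, path) AS (
    VALUES (1, 'root',    '1/'),
           (2, 'tools',   '1/2/'),
           (3, 'hammers', '1/2/3/'),
           (4, 'toys',    '1/4/')
)
-- All rows at or below category 2 share the prefix '1/2/'.
SELECT id, name
FROM categories
WHERE path LIKE '1/2/%'
ORDER BY id;
-- Returns (2, 'tools') and (3, 'hammers').
```

The trade-off is that every path below a node has to be rewritten when that node moves in the hierarchy.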

Production Patterns

In production, recursive CTEs are used for organizational charts, bill of materials, network pathfinding, and permission hierarchies. They are often combined with limits on recursion depth and cycle detection to ensure reliability. Caching results or precomputing paths is common to improve response times.
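A sketch of how the two safeguards mentioned above might be combined in one query, assuming PostgreSQL and integer node ids. The sample edges, the depth cap of 10, and the column names are hypothetical.

```sql
-- Hypothetical sample edges containing the cycle 1 -> 2 -> 3 -> 1.
WITH RECURSIVE edges(from_node, to_node) AS (
    VALUES (1, 2), (2, 3), (3, 1), (3, 4)
),
search_path(node, depth, path) AS (
    SELECT 1, 0, ARRAY[1]
    UNION ALL
    SELECT e.to_node, sp.depth + 1, sp.path || e.to_node
    FROM edges e
    JOIN search_path sp ON e.from_node = sp.node
    WHERE e.to_node <> ALL (sp.path)   -- cycle guard: skip nodes already on this path
      AND sp.depth < 10                -- depth cap: an arbitrary limit for this sketch
)
SELECT node, depth, path FROM search_path ORDER BY depth, node;
-- node | depth | path
--    1 |     0 | {1}
--    2 |     1 | {1,2}
--    3 |     2 | {1,2,3}
--    4 |     3 | {1,2,3,4}
```

The edge 3 -> 1 is skipped because node 1 is already on the path, so the cycle is never followed.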

Connections

Tree Traversal Algorithms

Recursive CTEs expand a graph one level per pass, much like a breadth-first traversal; a depth-first or breadth-first output order can be obtained by sorting on a path or depth column.

Understanding classic tree traversal helps grasp how recursive CTEs explore graph nodes step-by-step.
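The link to classic traversal orders can be sketched by carrying a path and a depth through the recursion and sorting on them afterwards. This assumes PostgreSQL, where arrays compare element by element; the sample tree and column names are hypothetical.

```sql
-- Hypothetical tree: 1 has children 2 and 3; 2 has children 4 and 5.
WITH RECURSIVE edges(from_node, to_node) AS (
    VALUES (1, 2), (1, 3), (2, 4), (2, 5)
),
walk(node, depth, path) AS (
    SELECT 1, 0, ARRAY[1]
    UNION ALL
    SELECT e.to_node, w.depth + 1, w.path || e.to_node
    FROM edges e
    JOIN walk w ON e.from_node = w.node
)
SELECT node,
       ROW_NUMBER() OVER (ORDER BY path)        AS depth_first_pos,
       ROW_NUMBER() OVER (ORDER BY depth, node) AS breadth_first_pos
FROM walk
ORDER BY depth_first_pos;
-- node | depth_first_pos | breadth_first_pos
--    1 |               1 |                 1
--    2 |               2 |                 2
--    4 |               3 |                 4
--    5 |               4 |                 5
--    3 |               5 |                 3
```

Sorting by path visits a node's whole subtree before its siblings, while sorting by depth visits the tree level by level.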

Functional Programming Recursion

Recursive CTEs mirror the concept of functions calling themselves to solve problems incrementally.

Knowing recursion in programming clarifies how recursive CTEs build results by repeated self-reference.

Supply Chain Management

Graph traversal via recursive CTEs models dependencies in supply chains, like parts needed to build products.

Seeing recursive CTEs as dependency resolution tools connects database queries to real-world logistics and planning.
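A small bill-of-materials sketch shows the dependency idea: quantities are multiplied along each path and then summed per part. It assumes PostgreSQL; the bom rows and the names explosion and total_qty are hypothetical.

```sql
-- Hypothetical bill of materials: (parent part, child part, quantity per parent).
WITH RECURSIVE bom(parent_part, child_part, qty) AS (
    VALUES ('bike',  'wheel', 2),
           ('bike',  'frame', 1),
           ('wheel', 'spoke', 32),
           ('wheel', 'tire',  1)
),
explosion(part, total_qty) AS (
    -- Anchor: one finished product.
    SELECT 'bike'::text, 1
    UNION ALL
    -- Recursive step: multiply the quantity down each level.
    SELECT b.child_part, x.total_qty * b.qty
    FROM bom b
    JOIN explosion x ON b.parent_part = x.part
)
SELECT part, SUM(total_qty) AS total_qty
FROM explosion
GROUP BY part
ORDER BY part;
-- part  | total_qty
-- bike  |         1
-- frame |         1
-- spoke |        64
-- tire  |         2
-- wheel |         2
```

The aggregation sits in the outer query because PostgreSQL does not allow aggregate functions in the recursive term itself.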

Common Pitfalls

#1 Infinite recursion due to cycles in the graph.

Wrong approach:
WITH RECURSIVE search_path AS (
    SELECT from_node, to_node
    FROM edges
    WHERE from_node = 1
    UNION ALL
    SELECT e.from_node, e.to_node
    FROM edges e
    JOIN search_path sp ON e.from_node = sp.to_node
)
SELECT * FROM search_path;

Correct approach:
WITH RECURSIVE search_path AS (
    SELECT from_node, to_node, ARRAY[from_node] AS path
    FROM edges
    WHERE from_node = 1
    UNION ALL
    SELECT e.from_node, e.to_node, sp.path || e.from_node
    FROM edges e
    JOIN search_path sp ON e.from_node = sp.to_node
    WHERE NOT e.from_node = ANY(sp.path)
)
SELECT * FROM search_path;

Root cause: Not tracking visited nodes allows the query to revisit the same nodes endlessly.
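On PostgreSQL 14 or later, the built-in CYCLE clause can take over the bookkeeping that the path array does by hand. The sketch below is an assumption-laden illustration rather than a drop-in replacement: the sample edges and the column names is_cycle and visited are hypothetical, and it will not run on older PostgreSQL versions.

```sql
-- Requires PostgreSQL 14+. Sample edges form the cycle 1 -> 2 -> 3 -> 1.
WITH RECURSIVE edges(from_node, to_node) AS (
    VALUES (1, 2), (2, 3), (3, 1)
),
search_path(from_node, to_node) AS (
    SELECT e.from_node, e.to_node
    FROM edges e
    WHERE e.from_node = 1
    UNION ALL
    SELECT e.from_node, e.to_node
    FROM edges e, search_path sp
    WHERE e.from_node = sp.to_node
) CYCLE to_node SET is_cycle USING visited
-- Rows flagged is_cycle repeat a to_node already seen on their path.
SELECT from_node, to_node
FROM search_path
WHERE NOT is_cycle;
```

The clause adds the is_cycle and visited columns automatically and stops expanding a row once it is flagged, so the query is intended to return each edge of the cycle once and then terminate.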

#2 Using UNION ALL without filtering duplicates, which returns repeated rows.

Wrong approach:
WITH RECURSIVE search_path AS (
    SELECT from_node, to_node
    FROM edges
    WHERE from_node = 1
    UNION ALL
    SELECT e.from_node, e.to_node
    FROM edges e
    JOIN search_path sp ON e.from_node = sp.to_node
)
SELECT * FROM search_path;

Correct approach:
WITH RECURSIVE search_path AS (
    SELECT from_node, to_node
    FROM edges
    WHERE from_node = 1
    UNION
    SELECT e.from_node, e.to_node
    FROM edges e
    JOIN search_path sp ON e.from_node = sp.to_node
)
SELECT * FROM search_path;

Root cause: UNION ALL keeps every row, including the same edge reached by different routes; UNION discards rows that duplicate earlier ones.
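The difference is easy to see on a small diamond-shaped graph, where one edge is reached by two routes. This sketch assumes PostgreSQL; the sample edges and the CTE names with_union_all and with_union are hypothetical.

```sql
-- Hypothetical sample edges: node 4 is reached from both 2 and 3.
WITH RECURSIVE edges(from_node, to_node) AS (
    VALUES (1, 2), (1, 3), (2, 4), (3, 4), (4, 5)
),
with_union_all AS (
    SELECT from_node, to_node FROM edges WHERE from_node = 1
    UNION ALL
    SELECT e.from_node, e.to_node
    FROM edges e
    JOIN with_union_all sp ON e.from_node = sp.to_node
),
with_union AS (
    SELECT from_node, to_node FROM edges WHERE from_node = 1
    UNION
    SELECT e.from_node, e.to_node
    FROM edges e
    JOIN with_union sp ON e.from_node = sp.to_node
)
SELECT (SELECT count(*) FROM with_union_all) AS union_all_rows,
       (SELECT count(*) FROM with_union)     AS union_rows;
-- union_all_rows = 6, union_rows = 5
```

The extra row under UNION ALL is the edge 4 -> 5, which is produced once for each of the two routes into node 4.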

#3 Not limiting recursion depth, which can cause long-running queries.

Wrong approach:
WITH RECURSIVE search_path AS (
    SELECT from_node, to_node
    FROM edges
    WHERE from_node = 1
    UNION ALL
    SELECT e.from_node, e.to_node
    FROM edges e
    JOIN search_path sp ON e.from_node = sp.to_node
)
SELECT * FROM search_path;

Correct approach:
WITH RECURSIVE search_path(depth, from_node, to_node) AS (
    SELECT 1, from_node, to_node
    FROM edges
    WHERE from_node = 1
    UNION ALL
    SELECT sp.depth + 1, e.from_node, e.to_node
    FROM edges e
    JOIN search_path sp ON e.from_node = sp.to_node
    WHERE sp.depth < 10
)
SELECT * FROM search_path;

Root cause: Without a recursion limit, queries can run too long or exhaust resources.

Key Takeaways

Recursive CTEs let you explore connected data by repeatedly joining rows until no new connections are found.

They require an anchor query to start and a recursive query that references the CTE itself to continue expanding results.

Preventing infinite loops by tracking visited nodes or limiting recursion depth is essential for safe queries.

While powerful, recursive CTEs may not perform well on very large graphs, where specialized tools are better.

Understanding recursive CTEs bridges SQL querying with graph theory and recursive programming concepts.