0
0
PostgreSQLquery~15 mins

Recursive CTE for graph traversal in PostgreSQL - Deep Dive

Choose your learning style9 modes available
Overview - Recursive CTE for graph traversal
What is it?
A Recursive Common Table Expression (CTE) is a special SQL query that calls itself to explore data that is connected in a chain or network, like a family tree or a map of roads. It helps find all related items starting from one point by repeatedly following links. This is especially useful for graphs, where nodes connect to other nodes in complex ways. Recursive CTEs let you write these searches clearly and efficiently inside the database.
Why it matters
Without recursive CTEs, finding all connected parts in a graph would require many separate queries or complicated code outside the database, making it slow and hard to maintain. Recursive CTEs solve this by letting the database do the heavy lifting in one query, saving time and reducing errors. This makes tasks like finding all friends of a friend, or all parts connected to a machine, much easier and faster.
Where it fits
Before learning recursive CTEs, you should understand basic SQL queries, joins, and simple CTEs (non-recursive). After mastering recursive CTEs, you can explore advanced graph algorithms, window functions, and performance tuning for recursive queries.
Mental Model
Core Idea
A recursive CTE repeatedly expands a set of results by joining new connected rows until no more connections are found.
Think of it like...
Imagine exploring a maze by starting at the entrance and walking down every path you find, marking each new room you enter, until you have visited every reachable room.
Start with initial nodes (anchor) ──► Find connected nodes (recursive step) ──► Repeat until no new nodes

┌───────────────┐
│ Anchor Query  │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ Recursive Step│
│ (join edges)  │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ Final Result  │
└───────────────┘
Build-Up - 7 Steps
1
FoundationUnderstanding Basic CTEs
🤔
Concept: Learn what a Common Table Expression (CTE) is and how it simplifies queries by naming temporary result sets.
A CTE is like a temporary table you define inside a query. It helps break complex queries into smaller parts. For example: WITH temp AS ( SELECT id, name FROM users WHERE active = true ) SELECT * FROM temp WHERE name LIKE 'A%';
Result
The query returns active users whose names start with 'A'.
Understanding CTEs is essential because recursive CTEs build on this idea by calling themselves.
2
FoundationGraph Data in Tables
🤔
Concept: Learn how graphs are stored in tables using edges that connect nodes.
Graphs are stored as tables with rows representing connections. For example, an edges table: | from_node | to_node | |-----------|---------| | 1 | 2 | | 2 | 3 | | 3 | 4 | Each row means 'from_node' connects to 'to_node'.
Result
You have a simple graph structure stored in a table.
Knowing how graphs are stored helps you understand how recursive queries follow connections.
3
IntermediateWriting a Recursive CTE
🤔Before reading on: do you think a recursive CTE needs two parts or just one? Commit to your answer.
Concept: Recursive CTEs have two parts: an anchor query and a recursive query that references the CTE itself.
A recursive CTE looks like this: WITH RECURSIVE search_path AS ( -- Anchor member: starting point SELECT from_node, to_node FROM edges WHERE from_node = 1 UNION -- Recursive member: find next connected nodes SELECT e.from_node, e.to_node FROM edges e JOIN search_path sp ON e.from_node = sp.to_node ) SELECT * FROM search_path;
Result
This query finds all nodes reachable from node 1 by following edges.
Understanding the two parts and how they combine is key to writing effective recursive queries.
4
IntermediateAvoiding Infinite Loops
🤔Before reading on: do you think recursive CTEs automatically stop when cycles exist? Commit to yes or no.
Concept: Recursive queries can loop forever if the graph has cycles; you must prevent revisiting nodes.
To avoid infinite loops, track visited nodes: WITH RECURSIVE search_path AS ( SELECT from_node, to_node, ARRAY[from_node] AS path FROM edges WHERE from_node = 1 UNION ALL SELECT e.from_node, e.to_node, path || e.from_node FROM edges e JOIN search_path sp ON e.from_node = sp.to_node WHERE NOT e.from_node = ANY(path) ) SELECT * FROM search_path;
Result
The query stops expanding paths that revisit nodes, preventing infinite loops.
Knowing how to track visited nodes is crucial for safe graph traversal.
5
IntermediateCollecting Full Paths
🤔
Concept: You can collect the full path taken to reach each node by accumulating nodes in an array.
Modify the recursive CTE to build a path array: WITH RECURSIVE search_path AS ( SELECT from_node, to_node, ARRAY[from_node, to_node] AS path FROM edges WHERE from_node = 1 UNION ALL SELECT e.from_node, e.to_node, sp.path || e.to_node FROM edges e JOIN search_path sp ON e.from_node = sp.to_node WHERE NOT e.to_node = ANY(sp.path) ) SELECT * FROM search_path;
Result
Each row shows the path from the start node to the current node.
Tracking full paths helps understand how nodes are connected step-by-step.
6
AdvancedPerformance Considerations
🤔Before reading on: do you think recursive CTEs always perform well on large graphs? Commit to yes or no.
Concept: Recursive CTEs can be slow on large graphs; indexing and limiting recursion depth help performance.
To improve performance: - Index the columns used in joins (e.g., from_node, to_node). - Limit recursion depth with a counter: WITH RECURSIVE search_path(depth, from_node, to_node, path) AS ( SELECT 1, from_node, to_node, ARRAY[from_node, to_node] FROM edges WHERE from_node = 1 UNION ALL SELECT depth + 1, e.from_node, e.to_node, sp.path || e.to_node FROM edges e JOIN search_path sp ON e.from_node = sp.to_node WHERE NOT e.to_node = ANY(sp.path) AND depth < 10 ) SELECT * FROM search_path;
Result
The query stops after 10 steps, preventing long or infinite recursion.
Knowing how to control recursion depth and use indexes prevents slow queries and resource exhaustion.
7
ExpertRecursive CTEs vs Graph Extensions
🤔Before reading on: do you think recursive CTEs are always the best way to query graphs in PostgreSQL? Commit to yes or no.
Concept: PostgreSQL has specialized graph extensions like pgRouting or built-in JSON/JSONB that can sometimes outperform recursive CTEs for complex graph tasks.
While recursive CTEs are powerful and standard, for very large or complex graphs, extensions like pgRouting provide optimized algorithms and data structures. Also, graph databases or external tools may be better for certain use cases. Example: pgRouting offers shortest path functions that run faster than recursive CTEs for big road networks.
Result
Experts choose tools based on graph size and complexity, not just recursive CTEs.
Understanding the limits of recursive CTEs helps choose the right tool for production systems.
Under the Hood
Recursive CTEs work by first running the anchor query to get initial rows. Then, the recursive query runs repeatedly, each time joining the previous results to find new connected rows. This loop continues until no new rows are found. Internally, PostgreSQL manages this iteration and combines results using UNION or UNION ALL. The database keeps track of rows already returned to avoid duplicates unless UNION ALL is used. The recursion is implemented as a fixpoint computation, stopping when the result set stabilizes.
Why designed this way?
Recursive CTEs were designed to extend SQL's declarative power to hierarchical and graph data without requiring procedural code. The two-part structure (anchor + recursive) mirrors mathematical recursion and allows expressing complex traversals in a single query. Alternatives like procedural loops or external code were less efficient and harder to maintain. The design balances expressiveness with performance and integrates well with SQL's set-based model.
┌───────────────┐
│ Anchor Query  │
│ (initial rows)│
└──────┬────────┘
       │
       ▼
┌─────────────────────────────┐
│ Recursive Query (joins with  │
│ previous results to find new│
│ rows)                       │
└──────┬──────────────────────┘
       │
       ▼
┌─────────────────────────────┐
│ Combine results with UNION   │
│ Check if new rows added      │
└──────┬──────────────────────┘
       │
       ▼
┌───────────────┐
│ Stop if no new│
│ rows found   │
└───────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Do recursive CTEs automatically prevent infinite loops in cyclic graphs? Commit to yes or no.
Common Belief:Recursive CTEs automatically stop when cycles exist, so no special handling is needed.
Tap to reveal reality
Reality:Recursive CTEs do not prevent infinite loops by themselves; you must explicitly track visited nodes to avoid revisiting and looping forever.
Why it matters:Without cycle detection, queries can run indefinitely, causing server overload or crashes.
Quick: Do you think recursive CTEs always perform well on very large graphs? Commit to yes or no.
Common Belief:Recursive CTEs are efficient and suitable for all graph sizes.
Tap to reveal reality
Reality:Recursive CTEs can be slow and resource-heavy on large graphs; specialized graph databases or extensions may perform better.
Why it matters:Using recursive CTEs blindly on big data can cause slow queries and poor user experience.
Quick: Do you think recursive CTEs can only find paths from one node to another? Commit to yes or no.
Common Belief:Recursive CTEs are only useful for finding paths between two specific nodes.
Tap to reveal reality
Reality:Recursive CTEs can find all reachable nodes, collect full paths, and perform complex traversals, not just single paths.
Why it matters:Limiting your understanding restricts how you use recursive CTEs for broader graph problems.
Quick: Do you think UNION and UNION ALL behave the same in recursive CTEs? Commit to yes or no.
Common Belief:UNION and UNION ALL are interchangeable in recursive CTEs without affecting results.
Tap to reveal reality
Reality:UNION removes duplicates and can prevent infinite loops, while UNION ALL includes all rows and may cause duplicates or infinite recursion.
Why it matters:Choosing the wrong union type can lead to incorrect results or performance issues.
Expert Zone
1
Recursive CTEs can be combined with window functions to rank or filter paths dynamically during traversal.
2
The order of rows returned by recursive CTEs is not guaranteed; explicit ORDER BY is needed for consistent results.
3
PostgreSQL's planner may choose different join strategies for recursive queries, affecting performance unpredictably.
When NOT to use
Avoid recursive CTEs for very large or highly connected graphs where performance is critical; instead, use graph database systems like Neo4j or PostgreSQL extensions like pgRouting. For simple hierarchical data, consider using adjacency lists with indexed parent references or materialized path columns.
Production Patterns
In production, recursive CTEs are used for organizational charts, bill of materials, network pathfinding, and permission hierarchies. They are often combined with limits on recursion depth and cycle detection to ensure reliability. Caching results or precomputing paths is common to improve response times.
Connections
Tree Traversal Algorithms
Recursive CTEs implement similar logic to depth-first or breadth-first tree traversal algorithms.
Understanding classic tree traversal helps grasp how recursive CTEs explore graph nodes step-by-step.
Functional Programming Recursion
Recursive CTEs mirror the concept of functions calling themselves to solve problems incrementally.
Knowing recursion in programming clarifies how recursive CTEs build results by repeated self-reference.
Supply Chain Management
Graph traversal via recursive CTEs models dependencies in supply chains, like parts needed to build products.
Seeing recursive CTEs as dependency resolution tools connects database queries to real-world logistics and planning.
Common Pitfalls
#1Infinite recursion due to cycles in graph.
Wrong approach:WITH RECURSIVE search_path AS ( SELECT from_node, to_node FROM edges WHERE from_node = 1 UNION ALL SELECT e.from_node, e.to_node FROM edges e JOIN search_path sp ON e.from_node = sp.to_node ) SELECT * FROM search_path;
Correct approach:WITH RECURSIVE search_path AS ( SELECT from_node, to_node, ARRAY[from_node] AS path FROM edges WHERE from_node = 1 UNION ALL SELECT e.from_node, e.to_node, path || e.from_node FROM edges e JOIN search_path sp ON e.from_node = sp.to_node WHERE NOT e.from_node = ANY(path) ) SELECT * FROM search_path;
Root cause:Not tracking visited nodes allows the query to revisit the same nodes endlessly.
#2Using UNION ALL without filtering duplicates causing incorrect results.
Wrong approach:WITH RECURSIVE search_path AS ( SELECT from_node, to_node FROM edges WHERE from_node = 1 UNION ALL SELECT e.from_node, e.to_node FROM edges e JOIN search_path sp ON e.from_node = sp.to_node ) SELECT * FROM search_path;
Correct approach:WITH RECURSIVE search_path AS ( SELECT from_node, to_node FROM edges WHERE from_node = 1 UNION SELECT e.from_node, e.to_node FROM edges e JOIN search_path sp ON e.from_node = sp.to_node ) SELECT * FROM search_path;
Root cause:Using UNION ALL includes duplicates; UNION removes duplicates to keep results correct.
#3Not limiting recursion depth causing long-running queries.
Wrong approach:WITH RECURSIVE search_path AS ( SELECT from_node, to_node FROM edges WHERE from_node = 1 UNION ALL SELECT e.from_node, e.to_node FROM edges e JOIN search_path sp ON e.from_node = sp.to_node ) SELECT * FROM search_path;
Correct approach:WITH RECURSIVE search_path(depth, from_node, to_node) AS ( SELECT 1, from_node, to_node FROM edges WHERE from_node = 1 UNION ALL SELECT depth + 1, e.from_node, e.to_node FROM edges e JOIN search_path sp ON e.from_node = sp.to_node WHERE depth < 10 ) SELECT * FROM search_path;
Root cause:Without a recursion limit, queries can run too long or exhaust resources.
Key Takeaways
Recursive CTEs let you explore connected data by repeatedly joining rows until no new connections are found.
They require an anchor query to start and a recursive query that references the CTE itself to continue expanding results.
Preventing infinite loops by tracking visited nodes or limiting recursion depth is essential for safe queries.
While powerful, recursive CTEs may not perform well on very large graphs, where specialized tools are better.
Understanding recursive CTEs bridges SQL querying with graph theory and recursive programming concepts.