CTE materialization behavior in PostgreSQL - Time & Space Complexity
When using Common Table Expressions (CTEs) in PostgreSQL, it's important to understand how their execution time grows as the data size increases.
We want to know how the database handles CTEs internally and how that affects query speed.
Analyze the time complexity of this CTE query:
WITH cte AS (
SELECT * FROM orders WHERE order_date > '2023-01-01'
)
SELECT cte.customer_id, COUNT(*)
FROM cte
GROUP BY cte.customer_id;
This query first selects recent orders in the CTE, then counts orders per customer from that CTE.
Look for repeated work inside the query:
- Primary operation: Scanning the filtered orders table once to build the CTE result.
- How many times: The CTE is fully computed once, then the main query reads from it.
As the number of orders after 2023-01-01 grows, the work to build the CTE grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 rows scanned and grouped |
| 100 | About 100 rows scanned and grouped |
| 1000 | About 1000 rows scanned and grouped |
Pattern observation: The work grows linearly with the number of matching rows.
Time Complexity: O(n)
This means the query time grows roughly in direct proportion to the number of rows processed in the CTE.
[X] Wrong: "The CTE is just a shortcut and doesn't affect performance because it runs once."
[OK] Correct: In PostgreSQL, CTEs are often fully computed and stored before the main query runs, so their size directly impacts total work.
Understanding how CTEs are materialized helps you write efficient queries and explain performance trade-offs clearly in real projects.
What if the CTE was inlined instead of materialized? How would that change the time complexity?