CTE vs subquery performance in PostgreSQL - Performance Comparison
We want to understand how using a Common Table Expression (CTE) compares to a subquery in terms of time taken as data grows.
How does the choice between CTE and subquery affect the work the database does?
Analyze the time complexity of these two queries:
-- Using a CTE
WITH recent_orders AS (
SELECT * FROM orders WHERE order_date > CURRENT_DATE - INTERVAL '30 days'
)
SELECT customer_id, COUNT(*) FROM recent_orders GROUP BY customer_id;
-- Using a subquery
SELECT customer_id, COUNT(*) FROM (
SELECT * FROM orders WHERE order_date > CURRENT_DATE - INTERVAL '30 days'
) AS recent_orders
GROUP BY customer_id;
Both queries find how many orders each customer made in the last 30 days, but one uses a CTE and the other a subquery.
Look for repeated work done by the database:
- Primary operation: Scanning the orders table to filter recent orders.
- How many times: Once per query, but CTE may cause repeated scans if referenced multiple times.
As the number of orders grows, the database must scan more rows to find recent orders.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | Scan 10 rows once |
| 100 | Scan 100 rows once |
| 1000 | Scan 1000 rows once |
Pattern observation: The scan grows linearly with the number of rows in the orders table.
Time Complexity: O(n)
This means the time grows roughly in direct proportion to the number of rows scanned.
[X] Wrong: "CTEs always run faster than subqueries because they look cleaner."
[OK] Correct: CTEs can sometimes cause the database to run the same work multiple times, making them slower than subqueries in some cases.
Understanding how query structure affects performance helps you write efficient database code and explain your choices clearly.
"What if the CTE is referenced multiple times in the query? How would that affect the time complexity compared to a subquery?"