0
0
PostgreSQLquery~15 mins

CTE materialization behavior in PostgreSQL - Deep Dive

Choose your learning style9 modes available
Overview - CTE materialization behavior
What is it?
CTE materialization behavior refers to how PostgreSQL processes Common Table Expressions (CTEs) during query execution. A CTE is a temporary named result set that you can reference within a larger query. Materialization means the database engine computes and stores the CTE result separately before using it in the main query. This behavior affects performance and query planning.
Why it matters
Understanding CTE materialization is important because it impacts how efficiently queries run. Without knowing this, you might write queries that run slower than expected or use more resources. If CTEs were always inlined (not materialized), some queries could be faster but might lose clarity or correctness. Knowing when and how CTEs are materialized helps you write better, faster database queries.
Where it fits
Before learning about CTE materialization, you should understand basic SQL queries, subqueries, and how PostgreSQL executes queries. After this, you can explore query optimization, indexing strategies, and advanced PostgreSQL features like window functions and parallel query execution.
Mental Model
Core Idea
CTE materialization means PostgreSQL computes and stores the CTE result separately before using it, affecting query speed and resource use.
Think of it like...
Imagine you bake a batch of cookies (CTE) and put them on a tray before serving. You can serve cookies from the tray multiple times without baking again. Materialization is like baking once and storing the cookies, while inlining would be baking fresh cookies each time you need them.
┌───────────────┐       ┌───────────────┐
│   Main Query  │──────▶│ Use CTE Data  │
└──────┬────────┘       └──────┬────────┘
       │                       │
       │                       │
       ▼                       ▼
┌───────────────┐       ┌───────────────┐
│ Compute CTE   │──────▶│ Store Result  │
│ (Materialize) │       │ (Materialized)│
└───────────────┘       └───────────────┘
Build-Up - 7 Steps
1
FoundationWhat is a Common Table Expression
🤔
Concept: Introduce the idea of a CTE as a named temporary result in SQL.
A Common Table Expression (CTE) is a way to write a temporary result set that you can use within a larger SQL query. It starts with WITH keyword, followed by a name and a query. For example: WITH temp AS (SELECT * FROM users WHERE active = true) SELECT * FROM temp WHERE age > 30; This lets you break complex queries into parts.
Result
You get a filtered list of active users older than 30, using the temporary name 'temp'.
Understanding CTEs helps you organize queries into readable parts, making complex SQL easier to write and maintain.
2
FoundationMaterialization concept in databases
🤔
Concept: Explain what materialization means in query processing.
Materialization means the database runs a query and saves its result temporarily before using it in the main query. Instead of running the same subquery multiple times, it runs once and stores the data. This can save time or use more memory depending on the case.
Result
The database has a stored temporary table of results ready to use.
Knowing materialization helps you understand how databases manage intermediate results and why some queries run faster or slower.
3
IntermediatePostgreSQL default CTE materialization
🤔Before reading on: Do you think PostgreSQL always inlines CTEs or always materializes them? Commit to your answer.
Concept: PostgreSQL by default materializes CTEs, meaning it computes and stores them separately.
In PostgreSQL versions before 12, CTEs are always materialized. This means the CTE query runs once, and its results are stored in a temporary structure. The main query then uses this stored data. This guarantees the CTE is evaluated only once, which can affect performance and visibility of data changes.
Result
The query planner treats the CTE as a separate step, which can prevent some optimizations like pushing filters inside the CTE.
Understanding this default behavior explains why some queries with CTEs run slower than expected and why filtering inside the main query doesn't affect the CTE execution.
4
IntermediateInlining CTEs for optimization
🤔Before reading on: Can you guess what happens if PostgreSQL treats CTEs like subqueries and inlines them? Commit to your answer.
Concept: Inlining means the CTE is merged into the main query, allowing more optimization.
Starting with PostgreSQL 12, you can control CTE materialization. Using the keyword MATERIALIZED forces materialization, while NOT MATERIALIZED allows the planner to inline the CTE like a subquery. Inlining lets the planner push filters and joins inside the CTE, often improving performance.
Result
Queries can run faster because the planner can optimize the whole query together.
Knowing how to control materialization lets you write queries that balance clarity and performance.
5
IntermediateWhen materialization affects query results
🤔Before reading on: Do you think materializing a CTE can change the query's output compared to inlining? Commit to your answer.
Concept: Materialization can affect results when the CTE depends on data that changes during query execution or when side effects exist.
Because materialization runs the CTE once and stores results, any changes to underlying tables after that won't affect the CTE results. In contrast, inlining re-evaluates the CTE logic as part of the main query, reflecting current data. This matters for queries with volatile functions or data-modifying statements.
Result
Materialized CTEs provide stable snapshots of data, while inlined CTEs reflect the latest data state.
Understanding this difference helps avoid subtle bugs and ensures query correctness.
6
AdvancedPerformance trade-offs of materialization
🤔Before reading on: Do you think materializing a CTE always improves performance? Commit to your answer.
Concept: Materialization can improve or hurt performance depending on query complexity and data size.
Materializing a CTE means extra work to store results, which can slow down queries with large data sets. However, it can also prevent repeated execution of expensive subqueries. Inlining can reduce overhead but might cause repeated calculations or prevent some optimizations. Choosing between materialization and inlining depends on the specific query and data.
Result
Performance varies; sometimes materialization is faster, sometimes inlining is better.
Knowing these trade-offs helps you tune queries for best performance in real-world scenarios.
7
ExpertInternal planner decisions and surprises
🤔Before reading on: Do you think PostgreSQL always follows your MATERIALIZED or NOT MATERIALIZED hint? Commit to your answer.
Concept: PostgreSQL's planner may override hints based on cost estimates and heuristics.
Even with explicit MATERIALIZED or NOT MATERIALIZED keywords, PostgreSQL's planner can decide to materialize or inline CTEs differently if it believes it will improve performance or correctness. This behavior depends on version, query complexity, and planner settings. Sometimes, this leads to unexpected query plans and performance results.
Result
Query plans may differ from expectations, requiring careful analysis and testing.
Understanding planner autonomy helps experts debug and optimize complex queries effectively.
Under the Hood
When PostgreSQL encounters a CTE, it decides whether to materialize it by executing the CTE query and storing its results in a temporary structure (like a hidden temporary table). The main query then reads from this stored result. If inlining is chosen, the CTE's query is merged into the main query's execution plan, allowing the planner to optimize the entire query as one unit. Materialization isolates the CTE, preventing some optimizations but ensuring stable results.
Why designed this way?
Originally, PostgreSQL materialized CTEs to guarantee predictable behavior and correctness, especially for queries with side effects or volatile functions. This design simplifies reasoning about query execution order. Later, as performance demands grew, the ability to inline CTEs was added to allow more flexible optimization. The tradeoff balances correctness, clarity, and speed.
┌───────────────┐
│   Parse SQL   │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ Identify CTEs │
└──────┬────────┘
       │
       ▼
┌───────────────┐       ┌─────────────────────┐
│ Planner checks│──────▶│ Materialize CTE?    │
│ materialize?  │       │  Yes       No       │
└──────┬────────┘       └────────┬────────────┘
       │                         │
       ▼                         ▼
┌───────────────┐       ┌─────────────────────┐
│ Execute CTE   │       │ Inline CTE query     │
│ and store    │       │ into main query plan │
│ results      │       └────────┬────────────┘
└──────┬────────┘                │
       │                         ▼
       ▼                  ┌───────────────┐
┌───────────────┐          │ Execute main  │
│ Execute main  │◀─────────┤ query with    │
│ query reading │          │ inlined CTE   │
│ materialized  │          └───────────────┘
│ results      │
└───────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Do you think PostgreSQL always inlines CTEs like subqueries? Commit to yes or no.
Common Belief:CTEs are just like subqueries and always get inlined for optimization.
Tap to reveal reality
Reality:By default, PostgreSQL materializes CTEs, treating them as separate temporary results, not inlining them.
Why it matters:Assuming inlining can lead to unexpected slow queries because materialization prevents some optimizations.
Quick: Does adding a filter in the main query always reduce the CTE's work? Commit to yes or no.
Common Belief:Filters outside the CTE reduce the amount of data the CTE processes.
Tap to reveal reality
Reality:If the CTE is materialized, filters in the main query do not affect the CTE execution; the entire CTE runs first.
Why it matters:Misunderstanding this can cause inefficient queries that process more data than necessary.
Quick: Do you think PostgreSQL always follows MATERIALIZED or NOT MATERIALIZED hints exactly? Commit to yes or no.
Common Belief:The planner always obeys explicit materialization hints without exception.
Tap to reveal reality
Reality:The planner can override hints based on cost estimates and internal heuristics.
Why it matters:Relying blindly on hints can cause confusion when query plans differ from expectations.
Quick: Can materializing a CTE change the final query result? Commit to yes or no.
Common Belief:Materialization only affects performance, not the correctness or output of the query.
Tap to reveal reality
Reality:Materialization can change results if the underlying data changes during query execution or if volatile functions are used.
Why it matters:Ignoring this can lead to subtle bugs and inconsistent query outputs.
Expert Zone
1
Materialized CTEs act like optimization fences, preventing the planner from pushing down predicates or joins inside the CTE.
2
NOT MATERIALIZED CTEs can sometimes cause repeated execution of expensive subqueries if referenced multiple times, unlike materialized ones.
3
The planner's decision to override materialization hints depends on complex cost estimates and can vary between PostgreSQL versions.
When NOT to use
Avoid using materialized CTEs for large datasets when performance is critical and the CTE is simple enough to inline. Instead, use subqueries or inline CTEs with NOT MATERIALIZED to allow better optimization.
Production Patterns
In production, developers use MATERIALIZED to ensure stable snapshots of data for complex transformations, while NOT MATERIALIZED is used to improve performance in read-heavy analytical queries. Monitoring query plans and testing with EXPLAIN ANALYZE is common to choose the best approach.
Connections
Query Optimization
CTE materialization behavior directly affects how query optimizers rearrange and execute queries.
Understanding CTE materialization helps grasp why some optimizations succeed or fail, deepening knowledge of query planning.
Functional Programming (Memoization)
Materialization is similar to memoization where results are cached to avoid repeated computation.
Recognizing this connection clarifies why materialization can improve performance by reusing results.
Caching in Web Applications
Both caching and CTE materialization store intermediate results to speed up repeated access.
Knowing caching principles helps understand trade-offs in materialization between speed and freshness of data.
Common Pitfalls
#1Assuming filters in the main query reduce CTE workload when CTE is materialized.
Wrong approach:WITH cte AS (SELECT * FROM orders) SELECT * FROM cte WHERE order_date > '2023-01-01';
Correct approach:WITH cte AS (SELECT * FROM orders WHERE order_date > '2023-01-01') SELECT * FROM cte;
Root cause:Misunderstanding that materialized CTEs run fully before main query filters apply.
#2Forcing NOT MATERIALIZED on a large, expensive CTE expecting better performance.
Wrong approach:WITH cte AS NOT MATERIALIZED (SELECT complex_aggregation FROM big_table) SELECT * FROM cte JOIN other_table ON ...;
Correct approach:WITH cte AS MATERIALIZED (SELECT complex_aggregation FROM big_table) SELECT * FROM cte JOIN other_table ON ...;
Root cause:Ignoring that NOT MATERIALIZED can cause repeated execution of expensive CTEs.
#3Expecting PostgreSQL to always obey MATERIALIZED or NOT MATERIALIZED hints exactly.
Wrong approach:WITH cte AS NOT MATERIALIZED (SELECT * FROM data) SELECT * FROM cte;
Correct approach:Use EXPLAIN to verify plan and adjust query or planner settings if needed; do not assume hints are absolute.
Root cause:Not knowing the planner can override hints based on cost and heuristics.
Key Takeaways
CTE materialization means PostgreSQL computes and stores the CTE result separately before using it in the main query.
By default, PostgreSQL materializes CTEs, which can prevent some optimizations but ensures stable results.
Starting with version 12, you can control materialization with MATERIALIZED and NOT MATERIALIZED keywords to balance performance and correctness.
Materialization affects both query speed and sometimes the final output, especially with volatile data or functions.
Understanding how and when PostgreSQL materializes CTEs helps you write clearer, faster, and more reliable queries.