0
0
PostgreSQLquery~15 mins

GENERATED columns (stored and virtual) in PostgreSQL - Deep Dive

Choose your learning style9 modes available
Overview - GENERATED columns (stored and virtual)
What is it?
GENERATED columns are special columns in a database table whose values are automatically calculated from other columns. There are two types: stored columns, which save the calculated value physically, and virtual columns, which compute the value on the fly when queried. This helps keep data consistent and reduces manual errors.
Why it matters
Without GENERATED columns, you would have to manually calculate and update related data, which can lead to mistakes and inconsistencies. GENERATED columns automate this process, ensuring data is always accurate and saving time. This is especially important in complex databases where derived data is frequently used.
Where it fits
Before learning GENERATED columns, you should understand basic table structures and how to write SQL queries. After mastering GENERATED columns, you can explore database optimization, indexing strategies, and advanced data integrity techniques.
Mental Model
Core Idea
A GENERATED column is like a calculator inside the table that automatically fills in values based on other columns, either saving the result or computing it when needed.
Think of it like...
Imagine a recipe book where the total calories for each dish are either written down (stored) or calculated every time you look at the recipe (virtual). Both ways give you the calorie count without you doing the math each time.
┌───────────────┐
│   Table Row   │
├───────────────┤
│ Column A      │
│ Column B      │
│ Generated Col │
└─────┬─────────┘
      │
      ▼
  Calculation
      │
      ├─ Stored: value saved in table
      └─ Virtual: value computed on query
Build-Up - 7 Steps
1
FoundationUnderstanding Basic Table Columns
🤔
Concept: Learn what normal columns are and how they store data directly.
In a database table, columns hold data you enter manually or through applications. For example, a 'price' column stores the price of an item, and a 'quantity' column stores how many items you have. These values are saved exactly as entered.
Result
You can insert, update, and retrieve data from these columns directly.
Knowing how regular columns work is essential because GENERATED columns build on this by automating some of the data entry.
2
FoundationIntroduction to Computed Data
🤔
Concept: Some data depends on other data and can be calculated rather than stored manually.
For example, the total cost of an order can be calculated by multiplying 'price' by 'quantity'. Instead of typing this total every time, you can calculate it when needed.
Result
You understand that some data is derived and can be computed from existing columns.
Recognizing derived data helps you see why GENERATED columns are useful—they automate these calculations.
3
IntermediateCreating Stored GENERATED Columns
🤔Before reading on: do you think stored GENERATED columns save space or use more space than normal columns? Commit to your answer.
Concept: Stored GENERATED columns calculate their value once and save it physically in the table.
In PostgreSQL, you can define a stored GENERATED column like this: CREATE TABLE orders ( price numeric, quantity integer, total_cost numeric GENERATED ALWAYS AS (price * quantity) STORED ); The 'total_cost' is calculated when a row is inserted or updated and saved in the table.
Result
The 'total_cost' column holds the calculated value and can be indexed or queried quickly.
Understanding stored GENERATED columns shows how databases can save computed results for faster access, trading storage space for speed.
4
IntermediateUsing Virtual GENERATED Columns
🤔Before reading on: do you think virtual GENERATED columns store data physically or compute on demand? Commit to your answer.
Concept: Virtual GENERATED columns compute their value every time you query them and do not store the result.
PostgreSQL does not natively support virtual GENERATED columns as of now, but conceptually, a virtual column calculates its value on the fly. This saves storage space but may slow down queries if the calculation is complex.
Result
You get up-to-date calculated values without extra storage, but with some performance cost.
Knowing the difference between stored and virtual helps you choose the right approach based on your needs for speed versus storage.
5
IntermediateBenefits of GENERATED Columns
🤔
Concept: GENERATED columns ensure data consistency and reduce manual errors by automating calculations.
When you use GENERATED columns, you don't have to worry about forgetting to update related data. For example, if 'price' or 'quantity' changes, 'total_cost' updates automatically. This keeps your data reliable.
Result
Your database maintains accurate derived data without extra manual work.
Understanding this benefit helps you appreciate why GENERATED columns improve data integrity and developer productivity.
6
AdvancedIndexing and Performance with Stored Columns
🤔Before reading on: do you think you can create indexes on stored GENERATED columns? Commit to your answer.
Concept: Stored GENERATED columns can be indexed like normal columns, improving query speed on calculated data.
Since stored GENERATED columns physically hold data, you can create indexes on them: CREATE INDEX idx_total_cost ON orders(total_cost); This makes searching or sorting by 'total_cost' faster.
Result
Queries filtering or ordering by the generated column run efficiently.
Knowing that stored GENERATED columns support indexing helps you design faster queries on derived data.
7
ExpertLimitations and Workarounds for Virtual Columns
🤔Before reading on: do you think PostgreSQL supports virtual GENERATED columns natively? Commit to your answer.
Concept: PostgreSQL currently lacks native virtual GENERATED columns, but you can simulate them with views or computed expressions.
To mimic virtual columns, you can create a view: CREATE VIEW order_totals AS SELECT price, quantity, price * quantity AS total_cost FROM orders; This calculates 'total_cost' on query without storing it. Alternatively, use expression indexes or triggers for complex cases.
Result
You achieve virtual column behavior despite PostgreSQL's current limitations.
Understanding these workarounds prepares you to handle PostgreSQL's current feature set effectively and plan for future updates.
Under the Hood
Stored GENERATED columns are implemented by the database engine calculating the expression when a row is inserted or updated, then saving the result physically in the table's storage. Virtual columns, if supported, compute the expression each time the column is queried, without storing the result. This means stored columns use more disk space but provide faster reads, while virtual columns save space but may slow down queries.
Why designed this way?
The design balances trade-offs between storage and performance. Stored columns speed up reads at the cost of storage and write overhead, while virtual columns save space but require computation on each read. PostgreSQL prioritized stored columns first for reliability and performance, leaving virtual columns to be simulated or added later.
┌───────────────┐       ┌───────────────┐
│ Insert/Update │──────▶│ Calculate Expr│
└──────┬────────┘       └──────┬────────┘
       │                       │
       │                       ▼
       │                ┌───────────────┐
       │                │ Store Result  │
       │                └──────┬────────┘
       │                       │
       ▼                       ▼
┌───────────────┐       ┌───────────────┐
│ Table Storage │       │ Query Request │
└──────┬────────┘       └──────┬────────┘
       │                       │
       │                       ▼
       │                ┌───────────────┐
       │                │ Return Stored │
       │                │ Value (Stored)│
       │                └───────────────┘
       │
       │
       │                ┌───────────────┐
       │                │ Calculate Expr│
       │                │ On Query (Virtual)
       │                └───────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Do stored GENERATED columns update automatically if you change the base columns? Commit yes or no.
Common Belief:Stored GENERATED columns do not update automatically; you must update them manually.
Tap to reveal reality
Reality:Stored GENERATED columns automatically recalculate and update their values whenever the base columns change.
Why it matters:Believing this causes unnecessary manual updates, risking errors and extra work.
Quick: Do virtual GENERATED columns use disk space to store their values? Commit yes or no.
Common Belief:Virtual GENERATED columns store their calculated values on disk like normal columns.
Tap to reveal reality
Reality:Virtual GENERATED columns do not store values; they compute them on demand during queries.
Why it matters:Misunderstanding this leads to wrong assumptions about storage needs and query performance.
Quick: Can you create indexes on virtual GENERATED columns in PostgreSQL? Commit yes or no.
Common Belief:You can create indexes on virtual GENERATED columns just like stored ones.
Tap to reveal reality
Reality:PostgreSQL does not support virtual GENERATED columns natively, so you cannot index them directly; you must use workarounds like expression indexes.
Why it matters:Expecting direct indexing can cause design mistakes and performance issues.
Quick: Are GENERATED columns always faster than manual calculations in queries? Commit yes or no.
Common Belief:GENERATED columns always improve query speed compared to manual calculations.
Tap to reveal reality
Reality:Stored GENERATED columns improve speed, but virtual or manual calculations may be slower depending on complexity and indexing.
Why it matters:Assuming automatic speed gains can lead to poor performance if the wrong type of GENERATED column is used.
Expert Zone
1
Stored GENERATED columns add overhead on writes because the database must compute and store the value, which can affect insert/update speed.
2
Complex expressions in GENERATED columns can cause unexpected performance hits, so careful testing is needed before production use.
3
PostgreSQL's lack of native virtual GENERATED columns means developers often use views or expression indexes, which have their own trade-offs in maintainability and performance.
When NOT to use
Avoid GENERATED columns when the calculation is very complex or changes frequently in ways that are not easily expressed in SQL. Instead, consider application-level calculations or triggers. Also, if you need virtual columns and your database does not support them natively, use views or computed queries as alternatives.
Production Patterns
In production, stored GENERATED columns are often used for frequently queried derived data that benefits from indexing, like totals or status flags. Virtual columns are simulated with views for flexible reporting. Developers also combine GENERATED columns with triggers and constraints to enforce data integrity and automate workflows.
Connections
Materialized Views
Both store precomputed data to speed up queries, but materialized views are separate objects refreshed on demand.
Understanding GENERATED columns helps grasp how databases optimize performance by precomputing data, a principle shared with materialized views.
Functional Programming
GENERATED columns resemble pure functions that always produce the same output from given inputs without side effects.
Seeing GENERATED columns as database functions clarifies their deterministic nature and why they ensure consistent data.
Spreadsheet Formulas
Like spreadsheet cells with formulas, GENERATED columns automatically update based on other cells' values.
Recognizing this connection helps non-technical learners relate database automation to familiar tools like Excel.
Common Pitfalls
#1Trying to create a virtual GENERATED column in PostgreSQL using unsupported syntax.
Wrong approach:CREATE TABLE sales ( price numeric, quantity integer, total_cost numeric GENERATED ALWAYS AS (price * quantity) VIRTUAL );
Correct approach:CREATE TABLE sales ( price numeric, quantity integer, total_cost numeric GENERATED ALWAYS AS (price * quantity) STORED );
Root cause:PostgreSQL does not support the VIRTUAL keyword; learners confuse it with other databases that do.
#2Manually updating a stored GENERATED column after changing base columns.
Wrong approach:UPDATE orders SET total_cost = price * quantity WHERE id = 1; -- manual update
Correct approach:UPDATE orders SET price = 10, quantity = 2 WHERE id = 1; -- total_cost updates automatically
Root cause:Misunderstanding that stored GENERATED columns auto-update leads to redundant and error-prone manual updates.
#3Creating an index on a non-stored (virtual) generated expression without using expression indexes.
Wrong approach:CREATE INDEX idx_total_cost ON orders(total_cost); -- fails if total_cost is virtual
Correct approach:CREATE INDEX idx_total_cost ON orders((price * quantity)); -- expression index workaround
Root cause:Assuming virtual columns behave like stored columns causes indexing errors.
Key Takeaways
GENERATED columns automate calculations inside database tables, improving data consistency and reducing manual errors.
Stored GENERATED columns save computed values physically, enabling indexing and faster queries at the cost of extra storage and write overhead.
Virtual GENERATED columns compute values on demand, saving storage but potentially slowing queries; PostgreSQL currently lacks native support for them.
Understanding when and how to use GENERATED columns helps design efficient, reliable databases that balance performance and storage.
Workarounds like views and expression indexes allow PostgreSQL users to simulate virtual columns and optimize query performance.