0
0
PostgreSQLquery~10 mins

ANALYZE for statistics collection in PostgreSQL - Step-by-Step Execution

Choose your learning style9 modes available
Concept Flow - ANALYZE for statistics collection
Start ANALYZE command
Scan table rows
Collect column statistics
Update statistics metadata
Statistics ready for query planner
END
ANALYZE scans table data to collect statistics, then updates metadata used by the query planner to optimize queries.
Execution Sample
PostgreSQL
ANALYZE employees;
-- Collects statistics on 'employees' table
-- Helps planner choose best query plan
This command scans the 'employees' table and updates statistics for query optimization.
Execution Table
StepActionDetailsResult
1Start ANALYZECommand issued on 'employees' tableBegin scanning rows
2Scan rowsRead all rows or sampleGather data distribution info
3Collect statsCalculate column stats (e.g. null fraction, distinct values)Statistics computed
4Update metadataWrite stats to system catalogsStatistics stored for planner
5FinishANALYZE completesQuery planner uses new stats
💡 All steps complete; statistics updated for query optimization
Variable Tracker
VariableStartAfter Step 2After Step 3After Step 4Final
rows_scanned0All or sample rows readSameSameSame
column_statsNonePartial dataCalculated stats readyStored in metadataStored
metadata_updatedNoNoNoYesYes
Key Moments - 2 Insights
Why does ANALYZE sometimes scan only a sample of rows instead of all?
To save time, ANALYZE often samples rows (see execution_table step 2). Sampling still gives good stats without full scan.
What happens if statistics are outdated?
The query planner may choose inefficient plans because it relies on old stats (execution_table step 5). Running ANALYZE updates stats to improve planning.
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table, at which step are statistics actually calculated?
AStep 1
BStep 2
CStep 3
DStep 4
💡 Hint
Check the 'Collect stats' action in execution_table step 3
According to variable_tracker, when is metadata_updated set to 'Yes'?
AAfter Step 4
BAfter Step 2
CAfter Step 3
DAt Start
💡 Hint
Look at the 'metadata_updated' row in variable_tracker after Step 4
If ANALYZE scanned no rows, what would happen to 'rows_scanned' in variable_tracker?
AIt would show all rows read
BIt would remain 0
CIt would show partial data
DIt would be null
💡 Hint
Refer to 'rows_scanned' variable in variable_tracker at Start and After Step 2
Concept Snapshot
ANALYZE command scans table data or samples it
Collects statistics like null fraction, distinct values
Updates system metadata with stats
Helps query planner choose efficient plans
Run periodically to keep stats fresh
Full Transcript
The ANALYZE command in PostgreSQL scans a table's data or a sample of it to collect statistics about the columns. These statistics include information such as how many null values exist and how many distinct values are present. After collecting this data, ANALYZE updates the system metadata where the query planner reads these statistics. This helps the planner make better decisions about how to execute queries efficiently. The process starts when the command is issued, then rows are scanned, statistics are calculated, metadata is updated, and finally the command finishes. Sampling is used to save time while still getting useful statistics. If statistics are outdated, the planner may choose poor query plans, so running ANALYZE regularly is important.