ANALYZE for statistics collection in PostgreSQL - Time Complexity
When PostgreSQL runs ANALYZE, it collects statistics about table data to help the database plan queries better.
We want to understand how the time to collect these statistics grows as the table size increases.
Analyze the time complexity of the following code snippet.
```sql
ANALYZE my_table;
-- This command scans the table to gather statistics
-- like number of rows, data distribution, and NULL counts.
-- It helps the query planner make better decisions.
```
This code runs ANALYZE on a table to collect statistics about its data.
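To make the per-row work concrete, here is a minimal Python sketch of the kind of statistics a single pass over a column might gather. This is an illustrative model, not PostgreSQL's actual C implementation; the function name `collect_stats` and the returned fields are hypothetical.

```python
def collect_stats(rows):
    """Gather simple column statistics in one pass over the values.

    Illustrative sketch only: models the per-row work ANALYZE-style
    statistics collection performs (row count, NULL count, distinct count).
    """
    n_rows = 0
    n_nulls = 0
    distinct = set()
    for value in rows:  # one visit per row: work grows with the row count
        n_rows += 1
        if value is None:
            n_nulls += 1
        else:
            distinct.add(value)
    return {"rows": n_rows, "nulls": n_nulls, "n_distinct": len(distinct)}

print(collect_stats([1, 2, 2, None, 3]))
# → {'rows': 5, 'nulls': 1, 'n_distinct': 3}
```

The loop body does constant work per row, so the total cost is driven entirely by how many rows the pass visits.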
Identify the repeated work: any loops, recursion, or row-by-row traversals.
- Primary operation: Scanning each row in the table to sample data.
- How many times: once per row examined — every row on a full scan, or a bounded subset when sampling is used.
As the number of rows in the table grows, the time to scan and collect statistics grows roughly in proportion.
| Input Size (n rows) | Approx. Operations |
|---|---|
| 10 | About 10 row checks |
| 100 | About 100 row checks |
| 1000 | About 1000 row checks |
Pattern observation: The work grows linearly as the number of rows increases.
Time Complexity: O(n)
This means the time to collect statistics grows directly with the number of rows in the table.
[X] Wrong: "ANALYZE runs instantly no matter how big the table is."
[OK] Correct: ANALYZE must look at many rows to gather data, so bigger tables take more time.
Understanding how ANALYZE scales helps you appreciate how databases keep queries fast by updating statistics efficiently.
"What if ANALYZE used only a fixed small sample of rows regardless of table size? How would the time complexity change?"