TABLESAMPLE for random sampling in PostgreSQL - Time & Space Complexity
When we use TABLESAMPLE in PostgreSQL, we want a random subset of a big table, and we want it quickly.
We ask: how does the time to produce this sample grow as the table gets bigger?
Analyze the time complexity of the following code snippet.
SELECT * FROM large_table TABLESAMPLE SYSTEM (10);
This query returns roughly 10% of the rows from large_table. SYSTEM sampling picks about 10% of the table's data pages at random and returns every row on the chosen pages, which is what makes it fast (and why the exact row count can vary from run to run).
Identify the repeated work: loops, recursion, or traversals.
- Primary operation: Scanning data pages to decide which to include in the sample.
- How many times: once per data page in the table, not once per row.
As the table grows, the number of data pages grows, so the scan takes longer.
| Input Size (rows) | Approx. Operations (pages scanned) |
|---|---|
| 10,000 | Small number of pages scanned |
| 100,000 | About 10 times more pages scanned |
| 1,000,000 | About 100 times more pages scanned |
Pattern observation: The work grows roughly in proportion to the table size because it scans pages, not individual rows.
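The pattern in the table above can be sketched with a small Python model. This is an illustration of the idea, not PostgreSQL internals: PAGE_SIZE and the function name are assumptions made up for the sketch.

```python
import random

PAGE_SIZE = 100  # hypothetical rows per data page (an assumption for this model)

def system_sample_work(n_rows, percent, seed=42):
    """Model of SYSTEM sampling: one inclusion decision per PAGE, not per row."""
    rng = random.Random(seed)
    n_pages = (n_rows + PAGE_SIZE - 1) // PAGE_SIZE
    pages_examined = 0
    sampled_pages = 0
    for _ in range(n_pages):
        pages_examined += 1               # every page costs one check: O(n)
        if rng.random() < percent / 100:
            sampled_pages += 1            # the whole page's rows join the sample
    return pages_examined, sampled_pages

for rows in (10_000, 100_000, 1_000_000):
    examined, _ = system_sample_work(rows, 10)
    print(f"{rows:>9} rows -> {examined:>6} pages examined")
```

Running the loop shows pages examined growing 100 → 1,000 → 10,000 as rows grow 10,000 → 100,000 → 1,000,000: the same 10x-per-step pattern as the table.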
Time Complexity: O(n)
This means the time to get a sample grows roughly in direct proportion to the size of the table.
[X] Wrong: "TABLESAMPLE reads only the sampled rows, so time stays the same no matter table size."
[OK] Correct: The method must consider every data page to decide which ones to include, so the work grows with the size of the table, not the size of the sample.
Understanding how sampling queries scale helps you explain performance trade-offs clearly and shows you know how databases handle big data efficiently.
"What if we used BERNOULLI sampling instead of SYSTEM? How would the time complexity change?"
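A hedged sketch of the comparison the question asks about: BERNOULLI makes one random decision per row instead of one per page, so both methods remain O(n), but BERNOULLI's constant factor is larger. The model below is illustrative only; PAGE_SIZE and the function are assumptions, not PostgreSQL's actual implementation.

```python
PAGE_SIZE = 100  # assumed rows per data page

def sampling_decisions(n_rows, method):
    """Rough model of the basic decisions each sampling method makes.

    SYSTEM:    one random decision per page -> about n_rows / PAGE_SIZE checks
    BERNOULLI: one random decision per row  -> n_rows checks
    Both scale linearly, O(n); BERNOULLI's constant is ~PAGE_SIZE times larger.
    """
    if method == "SYSTEM":
        return (n_rows + PAGE_SIZE - 1) // PAGE_SIZE
    if method == "BERNOULLI":
        return n_rows
    raise ValueError(f"unknown method: {method}")

print(sampling_decisions(1_000_000, "SYSTEM"))     # 10000 page checks
print(sampling_decisions(1_000_000, "BERNOULLI"))  # 1000000 row checks
```

So the big-O answer does not change, but in practice BERNOULLI tends to be slower on the same table while giving a more uniformly random sample, since it does not take whole pages at a time.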