
TABLESAMPLE for random sampling in PostgreSQL - Time & Space Complexity

Time Complexity: TABLESAMPLE for random sampling
O(n)
Understanding Time Complexity

When we use TABLESAMPLE in PostgreSQL, we want to quickly get a random part of a big table.

We ask: How does the time to get this sample grow as the table gets bigger?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.


SELECT * FROM large_table TABLESAMPLE SYSTEM (10);

This query returns approximately 10% of the rows from large_table using page-level (block-level) random sampling, which is fast because inclusion is decided per data page rather than per row.

Identify Repeating Operations

Identify the loops, recursion, or data traversals that repeat.

  • Primary operation: Scanning data pages to decide which to include in the sample.
  • How many times: Once over the table's data pages, not every row individually.
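To make the page-level model concrete, here is a minimal Python sketch of the behavior described above: one random decision per data page, with a whole page's rows included when the page passes the coin flip. The page size of 100 rows and the function layout are illustrative assumptions, not PostgreSQL's actual storage code.

```python
import random

def system_sample(rows, rows_per_page=100, percent=10, seed=42):
    """Simulate SYSTEM-style sampling: one random decision per data page.

    Every page of the table is visited once; a page that passes the
    coin flip contributes *all* of its rows to the sample.
    """
    rng = random.Random(seed)
    sample = []
    pages_visited = 0
    # Walk the table page by page (this per-page loop is the cost we count).
    for start in range(0, len(rows), rows_per_page):
        pages_visited += 1
        if rng.random() < percent / 100:
            sample.extend(rows[start:start + rows_per_page])
    return sample, pages_visited

rows = list(range(10_000))
sample, pages = system_sample(rows)
# 10,000 rows at 100 rows per page = 100 page-level decisions.
print(pages)  # 100
```

Note that `pages_visited` depends only on the table size, not on the sampling percentage: a smaller sample still makes the same number of page-level decisions in this model.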
How Execution Grows With Input

As the table grows, the number of data pages grows, so the scan takes longer.

Input Size (rows)    Approx. Operations (pages scanned)
10,000               Small number of pages scanned
100,000              About 10 times more pages scanned
1,000,000            About 100 times more pages scanned

Pattern observation: The work grows roughly in proportion to the table size because it scans pages, not individual rows.

Final Time Complexity

Time Complexity: O(n)

This means the time to get a sample grows roughly in direct proportion to the size of the table.

Common Mistake

[X] Wrong: "TABLESAMPLE reads only the sampled rows, so time stays the same no matter table size."

[OK] Correct: The method scans data pages to decide which rows to include, so it still reads a part of the whole table, making time grow with table size.
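The distinction can be checked numerically by counting page-level decisions (a stand-in for work done) at several table sizes. The 100-rows-per-page figure is an illustrative assumption:

```python
import math

def pages_scanned(num_rows, rows_per_page=100):
    # Work is one decision per page, regardless of how many rows
    # end up in the sample.
    return math.ceil(num_rows / rows_per_page)

for n in (10_000, 100_000, 1_000_000):
    print(n, pages_scanned(n))
# 10000 100
# 100000 1000
# 1000000 10000
# Work grows 10x when the table grows 10x: linear, not constant.
```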

Interview Connect

Understanding how sampling queries scale helps you explain performance trade-offs clearly and shows you know how databases handle big data efficiently.

Self-Check

"What if we used BERNOULLI sampling instead of SYSTEM? How would the time complexity change?"