0
0
PostgreSQLquery~5 mins

Hash partitioning for distribution in PostgreSQL - Time & Space Complexity

Choose your learning style9 modes available
Time Complexity: Hash partitioning for distribution
O(1) + O(n/k)
Understanding Time Complexity

When we use hash partitioning in a database, we split data into parts based on a hash function. Understanding how this affects the time to find or insert data helps us see how well the system scales.

We want to know how the work grows as we add more data and partitions.

Scenario Under Consideration

Analyze the time complexity of this hash partitioning setup in PostgreSQL.


CREATE TABLE sales (
  id SERIAL PRIMARY KEY,
  customer_id INT,
  amount NUMERIC
) PARTITION BY HASH (customer_id);

CREATE TABLE sales_part_1 PARTITION OF sales FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE sales_part_2 PARTITION OF sales FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE sales_part_3 PARTITION OF sales FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE sales_part_4 PARTITION OF sales FOR VALUES WITH (MODULUS 4, REMAINDER 3);

This code creates a main table partitioned by hashing the customer_id into 4 parts.

Identify Repeating Operations

Look at what repeats when we search or insert data.

  • Primary operation: The hash function calculation on customer_id to find the right partition.
  • How many times: Once per query or insert, then a search inside one partition.
How Execution Grows With Input

As data grows, the hash function still runs once per operation, but each partition holds less data compared to one big table.

Input Size (n)Approx. Operations
10Hash once + search in ~2-3 rows
100Hash once + search in ~25 rows
1000Hash once + search in ~250 rows

Pattern observation: The hash step stays constant, but the search inside a partition grows roughly with the size of that partition, which is smaller than the whole data.

Final Time Complexity

Time Complexity: O(1) + O(n/k)

This means the hash calculation is constant time, and the search inside one partition grows with the size of that partition, which is total data divided by number of partitions.

Common Mistake

[X] Wrong: "Hash partitioning makes every query run in constant time regardless of data size."

[OK] Correct: The hash function is constant time, but searching inside the chosen partition still depends on how much data is in that partition.

Interview Connect

Understanding how hash partitioning affects query time shows you can think about data distribution and scaling. This skill helps you design systems that stay fast as data grows.

Self-Check

"What if we increased the number of partitions k? How would that change the time complexity for searching inside partitions?"