Hash partitioning for distribution in PostgreSQL - Time & Space Complexity
When we use hash partitioning in a database, we split data into parts based on a hash function. Understanding how this affects the time to find or insert data helps us see how well the system scales.
We want to know how the work grows as we add more data and partitions.
Analyze the time complexity of this hash partitioning setup in PostgreSQL.
CREATE TABLE sales (
id SERIAL PRIMARY KEY,
customer_id INT,
amount NUMERIC
) PARTITION BY HASH (customer_id);
CREATE TABLE sales_part_1 PARTITION OF sales FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE sales_part_2 PARTITION OF sales FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE sales_part_3 PARTITION OF sales FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE sales_part_4 PARTITION OF sales FOR VALUES WITH (MODULUS 4, REMAINDER 3);
This code creates a main table partitioned by hashing the customer_id into 4 parts.
Look at what repeats when we search or insert data.
- Primary operation: The hash function calculation on customer_id to find the right partition.
- How many times: Once per query or insert, then a search inside one partition.
As data grows, the hash function still runs once per operation, but each partition holds less data compared to one big table.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | Hash once + search in ~2-3 rows |
| 100 | Hash once + search in ~25 rows |
| 1000 | Hash once + search in ~250 rows |
Pattern observation: The hash step stays constant, but the search inside a partition grows roughly with the size of that partition, which is smaller than the whole data.
Time Complexity: O(1) + O(n/k)
This means the hash calculation is constant time, and the search inside one partition grows with the size of that partition, which is total data divided by number of partitions.
[X] Wrong: "Hash partitioning makes every query run in constant time regardless of data size."
[OK] Correct: The hash function is constant time, but searching inside the chosen partition still depends on how much data is in that partition.
Understanding how hash partitioning affects query time shows you can think about data distribution and scaling. This skill helps you design systems that stay fast as data grows.
"What if we increased the number of partitions k? How would that change the time complexity for searching inside partitions?"