Bird
Raised Fist0
PostgreSQLquery~10 mins

Hash partitioning for distribution in PostgreSQL - Step-by-Step Execution

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Concept Flow - Hash partitioning for distribution
Start: Insert row
Compute hash on partition key
Modulo hash by number of partitions
Determine target partition
Insert row into target partition
End
When inserting data, the system computes a hash of the partition key, uses modulo to pick a partition, then inserts the row there.
Execution Sample
PostgreSQL
CREATE TABLE sales (
  id INT,
  region TEXT
) PARTITION BY HASH (region);

CREATE TABLE sales_p0 PARTITION OF sales FOR VALUES WITH (MODULUS 4, REMAINDER 0);
Defines a table partitioned by hashing the 'region' column into 4 partitions, creating one partition for remainder 0.
Execution Table
StepInput Row (region)Hash(region)Modulo (hash % 4)Target PartitionAction
1'north'hash('north') = 123456123456 % 4 = 0sales_p0Insert into sales_p0
2'east'hash('east') = 234567234567 % 4 = 3sales_p3Insert into sales_p3
3'south'hash('south') = 345678345678 % 4 = 2sales_p2Insert into sales_p2
4'west'hash('west') = 456789456789 % 4 = 1sales_p1Insert into sales_p1
5No more rows---Stop insertion
💡 All rows assigned to partitions based on hash modulo; insertion ends when no more rows.
Variable Tracker
VariableStartAfter 1After 2After 3After 4Final
regionnull'north''east''south''west'null
hash(region)null123456234567345678456789null
modulonull0321null
target_partitionnullsales_p0sales_p3sales_p2sales_p1null
Key Moments - 3 Insights
Why does the same region always go to the same partition?
Because the hash function always produces the same hash for the same input, so modulo result and partition stay consistent (see execution_table rows 1-4).
What happens if the number of partitions changes?
The modulo divisor changes, so the partition assignment can change for existing data, requiring data redistribution.
Why do we use modulo after hashing?
Modulo limits the hash value to the number of partitions, ensuring the row maps to a valid partition (see execution_table column 'Modulo').
Visual Quiz - 3 Questions
Test your understanding
Look at the execution table, which partition does the region 'south' get assigned to?
Asales_p2
Bsales_p1
Csales_p0
Dsales_p3
💡 Hint
Check row 3 in the execution_table under 'Target Partition'.
At which step does the modulo operation result in 1?
AStep 3
BStep 2
CStep 4
DStep 1
💡 Hint
Look at the 'Modulo' column in the execution_table for value 1.
If the number of partitions changed from 4 to 5, what would change in the execution_table?
AHash values would change
BModulo divisor would change to 5
CTarget partitions would remain the same
DRows would not be inserted
💡 Hint
Modulo is hash % number_of_partitions, so changing partitions changes modulo divisor.
Concept Snapshot
Hash partitioning distributes rows by:
- Computing hash of partition key
- Using modulo with number of partitions
- Assigning row to partition by modulo result
This ensures even data spread and consistent partitioning.
Full Transcript
Hash partitioning in PostgreSQL works by computing a hash value of the partition key for each row. Then, the system calculates the modulo of this hash by the number of partitions to find which partition the row belongs to. For example, if there are 4 partitions, the modulo is hash % 4, resulting in a number from 0 to 3. Each number corresponds to a partition. Rows with the same key always go to the same partition because the hash is consistent. This method helps distribute data evenly across partitions. If the number of partitions changes, the modulo divisor changes, which can cause rows to move to different partitions. The execution table shows step-by-step how rows with different region values are assigned to partitions based on their hash and modulo results.

Practice

(1/5)
1. What is the main purpose of hash partitioning in PostgreSQL?
easy
A. To split data into parts using a hash function on a column
B. To sort data alphabetically in each partition
C. To store data only on one server
D. To encrypt data for security

Solution

  1. Step 1: Understand hash partitioning concept

    Hash partitioning divides data by applying a hash function to a column value, distributing rows into partitions.
  2. Step 2: Compare options with concept

    Only To split data into parts using a hash function on a column correctly describes this purpose; others describe unrelated features.
  3. Final Answer:

    To split data into parts using a hash function on a column -> Option A
  4. Quick Check:

    Hash partitioning = split data by hash [OK]
Hint: Hash partitioning splits data by hashing a column [OK]
Common Mistakes:
  • Confusing hash partitioning with sorting
  • Thinking it stores data on one server only
  • Mixing partitioning with encryption
2. Which of the following is the correct syntax to create a hash partitioned table in PostgreSQL?
easy
A. CREATE TABLE sales PARTITION BY COLUMN (region);
B. CREATE TABLE sales PARTITION BY RANGE (region);
C. CREATE TABLE sales PARTITION BY LIST (region);
D. CREATE TABLE sales PARTITION BY HASH (region);

Solution

  1. Step 1: Identify partitioning syntax

    PostgreSQL uses PARTITION BY HASH(column) to create hash partitions.
  2. Step 2: Match syntax with options

    Only CREATE TABLE sales PARTITION BY HASH (region); uses PARTITION BY HASH correctly; others use different partition types or invalid syntax.
  3. Final Answer:

    CREATE TABLE sales PARTITION BY HASH (region); -> Option D
  4. Quick Check:

    Hash partition syntax = PARTITION BY HASH [OK]
Hint: Use PARTITION BY HASH(column) for hash partitioning [OK]
Common Mistakes:
  • Using RANGE or LIST instead of HASH
  • Writing PARTITION BY COLUMN which is invalid
  • Omitting parentheses around column
3. Given the table definition:
CREATE TABLE users (id INT, name TEXT) PARTITION BY HASH (id);
CREATE TABLE users_p0 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE users_p1 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE users_p2 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE users_p3 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 3);

Which partition will the row with id = 7 be stored in?
medium
A. users_p3
B. users_p1
C. users_p2
D. users_p0

Solution

  1. Step 1: Calculate hash partition remainder

    Partition is chosen by (hash(id) % modulus). Assuming hash(id) = id for simplicity, 7 % 4 = 3.
  2. Step 2: Match remainder to partition

    Remainder 3 corresponds to partition users_p3.
  3. Final Answer:

    users_p3 -> Option A
  4. Quick Check:

    7 % 4 = 3 -> users_p3 [OK]
Hint: Compute id % modulus to find partition remainder [OK]
Common Mistakes:
  • Confusing remainder with modulus
  • Using id directly without modulo
  • Mixing partition numbers
4. You try to create a hash partition with this command:
CREATE TABLE orders_p0 PARTITION OF orders FOR VALUES WITH (MODULUS 3, REMAINDER 3);

What is the error in this statement?
medium
A. MODULUS must be a prime number
B. REMAINDER cannot be equal to or greater than MODULUS
C. Partition name must start with 'orders_'
D. FOR VALUES WITH is not valid syntax for hash partitions

Solution

  1. Step 1: Understand modulus and remainder rules

    Remainder must be less than modulus because remainder is result of modulo operation.
  2. Step 2: Check given values

    MODULUS is 3, REMAINDER is 3, which is invalid since remainder must be 0, 1, or 2.
  3. Final Answer:

    REMAINDER cannot be equal to or greater than MODULUS -> Option B
  4. Quick Check:

    Remainder < Modulus rule violated [OK]
Hint: Remainder must be less than modulus in partitions [OK]
Common Mistakes:
  • Setting remainder equal to modulus
  • Thinking modulus must be prime
  • Misunderstanding syntax for hash partitions
5. You want to distribute a large logs table by hashing the user_id column into 5 partitions. Which of the following is the correct way to define the partitions?
hard
A. Create 5 partitions with MODULUS 6 and REMAINDER values from 0 to 5
B. Create 5 partitions with MODULUS 4 and REMAINDER values from 0 to 4
C. Create 5 partitions with MODULUS 5 and REMAINDER values from 0 to 4
D. Create 5 partitions with MODULUS 5 and REMAINDER values from 1 to 5

Solution

  1. Step 1: Understand modulus and remainder for partitions

    MODULUS is total partitions count; REMAINDER ranges from 0 to MODULUS-1.
  2. Step 2: Apply to 5 partitions

    MODULUS must be 5; REMAINDER values must be 0,1,2,3,4 for 5 partitions.
  3. Final Answer:

    Create 5 partitions with MODULUS 5 and REMAINDER values from 0 to 4 -> Option C
  4. Quick Check:

    Partitions = MODULUS 5, REMAINDER 0-4 [OK]
Hint: Use MODULUS = partitions count; REMAINDER from 0 to MODULUS-1 [OK]
Common Mistakes:
  • Setting remainder range incorrectly
  • Using wrong modulus number
  • Starting remainder at 1 instead of 0