0
0
PostgreSQLquery~15 mins

Partition types (range, list, hash) in PostgreSQL - Deep Dive

Choose your learning style9 modes available
Overview - Partition types (range, list, hash)
What is it?
Partitioning is a way to split a big table into smaller pieces called partitions. Each partition holds a subset of the data based on rules. PostgreSQL supports three main types of partitions: range, list, and hash. These types decide how data is divided and stored.
Why it matters
Without partitioning, large tables can become slow and hard to manage. Partitioning helps speed up queries and maintenance by working with smaller chunks of data. It also improves performance and makes data easier to organize and back up.
Where it fits
Before learning partition types, you should understand basic SQL tables and indexes. After mastering partitioning, you can explore advanced topics like partition pruning, constraint exclusion, and performance tuning.
Mental Model
Core Idea
Partition types define how a big table is split into smaller, manageable parts based on different rules for organizing data.
Think of it like...
Imagine a large library. Range partitioning is like shelving books by year of publication, list partitioning is like grouping books by genre, and hash partitioning is like randomly assigning books to shelves using a code to balance space.
┌─────────────┐
│   Big Table │
└─────┬───────┘
      │ Partitioned by type
      ▼
┌─────────────┬─────────────┬─────────────┐
│ Range Part. │ List Part.  │ Hash Part.  │
│ (e.g. dates)│ (e.g. types)│ (e.g. hash) │
└─────────────┴─────────────┴─────────────┘
Build-Up - 7 Steps
1
FoundationWhat is Table Partitioning
🤔
Concept: Introducing the idea of splitting a large table into smaller parts called partitions.
Partitioning breaks a big table into smaller pieces. Each piece holds some rows based on a rule. This helps manage and query data faster. Think of it as dividing a big list into smaller lists.
Result
You understand that partitioning is about dividing data to improve management and speed.
Understanding partitioning as data division sets the stage for learning how different rules affect data organization.
2
FoundationPostgreSQL Partitioning Basics
🤔
Concept: How PostgreSQL supports partitioning and the basic setup.
PostgreSQL lets you create partitioned tables using a PARTITION BY clause. You define the type (range, list, or hash) and create partitions as child tables. Queries automatically use the right partitions.
Result
You can create a partitioned table and know that PostgreSQL handles data routing.
Knowing PostgreSQL's built-in support helps you trust the system to manage partitions efficiently.
3
IntermediateRange Partitioning Explained
🤔Before reading on: do you think range partitioning divides data by exact values or by continuous intervals? Commit to your answer.
Concept: Range partitioning splits data into continuous intervals based on a column's value.
Range partitioning divides data by ranges, like dates or numbers. For example, one partition for years 2020-2021, another for 2022-2023. Rows go to the partition where their value fits the range.
Result
Data is grouped by continuous ranges, making queries on intervals faster.
Understanding range partitioning helps optimize queries that filter by continuous values like dates.
4
IntermediateList Partitioning Explained
🤔Before reading on: do you think list partitioning groups data by ranges or by specific values? Commit to your answer.
Concept: List partitioning groups data by specific, discrete values in a column.
List partitioning assigns rows to partitions based on exact values. For example, one partition for 'red' color, another for 'blue'. This is useful for categories or types.
Result
Data is grouped by specific values, speeding up queries filtering by those values.
Knowing list partitioning helps when data naturally falls into distinct categories.
5
IntermediateHash Partitioning Explained
🤔Before reading on: do you think hash partitioning groups data by value ranges or distributes it evenly? Commit to your answer.
Concept: Hash partitioning distributes data evenly across partitions using a hash function on a column.
Hash partitioning uses a hash function to assign rows to partitions randomly but evenly. This balances data size across partitions, useful when no natural grouping exists.
Result
Data is spread evenly, preventing any partition from becoming too large.
Understanding hash partitioning helps balance load and storage when data has no clear grouping.
6
AdvancedChoosing Partition Types in Practice
🤔Before reading on: do you think one partition type fits all cases or does it depend on data and queries? Commit to your answer.
Concept: Selecting the right partition type depends on data patterns and query needs.
Range is great for time-series data, list for categorical data, and hash for balanced distribution. Sometimes combining types or subpartitioning is needed for complex cases.
Result
You can pick partition types that best fit your data and improve performance.
Knowing how to choose partition types prevents poor performance and maintenance headaches.
7
ExpertPartition Pruning and Performance Surprises
🤔Before reading on: do you think PostgreSQL always scans all partitions or can it skip some? Commit to your answer.
Concept: Partition pruning lets PostgreSQL skip irrelevant partitions during queries, but it has limits.
PostgreSQL tries to prune partitions based on query filters, scanning only needed partitions. However, pruning depends on query structure and partition keys. Sometimes pruning fails, causing slow queries.
Result
You understand when pruning works and when it doesn't, helping optimize queries.
Knowing pruning behavior helps avoid surprises and write queries that benefit from partitioning.
Under the Hood
PostgreSQL stores partitions as separate child tables linked to a parent. When inserting, it routes rows to the correct partition using the partition key and type rules. During queries, the planner tries to prune partitions by analyzing filters to scan only relevant partitions, improving speed.
Why designed this way?
Partitioning was designed to handle very large tables by splitting data physically. Range and list types reflect common data grouping needs, while hash was added to balance data evenly. This design balances flexibility, performance, and ease of use.
Parent Table
  │
  ├─ Range Partition 1 (e.g., dates < 2022)
  ├─ Range Partition 2 (e.g., dates >= 2022)
  ├─ List Partition 1 (e.g., category A)
  ├─ List Partition 2 (e.g., category B)
  ├─ Hash Partition 1
  └─ Hash Partition 2

Insert → Partition Key → Partition Type Rules → Target Partition
Myth Busters - 4 Common Misconceptions
Quick: Does hash partitioning group similar values together? Commit yes or no.
Common Belief:Hash partitioning groups similar values into the same partition.
Tap to reveal reality
Reality:Hash partitioning distributes data evenly but does not group similar values together.
Why it matters:Assuming hash groups similar data can lead to wrong query expectations and inefficient designs.
Quick: Can you update a partition key value freely without restrictions? Commit yes or no.
Common Belief:You can update the partition key column of a row like any other column.
Tap to reveal reality
Reality:Updating a partition key value requires moving the row to a different partition, which PostgreSQL does not do automatically.
Why it matters:Ignoring this causes errors or data inconsistency when updating partition keys.
Quick: Does PostgreSQL always prune partitions automatically? Commit yes or no.
Common Belief:PostgreSQL always skips irrelevant partitions during queries.
Tap to reveal reality
Reality:Partition pruning depends on query structure and may not always happen, causing slower queries.
Why it matters:Overestimating pruning can lead to unexpected slow query performance.
Quick: Is partitioning a replacement for indexing? Commit yes or no.
Common Belief:Partitioning replaces the need for indexes on large tables.
Tap to reveal reality
Reality:Partitioning complements indexes but does not replace them; indexes are still needed for fast lookups.
Why it matters:Relying only on partitioning without indexes can cause poor query performance.
Expert Zone
1
Range partitions can overlap if not carefully defined, causing data routing errors.
2
Hash partitioning's distribution depends on the hash function and can be affected by data skew.
3
List partitions require explicit value lists; missing values go to default partitions or cause errors.
When NOT to use
Partitioning is not ideal for small tables or when queries always need full table scans. Alternatives include indexing, materialized views, or sharding for distributed systems.
Production Patterns
In production, range partitioning is common for time-series logs, list partitioning for multi-tenant data by client ID, and hash partitioning for load balancing. Combining partitioning with indexing and careful query design is standard practice.
Connections
Sharding
Partitioning is a form of sharding within a single database instance.
Understanding partitioning helps grasp sharding concepts used in distributed databases.
Hash Functions (Mathematics)
Hash partitioning uses hash functions to distribute data evenly.
Knowing hash functions from math explains why hash partitioning balances data.
Library Organization
Partitioning types mirror ways libraries organize books: by range (years), list (genres), or balanced shelving (hash).
Seeing partitioning as organizing physical items clarifies why different methods suit different data.
Common Pitfalls
#1Trying to update a partition key value directly.
Wrong approach:UPDATE sales SET sale_date = '2024-01-01' WHERE id = 123;
Correct approach:DELETE FROM sales WHERE id = 123; INSERT INTO sales (id, sale_date, ...) VALUES (123, '2024-01-01', ...);
Root cause:Misunderstanding that partition keys determine row location and cannot be changed in place.
#2Defining overlapping range partitions.
Wrong approach:CREATE TABLE sales PARTITION OF sales_parent FOR VALUES FROM ('2020-01-01') TO ('2022-01-01'); CREATE TABLE sales PARTITION OF sales_parent FOR VALUES FROM ('2021-06-01') TO ('2023-01-01');
Correct approach:CREATE TABLE sales PARTITION OF sales_parent FOR VALUES FROM ('2020-01-01') TO ('2021-06-01'); CREATE TABLE sales PARTITION OF sales_parent FOR VALUES FROM ('2021-06-01') TO ('2023-01-01');
Root cause:Not carefully defining non-overlapping ranges causes data routing conflicts.
#3Assuming partition pruning always happens.
Wrong approach:SELECT * FROM sales WHERE sale_date > '2023-01-01' AND EXTRACT(MONTH FROM sale_date) = 5;
Correct approach:SELECT * FROM sales WHERE sale_date >= '2023-05-01' AND sale_date < '2023-06-01';
Root cause:Using functions on partition keys can prevent pruning, causing full scans.
Key Takeaways
Partition types (range, list, hash) define how PostgreSQL splits large tables into smaller parts for better management and performance.
Range partitions group data by continuous intervals, list partitions by specific values, and hash partitions distribute data evenly using a hash function.
Choosing the right partition type depends on your data patterns and query needs to maximize efficiency.
Partition pruning helps speed queries by scanning only relevant partitions but depends on query structure and partition keys.
Understanding partitioning internals and limitations prevents common mistakes and helps design scalable, high-performance databases.