Why partitioning is needed in PostgreSQL - Performance Analysis
When working with very large tables, queries can slow down a lot. We want to understand how partitioning helps manage this.
How does splitting data into parts affect the time it takes to run queries?
Analyze the time complexity of querying a large table with and without partitioning.
-- Query without partitioning (full table scan)
SELECT * FROM sales WHERE sale_date = '2024-01-01';
-- Same query with partitioning (partition pruning)
SELECT * FROM sales WHERE sale_date = '2024-01-01';
The first query scans the whole sales table. With partitioning, the query accesses only one partition for the date due to partition pruning.
Look at what repeats when running these queries.
- Primary operation: Scanning rows in the sales table or partition.
- How many times: Without partitioning, all rows are checked. With partitioning, only rows in one partition are checked.
As the sales table grows, scanning all rows takes longer.
| Input Size (rows) | Approx. Operations Without Partitioning | Approx. Operations With Partitioning |
|---|---|---|
| 10,000 | 10,000 | ~300 (one day partition) |
| 100,000 | 100,000 | ~300 |
| 1,000,000 | 1,000,000 | ~300 |
Pattern observation: Without partitioning, operations grow with total rows. With partitioning, operations stay about the same, only depending on partition size.
Time Complexity: O(n) without partitioning, O(k) with partitioning where k << n
This means scanning the whole table grows with total rows, but partitioning limits scanning to a smaller part, making queries faster.
[X] Wrong: "Partitioning always makes queries instant regardless of data size."
[OK] Correct: Partitioning helps by limiting data scanned, but if partitions are large or queries span many partitions, it still takes time.
Understanding how partitioning changes query time helps you explain real-world database design choices clearly and confidently.
"What if we changed partitioning from date-based to customer-based? How would the time complexity change?"