Sub-partitioning in PostgreSQL - Time & Space Complexity
Start learning this pattern below
Jump into concepts and practice - no test required
When using sub-partitioning in a database, we want to understand how the time to find or insert data changes as the data grows.
We ask: How does the work increase when we add more data to sub-partitions?
Analyze the time complexity of querying a sub-partitioned table.
CREATE TABLE sales (
sale_id SERIAL,
region TEXT,
sale_date DATE,
amount NUMERIC
) PARTITION BY LIST (region);
CREATE TABLE sales_us PARTITION OF sales FOR VALUES IN ('US') PARTITION BY RANGE (sale_date);
CREATE TABLE sales_us_2023 PARTITION OF sales_us FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
SELECT * FROM sales WHERE region = 'US' AND sale_date >= '2023-06-01';
This code creates a table partitioned by region, then sub-partitioned by date range, and queries data from a specific sub-partition.
Look for repeated steps in how the database finds data.
- Primary operation: Searching partitions and sub-partitions to find matching rows.
- How many times: The database checks the main partitions (regions), then within the chosen partition, it checks sub-partitions (date ranges).
As data grows, the number of partitions and sub-partitions may increase.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 2-3 checks (partitions + sub-partitions) |
| 100 | Still a few checks due to partition pruning |
| 1000 | More partitions, but query still targets few sub-partitions |
Pattern observation: The work grows slowly because the query only looks at relevant partitions, not all data.
Time Complexity: O(log n)
This means the time to find data grows slowly, roughly like the steps needed to find a page in a book index.
[X] Wrong: "Sub-partitioning makes queries scan all data, so time grows linearly with data size."
[OK] Correct: The database uses partition pruning to skip irrelevant partitions, so it does not scan all data.
Understanding how sub-partitioning affects query time shows you know how databases handle big data efficiently.
"What if we removed sub-partitioning and only used one level of partitioning? How would the time complexity change?"
Practice
sub-partitioning in PostgreSQL?Solution
Step 1: Understand partitioning basics
Partitioning divides a table into parts to improve management and performance.Step 2: Recognize sub-partitioning role
Sub-partitioning splits each partition further, organizing data more finely and speeding up queries.Final Answer:
To split data twice for better organization and faster queries -> Option DQuick Check:
Sub-partitioning = double data split [OK]
- Thinking sub-partitioning creates backups
- Confusing sub-partitioning with encryption
- Believing it merges partitions
Solution
Step 1: Identify correct keywords for partitioning
PostgreSQL usesPARTITION BYfor main partition andSUBPARTITION BYfor sub-partition.Step 2: Check syntax order and clauses
CREATE TABLE sales (id INT, region TEXT, month INT) PARTITION BY RANGE (region) SUBPARTITION BY LIST (month); correctly usesPARTITION BY RANGEthenSUBPARTITION BY LIST, matching PostgreSQL syntax.Final Answer:
CREATE TABLE sales (id INT, region TEXT, month INT) PARTITION BY RANGE (region) SUBPARTITION BY LIST (month); -> Option CQuick Check:
Use PARTITION BY then SUBPARTITION BY [OK]
- Using PARTITION BY twice instead of SUBPARTITION BY
- Swapping PARTITION BY and SUBPARTITION BY keywords
- Using SUBPARTITION BY before PARTITION BY
CREATE TABLE orders (id INT, country TEXT, year INT) PARTITION BY LIST (country) SUBPARTITION BY RANGE (year);
CREATE TABLE orders_us PARTITION OF orders FOR VALUES IN ('US') SUBPARTITION BY RANGE (year);
CREATE TABLE orders_us_2022 PARTITION OF orders_us FOR VALUES FROM (2022) TO (2023);
What will be the result of
SELECT * FROM orders WHERE country = 'US' AND year = 2022; if there are rows with country 'US' and year 2022?Solution
Step 1: Understand partition and subpartition setup
The table is partitioned by country (LIST) and subpartitioned by year (RANGE). The 'US' partition and 2022 subpartition exist.Step 2: Query filters match partition and subpartition
The query filters country='US' and year=2022, matching the defined partitions, so matching rows will be found.Final Answer:
Rows with country 'US' and year 2022 will be returned -> Option AQuick Check:
Partition + subpartition match = rows returned [OK]
- Assuming no rows because subpartition is complex
- Thinking query causes syntax error
- Ignoring subpartition filtering
CREATE TABLE logs (id INT, region TEXT, day DATE) PARTITION BY RANGE (region) SUBPARTITION BY LIST (day);
What is the error in this statement?
Solution
Step 1: Check partition column data type
Partitioning by RANGE requires a column with an orderable type like numeric or date, not TEXT.Step 2: Identify the error cause
Here,regionis TEXT, so RANGE partitioning on it is invalid.Final Answer:
Partitioning by RANGE requires a numeric or date type, not TEXT -> Option BQuick Check:
RANGE needs numeric/date, not TEXT [OK]
- Thinking TEXT can be used for RANGE partitioning
- Confusing order of PARTITION BY and SUBPARTITION BY
- Assuming SUBPARTITION BY LIST is invalid with RANGE
region (LIST) and subpartitioned by sale_date (RANGE). Which approach correctly handles the subpartitioning to optimize query performance for recent sales?Solution
Step 1: Match partitioning to data and query needs
Partitioning by region (LIST) groups data by location, then subpartitioning by sale_date (RANGE) organizes by time.Step 2: Optimize recent sales queries
Using RANGE subpartitions for recent years allows fast access to recent data, improving query speed.Final Answer:
Partition by LIST on region, then subpartition by RANGE on sale_date with recent years as separate subpartitions -> Option AQuick Check:
LIST then RANGE for region and date [OK]
- Reversing partition and subpartition order
- Using HASH partitioning without subpartitioning
- Partitioning sale_date by LIST instead of RANGE
