Sub-partitioning in PostgreSQL - Time & Space Complexity

Time Complexity: Sub-partitioning
O(log n)
Understanding Time Complexity

When using sub-partitioning in a database, we want to understand how the time to find or insert data changes as the data grows.

We ask: How does the work increase when we add more data to sub-partitions?

Scenario Under Consideration

Analyze the time complexity of querying a sub-partitioned table.


CREATE TABLE sales (
  sale_id SERIAL,
  region TEXT,
  sale_date DATE,
  amount NUMERIC
) PARTITION BY LIST (region);

CREATE TABLE sales_us PARTITION OF sales FOR VALUES IN ('US') PARTITION BY RANGE (sale_date);

CREATE TABLE sales_us_2023 PARTITION OF sales_us FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

SELECT * FROM sales WHERE region = 'US' AND sale_date >= '2023-06-01';
    

This code creates a table partitioned by region, sub-partitions the US partition by date range, and runs a query whose filters on region and sale_date let PostgreSQL skip every partition that cannot contain matching rows.
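
One way to check which partitions the query touches is to ask the planner. The sketch below assumes the three tables above already exist. It adds two hypothetical partitions (sales_eu and sales_us_2022, which are not part of the original scenario) only so that there is something to prune.

```sql
-- Hypothetical extra partitions, added only so pruning has something to skip.
CREATE TABLE sales_eu PARTITION OF sales FOR VALUES IN ('EU');
CREATE TABLE sales_us_2022 PARTITION OF sales_us
  FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');

-- Show the plan without cost estimates to keep the output short.
EXPLAIN (COSTS OFF)
SELECT * FROM sales WHERE region = 'US' AND sale_date >= '2023-06-01';
```

If pruning applies, the plan should reference sales_us_2023 and leave out both sales_eu and sales_us_2022.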

Identify Repeating Operations

Look for repeated steps in how the database finds data.

  • Primary operation: Searching partitions and sub-partitions to find matching rows.
  • How many times: The database checks the main partitions (regions), then within the chosen partition, it checks sub-partitions (date ranges).
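
The two levels of checks can be made visible by listing the partition hierarchy. This is a minimal sketch that assumes the sales tables from the scenario exist and that the server is PostgreSQL 12 or later, where pg_partition_tree() is available.

```sql
-- List every table in the hierarchy with its parent and depth.
-- level 0 is the root table, level 1 the region partitions,
-- level 2 the date sub-partitions.
SELECT relid, parentrelid, level, isleaf
FROM pg_partition_tree('sales')
ORDER BY level, relid::text;
```

Only rows with isleaf = true hold data. The other rows are routing levels that the database checks on the way down.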
How Execution Grows With Input

As data grows, the number of partitions and sub-partitions may increase.

Input Size (n) | Approx. Operations
10             | About 2-3 checks (partitions + sub-partitions)
100            | Still a few checks due to partition pruning
1000           | More partitions, but query still targets few sub-partitions

Pattern observation: The work grows slowly because the query only looks at relevant partitions, not all data.

Final Time Complexity

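Time Complexity: O(log n)

Here n is the number of partitions at a given level. PostgreSQL keeps range and list partition bounds sorted and binary-searches them, so locating the matching partition takes about log n steps. Reading the rows inside that partition is a separate cost that depends on the partition's size and its indexes. This means the time to find the right partition grows slowly, roughly like the steps needed to find a page in a book index.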
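
Partition lookup is only part of the cost. Once the right sub-partition is chosen, an index keeps the search inside it fast as well. The sketch below assumes the sales tables from the scenario; the index name sales_sale_date_idx is a hypothetical choice.

```sql
-- An index created on the partitioned parent is created on every
-- existing partition and on partitions attached later (PostgreSQL 11+).
CREATE INDEX sales_sale_date_idx ON sales (sale_date);

-- Check whether the planner uses the index inside the chosen sub-partition.
EXPLAIN (COSTS OFF)
SELECT * FROM sales WHERE region = 'US' AND sale_date = '2023-06-01';
```

Whether the planner picks the index depends on table statistics. On a nearly empty table it may still prefer a sequential scan.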

Common Mistake

[X] Wrong: "Sub-partitioning makes queries scan all data, so time grows linearly with data size."

[OK] Correct: When the WHERE clause filters on the partition keys, the database uses partition pruning to skip irrelevant partitions, so it does not scan all data.
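
Pruning depends on the query naming the partition keys. The comparison below is a sketch against the sales tables from the scenario. It contrasts a query that can be pruned with one that cannot.

```sql
-- Pruning is controlled by this setting, which is on by default.
SHOW enable_partition_pruning;

-- Filters on both partition keys: the planner can prune at both levels.
EXPLAIN (COSTS OFF)
SELECT * FROM sales WHERE region = 'US' AND sale_date >= '2023-06-01';

-- Filter on a non-key column only: no partition can be ruled out,
-- so every leaf partition appears in the plan.
EXPLAIN (COSTS OFF)
SELECT * FROM sales WHERE amount > 100;
```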

Interview Connect

Understanding how sub-partitioning affects query time shows you know how databases handle big data efficiently.

Self-Check

"What if we removed sub-partitioning and only used one level of partitioning? How would the time complexity change?"

Practice

1. What is the main purpose of sub-partitioning in PostgreSQL?
easy
A. To encrypt data within partitions
B. To create backups of partitions automatically
C. To merge multiple partitions into one
D. To split data twice for better organization and faster queries

Solution

  1. Step 1: Understand partitioning basics

    Partitioning divides a table into parts to improve management and performance.
  2. Step 2: Recognize sub-partitioning role

    Sub-partitioning splits each partition further, organizing data more finely and speeding up queries.
  3. Final Answer:

    To split data twice for better organization and faster queries -> Option D
  4. Quick Check:

    Sub-partitioning = double data split [OK]
Hint: Sub-partitioning means splitting partitions again [OK]
Common Mistakes:
  • Thinking sub-partitioning creates backups
  • Confusing sub-partitioning with encryption
  • Believing it merges partitions
2. Which of the following is the correct syntax to create a sub-partitioned table in PostgreSQL?
easy
A. CREATE TABLE sales (id INT, region TEXT, month INT) SUBPARTITION BY RANGE (region) PARTITION BY LIST (month);
B. CREATE TABLE sales (id INT, region TEXT, month INT) PARTITION BY RANGE (region) PARTITION BY LIST (month);
C. CREATE TABLE sales (id INT, region TEXT, month INT) PARTITION BY LIST (region); CREATE TABLE sales_us PARTITION OF sales FOR VALUES IN ('US') PARTITION BY RANGE (month);
D. CREATE TABLE sales (id INT, region TEXT, month INT) PARTITION BY LIST (region) SUBPARTITION BY HASH (month);

Solution

  1. Step 1: Identify correct keywords for partitioning

    PostgreSQL has no SUBPARTITION BY keyword, and each CREATE TABLE statement accepts only one PARTITION BY clause.
  2. Step 2: Check how the second level is declared

    A sub-partition level is created by declaring a child with PARTITION OF and giving that child its own PARTITION BY clause. Only option C does this: the parent is partitioned BY LIST (region), and the child sales_us is itself partitioned BY RANGE (month).
  3. Final Answer:

    CREATE TABLE sales (id INT, region TEXT, month INT) PARTITION BY LIST (region); CREATE TABLE sales_us PARTITION OF sales FOR VALUES IN ('US') PARTITION BY RANGE (month); -> Option C
  4. Quick Check:

    PARTITION BY on the parent, PARTITION BY again on the child [OK]
Hint: Sub-partition by adding PARTITION BY to a child partition [OK]
Common Mistakes:
  • Writing two PARTITION BY clauses in a single CREATE TABLE statement
  • Using the SUBPARTITION BY keyword from other databases (such as Oracle or MySQL), which PostgreSQL does not support
  • Forgetting that the second level is declared on the child partition, not on the parent
3. Given the following table and partitions:
CREATE TABLE orders (id INT, country TEXT, year INT) PARTITION BY LIST (country);
CREATE TABLE orders_us PARTITION OF orders FOR VALUES IN ('US') PARTITION BY RANGE (year);
CREATE TABLE orders_us_2022 PARTITION OF orders_us FOR VALUES FROM (2022) TO (2023);

What will be the result of SELECT * FROM orders WHERE country = 'US' AND year = 2022; if there are rows with country 'US' and year 2022?
medium
A. Rows with country 'US' and year 2022 will be returned
B. No rows will be returned because subpartition is missing
C. Syntax error due to incorrect partitioning
D. Rows with any country but year 2022 will be returned

Solution

  1. Step 1: Understand partition and subpartition setup

    The table is partitioned by country (LIST) and subpartitioned by year (RANGE). The 'US' partition and 2022 subpartition exist.
  2. Step 2: Query filters match partition and subpartition

    The query filters country='US' and year=2022, matching the defined partitions, so matching rows will be found.
  3. Final Answer:

    Rows with country 'US' and year 2022 will be returned -> Option A
  4. Quick Check:

    Partition + subpartition match = rows returned [OK]
Hint: Query matches partition and subpartition filters [OK]
Common Mistakes:
  • Assuming no rows because subpartition is complex
  • Thinking query causes syntax error
  • Ignoring subpartition filtering
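
The scenario can be reproduced end to end with a short script. This sketch uses PostgreSQL's declarative syntax, in which the child partition carries its own PARTITION BY clause. The two inserted rows are made-up sample data.

```sql
CREATE TABLE orders (id INT, country TEXT, year INT) PARTITION BY LIST (country);
CREATE TABLE orders_us PARTITION OF orders
  FOR VALUES IN ('US') PARTITION BY RANGE (year);
CREATE TABLE orders_us_2022 PARTITION OF orders_us
  FOR VALUES FROM (2022) TO (2023);

-- Made-up sample rows; they are routed through orders -> orders_us -> orders_us_2022.
INSERT INTO orders VALUES (1, 'US', 2022), (2, 'US', 2022);

-- tableoid::regclass reports which physical table each row is stored in.
SELECT tableoid::regclass AS stored_in, *
FROM orders
WHERE country = 'US' AND year = 2022;
```

Inserting a row for a country or year that has no matching partition would fail, because no default partition is defined here.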
4. You wrote this code:
CREATE TABLE logs (id INT, region TEXT, day DATE) PARTITION BY RANGE (region) SUBPARTITION BY LIST (day);

What is the error in this statement?
medium
A. RANGE partitioning cannot be done on a TEXT column
B. SUBPARTITION BY is not valid PostgreSQL syntax; the second level must be declared with PARTITION BY on each child partition
C. Syntax error: SUBPARTITION BY must come before PARTITION BY
D. LIST sub-partitions cannot be nested under RANGE partitions

Solution

  1. Step 1: Check the clauses against PostgreSQL syntax

    PostgreSQL accepts a single PARTITION BY clause per CREATE TABLE and has no SUBPARTITION BY keyword, so the statement fails to parse.
  2. Step 2: Rule out the other options

    RANGE partitioning on region is allowed because TEXT is an orderable type, and RANGE and LIST can be combined across levels. The fix is to keep PARTITION BY RANGE (region) on logs and declare PARTITION BY LIST (day) on each child partition.
  3. Final Answer:

    SUBPARTITION BY is not valid PostgreSQL syntax; the second level must be declared with PARTITION BY on each child partition -> Option B
  4. Quick Check:

    No SUBPARTITION BY in PostgreSQL; nest PARTITION BY on the child [OK]
Hint: Declare the second level on the child partition [OK]
Common Mistakes:
  • Thinking TEXT cannot be used for RANGE partitioning
  • Confusing the order of PARTITION BY and SUBPARTITION BY, when only PARTITION BY exists
  • Assuming LIST and RANGE cannot be mixed across levels
5. You want to create a sales table partitioned by region (LIST) and subpartitioned by sale_date (RANGE). Which approach correctly handles the subpartitioning to optimize query performance for recent sales?
hard
A. Partition by LIST on region, then subpartition by RANGE on sale_date with recent years as separate subpartitions
B. Partition by RANGE on sale_date, then subpartition by LIST on region with all regions in one subpartition
C. Partition by HASH on region, no subpartitioning needed for sale_date
D. Partition by LIST on sale_date, then subpartition by RANGE on region

Solution

  1. Step 1: Match partitioning to data and query needs

    Partitioning by region (LIST) groups data by location, then subpartitioning by sale_date (RANGE) organizes by time.
  2. Step 2: Optimize recent sales queries

    Using RANGE subpartitions for recent years allows fast access to recent data, improving query speed.
  3. Final Answer:

    Partition by LIST on region, then subpartition by RANGE on sale_date with recent years as separate subpartitions -> Option A
  4. Quick Check:

    LIST then RANGE for region and date [OK]
Hint: Partition by region LIST, subpartition by date RANGE [OK]
Common Mistakes:
  • Reversing partition and subpartition order
  • Using HASH partitioning without subpartitioning
  • Partitioning sale_date by LIST instead of RANGE
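
The layout from option A could look roughly like the sketch below. All table names and the choice of years are hypothetical. In PostgreSQL the date level is declared with PARTITION BY on the region partition.

```sql
CREATE TABLE sales_by_region (   -- hypothetical table name
  sale_id BIGINT,
  region TEXT,
  sale_date DATE,
  amount NUMERIC
) PARTITION BY LIST (region);

-- First level: one partition per region, itself partitioned by date.
CREATE TABLE sales_by_region_us PARTITION OF sales_by_region
  FOR VALUES IN ('US') PARTITION BY RANGE (sale_date);

-- Second level: each recent year gets its own sub-partition.
CREATE TABLE sales_by_region_us_2023 PARTITION OF sales_by_region_us
  FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE sales_by_region_us_2024 PARTITION OF sales_by_region_us
  FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- Older history shares one wide range so recent queries never touch it.
CREATE TABLE sales_by_region_us_old PARTITION OF sales_by_region_us
  FOR VALUES FROM (MINVALUE) TO ('2023-01-01');
```

A query that filters on region = 'US' and a recent sale_date range can then be pruned down to one or two small sub-partitions.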