UNION and UNION ALL in PostgreSQL - Time & Space Complexity
When combining results from two queries, it's important to know how the time to get results grows as data grows.
We want to understand how using UNION and UNION ALL affects the work the database does.
Analyze the time complexity of the following code snippet.
SELECT column1 FROM tableA
UNION
SELECT column1 FROM tableB;
SELECT column1 FROM tableA
UNION ALL
SELECT column1 FROM tableB;
This code combines rows from two tables, either removing duplicates (UNION) or keeping all rows (UNION ALL).
Identify the loops, recursion, array traversals that repeat.
- Primary operation: Scanning all rows from both tables.
- How many times: Once for each table's rows.
- Additional for UNION: Comparing rows to remove duplicates, which involves checking all combined rows.
Explain the growth pattern intuitively.
| Input Size (n) | Approx. Operations for UNION ALL | Approx. Operations for UNION |
|---|---|---|
| 10 | About 20 row reads | About 20 reads + some comparisons |
| 100 | About 200 row reads | About 200 reads + more comparisons |
| 1000 | About 2000 row reads | About 2000 reads + many comparisons |
Pattern observation: UNION ALL grows linearly with input size, while UNION adds extra work to check for duplicates, which grows faster.
Time Complexity: O(n) for UNION ALL, O(n log n) for UNION
This means UNION ALL just reads all rows once, while UNION must also sort or compare rows to remove duplicates, which takes more time as data grows.
[X] Wrong: "UNION and UNION ALL take the same time because they both combine rows."
[OK] Correct: UNION removes duplicates, which needs extra work to compare rows, so it takes more time than UNION ALL.
Understanding how UNION and UNION ALL scale helps you explain query performance clearly and shows you know how databases handle data combining.
"What if we added an index on the column used in UNION? How would the time complexity change?"