UNION and UNION ALL in MySQL - Time & Space Complexity
When combining results from two or more queries, it is important to understand how the time to get results grows as the data grows.
We want to know how the cost changes when using UNION versus UNION ALL in MySQL.
Analyze the time complexity of the following code snippet.
SELECT column1 FROM tableA
UNION
SELECT column1 FROM tableB;
SELECT column1 FROM tableA
UNION ALL
SELECT column1 FROM tableB;
This code combines results from two tables, either removing duplicates (UNION) or keeping all rows (UNION ALL).
Look for repeated work done by the database engine.
- Primary operation: Scanning all rows from both tables.
- How many times: Once for each table's rows.
- Additional operation for UNION: Comparing rows to remove duplicates.
- How many times for duplicate check: Depends on total rows combined.
As the number of rows in each table grows, the work changes differently for UNION and UNION ALL.
| Input Size (rows in each table) | Approx. Operations for UNION | Approx. Operations for UNION ALL |
|---|---|---|
| 10 | About 20 scans + some duplicate checks | About 20 scans only |
| 100 | About 200 scans + more duplicate checks | About 200 scans only |
| 1000 | About 2000 scans + many duplicate checks | About 2000 scans only |
Pattern observation: UNION ALL grows linearly with input size, while UNION adds extra work to compare and remove duplicates, which grows faster.
Time Complexity: O(n) for UNION ALL, O(n log n) for UNION
This means UNION ALL just reads all rows once, but UNION also sorts or compares rows to remove duplicates, which takes more time as data grows.
[X] Wrong: "UNION and UNION ALL take the same time because they both combine rows."
[OK] Correct: UNION must check for duplicates, which adds extra work beyond just combining rows, making it slower as data grows.
Understanding how UNION and UNION ALL scale helps you explain query performance clearly and choose the right tool for combining data efficiently.
"What if we added an ORDER BY clause after UNION? How would that affect the time complexity?"