Why set operations matter in PostgreSQL - Performance Analysis
When working with databases, set operations like UNION or INTERSECT combine or compare groups of rows.
We want to know how the time to run these operations changes as the data grows.
Analyze the time complexity of the following SQL using UNION.
SELECT id FROM employees
UNION
SELECT id FROM managers;
This code combines two lists of IDs, removing duplicates to get unique IDs from both tables.
Look for repeated work done by the database engine.
- Primary operation: Scanning each table's rows to collect IDs.
- How many times: Once per table, then comparing all collected rows to remove duplicates.
As the number of rows in each table grows, the work to scan and compare grows too.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 20 scans and comparisons |
| 100 | About 200 scans and comparisons |
| 1000 | About 2000 scans and comparisons |
Pattern observation: The work grows roughly in direct proportion to the total number of rows combined.
Time Complexity: O(n log n)
This means the time to run the UNION grows roughly in proportion to n log n as the total number of rows increases.
[X] Wrong: "UNION runs instantly no matter how big the tables are."
[OK] Correct: The database must look at every row to combine and remove duplicates, so bigger tables take more time.
Understanding how set operations scale helps you write efficient queries and explain your choices clearly.
"What if we used UNION ALL instead of UNION? How would the time complexity change?"