Why set operations are needed in SQL - Performance Analysis
We want to understand how the time to run set operations in SQL changes as the data grows.
How does combining or comparing tables affect the work the database does?
Analyze the time complexity of this SQL set operation example.
SELECT employee_id FROM employees
UNION
SELECT employee_id FROM contractors;
This query combines employee IDs from two tables, removing duplicates.
Look for repeated work in the query.
- Primary operation: Scanning each table's rows to collect IDs.
- How many times: Once per table, then comparing combined results to remove duplicates.
As the number of rows in each table grows, the work to scan and combine grows too.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 20 scans and comparisons |
| 100 | About 200 scans and comparisons |
| 1000 | About 2000 scans and comparisons |
Pattern observation: The work grows roughly in direct proportion to the total number of rows combined.
Time Complexity: O(n)
This means the time to run the set operation grows linearly with the total number of rows involved.
[X] Wrong: "Set operations are instant no matter how big the tables are."
[OK] Correct: The database must look at every row to combine and remove duplicates, so bigger tables take more time.
Understanding how set operations scale helps you explain database performance clearly and confidently.
"What if we used UNION ALL instead of UNION? How would the time complexity change?"