0
0
SQLquery~15 mins

UNION combining result sets in SQL - Deep Dive

Choose your learning style9 modes available
Overview - UNION combining result sets
What is it?
UNION is a SQL command that combines the results of two or more SELECT queries into a single result set. It removes duplicate rows by default, showing only unique rows from all combined queries. This helps you gather data from different tables or queries as if they were one. UNION works only when the queries have the same number of columns with compatible data types.
Why it matters
Without UNION, you would have to manually merge data from multiple queries or tables, which is slow and error-prone. UNION lets you easily combine data from different sources, making it simpler to analyze and report. For example, if you want a list of customers from two regions stored in separate tables, UNION gives you one clean list without duplicates. This saves time and reduces mistakes in data handling.
Where it fits
Before learning UNION, you should understand basic SELECT queries and how to retrieve data from tables. After UNION, you can learn about UNION ALL, INTERSECT, and EXCEPT for more advanced set operations. Later, you might explore JOINs to combine data horizontally, while UNION combines data vertically.
Mental Model
Core Idea
UNION stacks the results of multiple queries vertically into one list, showing only unique rows.
Think of it like...
Imagine you have two guest lists from different parties, and you want to create one combined guest list without repeating names. UNION is like taking both lists, putting them together, and removing any duplicate names so everyone appears only once.
┌─────────────┐   ┌─────────────┐
│ SELECT A    │   │ SELECT B    │
│ Row1       │   │ Row3       │
│ Row2       │   │ Row2       │
└─────┬───────┘   └─────┬───────┘
      │                 │
      └───────UNION─────┘
            │
   ┌─────────────────────┐
   │ Combined Result Set  │
   │ Row1                │
   │ Row2                │
   │ Row3                │
   └─────────────────────┘
Build-Up - 7 Steps
1
FoundationBasic SELECT query review
🤔
Concept: Understanding how to retrieve data from a single table using SELECT.
A SELECT query asks the database to show you certain columns and rows from one table. For example, SELECT name FROM customers; shows all customer names.
Result
A list of names from the customers table.
Knowing how SELECT works is essential because UNION combines multiple SELECT results.
2
FoundationMatching columns for UNION
🤔
Concept: UNION requires queries to have the same number of columns with compatible types.
If you want to UNION two queries, both must select the same number of columns. For example, SELECT name, city FROM customers UNION SELECT name, city FROM suppliers; works because both have two columns.
Result
A combined list of names and cities from both tables.
Understanding column matching prevents errors and ensures UNION works correctly.
3
IntermediateUNION removes duplicates by default
🤔Before reading on: do you think UNION keeps all rows including duplicates, or removes duplicates? Commit to your answer.
Concept: UNION automatically removes duplicate rows from the combined result.
When you UNION two queries, if the same row appears in both, it will show only once in the result. For example, if both queries return 'Alice, New York', the combined result shows it once.
Result
A list of unique rows from both queries.
Knowing UNION removes duplicates helps you avoid unexpected missing rows and choose UNION ALL if duplicates are needed.
4
IntermediateUsing UNION ALL to keep duplicates
🤔Before reading on: do you think UNION ALL removes duplicates like UNION, or keeps all rows? Commit to your answer.
Concept: UNION ALL combines results but keeps all duplicates.
If you want to see every row from both queries, including duplicates, use UNION ALL. For example, SELECT name FROM table1 UNION ALL SELECT name FROM table2; shows all names, even if repeated.
Result
A combined list including duplicates from both queries.
Understanding UNION ALL lets you control whether duplicates matter in your combined data.
5
IntermediateOrder of rows in UNION results
🤔
Concept: UNION results are unordered unless you specify ORDER BY after the last query.
If you want the combined results sorted, add ORDER BY at the end. For example, SELECT name FROM a UNION SELECT name FROM b ORDER BY name; sorts the combined list alphabetically.
Result
A sorted list of unique rows from both queries.
Knowing where to place ORDER BY avoids confusion about result order.
6
AdvancedColumn data type compatibility rules
🤔Before reading on: do you think UNION allows combining columns of any data types, or only compatible types? Commit to your answer.
Concept: UNION requires columns in the same position to have compatible data types.
If the first column in query one is text, the first column in query two must also be text or convertible to text. Mixing incompatible types causes errors.
Result
Successful UNION only if column types match or convert cleanly.
Understanding data type compatibility prevents runtime errors and data corruption.
7
ExpertPerformance considerations with UNION
🤔Before reading on: do you think UNION is always faster than running queries separately, or can it be slower? Commit to your answer.
Concept: UNION can be slower than UNION ALL because it removes duplicates, which requires extra processing.
When you use UNION, the database must compare all rows to remove duplicates, which takes time and memory. UNION ALL skips this step, so it is faster. Knowing this helps optimize queries for large datasets.
Result
Better query performance by choosing UNION or UNION ALL appropriately.
Knowing the cost of duplicate removal helps write efficient queries in production.
Under the Hood
When you run a UNION, the database executes each SELECT query separately, then combines their results into a temporary set. It then scans this set to remove duplicate rows by comparing all columns. This deduplication step involves sorting or hashing internally. Finally, it returns the unique rows as the final result. UNION ALL skips the deduplication and returns all rows directly.
Why designed this way?
UNION was designed to simplify combining data vertically from multiple sources while ensuring data cleanliness by default. Removing duplicates prevents inflated counts or repeated data in reports. The alternative, UNION ALL, was added later to give control when duplicates are meaningful. This design balances ease of use with flexibility.
┌─────────────┐   ┌─────────────┐
│ SELECT 1    │   │ SELECT 2    │
│ Result Set1 │   │ Result Set2 │
└─────┬───────┘   └─────┬───────┘
      │                 │
      └───────┬─────────┘
              │
      ┌─────────────────┐
      │ Combine Results  │
      └─────┬───────────┘
            │
      ┌───────────────┐
      │ Remove Duplicates │
      └─────┬───────────┘
            │
      ┌───────────────┐
      │ Final Result Set │
      └───────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does UNION keep duplicates or remove them? Commit to your answer.
Common Belief:UNION keeps all rows from both queries, including duplicates.
Tap to reveal reality
Reality:UNION removes duplicate rows by default, showing only unique rows.
Why it matters:Assuming duplicates remain can cause confusion when expected rows are missing, leading to wrong data analysis.
Quick: Can you use ORDER BY inside each SELECT of a UNION? Commit to your answer.
Common Belief:You can put ORDER BY in each SELECT query inside a UNION.
Tap to reveal reality
Reality:ORDER BY only applies once at the end of the entire UNION, not inside individual SELECTs.
Why it matters:Misplacing ORDER BY causes syntax errors or unexpected results, frustrating learners.
Quick: Does UNION allow combining queries with different numbers of columns? Commit to your answer.
Common Belief:You can UNION queries with different numbers of columns as long as the data types are similar.
Tap to reveal reality
Reality:UNION requires the same number of columns in each query; otherwise, it causes an error.
Why it matters:Trying to UNION mismatched queries wastes time debugging and breaks query execution.
Quick: Is UNION always faster than running separate queries? Commit to your answer.
Common Belief:UNION is always faster because it combines queries in one step.
Tap to reveal reality
Reality:UNION can be slower due to duplicate removal; UNION ALL is faster when duplicates are allowed.
Why it matters:Ignoring performance differences can lead to slow queries in large databases.
Expert Zone
1
UNION's duplicate removal can cause subtle bugs if rows differ in invisible ways like trailing spaces or case sensitivity.
2
The order of queries in UNION can affect performance because some databases optimize by running smaller queries first.
3
Using UNION with complex queries and large datasets may require indexing strategies to improve speed.
When NOT to use
Avoid UNION when you need to combine columns side-by-side; use JOIN instead. Also, if duplicates matter and performance is critical, prefer UNION ALL. For filtering common or distinct rows, consider INTERSECT or EXCEPT.
Production Patterns
In real systems, UNION is used to merge data from partitioned tables, combine results from different data sources, or unify search results. Developers often use UNION ALL for logging or audit data where duplicates are expected and performance matters.
Connections
JOIN
JOIN combines data horizontally by matching rows, while UNION stacks data vertically by combining rows.
Understanding UNION and JOIN together helps you choose how to combine data: add columns or add rows.
Set Theory
UNION in SQL mirrors the union operation in set theory, combining elements from two sets without duplicates.
Knowing set theory clarifies why UNION removes duplicates and how it relates to other set operations like INTERSECT.
Merging Guest Lists (Event Planning)
Combining guest lists from different events without repeats is like UNION combining query results.
This real-world example shows how UNION solves everyday problems of merging lists cleanly.
Common Pitfalls
#1Trying to UNION queries with different numbers of columns.
Wrong approach:SELECT name FROM customers UNION SELECT name, city FROM suppliers;
Correct approach:SELECT name, city FROM customers UNION SELECT name, city FROM suppliers;
Root cause:Misunderstanding that UNION requires the same number of columns in each query.
#2Expecting duplicates to appear in UNION results.
Wrong approach:SELECT name FROM table1 UNION SELECT name FROM table2; -- expecting all duplicates
Correct approach:SELECT name FROM table1 UNION ALL SELECT name FROM table2; -- keeps duplicates
Root cause:Not knowing UNION removes duplicates by default.
#3Placing ORDER BY inside each SELECT in UNION.
Wrong approach:SELECT name FROM a ORDER BY name UNION SELECT name FROM b ORDER BY name;
Correct approach:SELECT name FROM a UNION SELECT name FROM b ORDER BY name;
Root cause:Misunderstanding that ORDER BY applies only once after the last query.
Key Takeaways
UNION combines results from multiple SELECT queries into one list with unique rows.
All queries in UNION must have the same number of columns with compatible data types.
UNION removes duplicates by default; use UNION ALL to keep duplicates.
ORDER BY applies only once at the end of the UNION to sort the combined results.
Choosing between UNION and UNION ALL affects both results and performance.