0
0
MySQLquery~15 mins

UNION and UNION ALL in MySQL - Deep Dive

Choose your learning style9 modes available
Overview - UNION and UNION ALL
What is it?
UNION and UNION ALL are commands in SQL used to combine the results of two or more SELECT queries into a single result set. UNION removes duplicate rows, showing only unique records, while UNION ALL includes all rows, even duplicates. These commands help gather data from multiple tables or queries into one list.
Why it matters
Without UNION and UNION ALL, combining data from different sources would require complex manual merging or multiple queries, making data analysis slow and error-prone. These commands simplify data retrieval, saving time and reducing mistakes when working with related data spread across tables.
Where it fits
Before learning UNION and UNION ALL, you should understand basic SELECT queries and how to filter data. After mastering these, you can explore JOINs to combine tables side-by-side and advanced set operations like INTERSECT and EXCEPT for more complex data comparisons.
Mental Model
Core Idea
UNION and UNION ALL stack query results vertically, with UNION removing duplicates and UNION ALL keeping all rows.
Think of it like...
Imagine you have two guest lists for a party. UNION is like combining the lists but crossing out any names that appear twice, so each guest is invited once. UNION ALL is like putting both lists together without crossing out duplicates, so some guests might get two invitations.
┌───────────────┐       ┌───────────────┐
│ SELECT Query1 │       │ SELECT Query2 │
└──────┬────────┘       └──────┬────────┘
       │                       │
       │                       │
       ▼                       ▼
  ┌─────────────────────────────────┐
  │ UNION or UNION ALL combines rows │
  └──────────────┬──────────────────┘
                 │
       ┌─────────┴─────────┐
       │                   │
   UNION: removes       UNION ALL:
   duplicate rows       keeps all rows
       │                   │
       ▼                   ▼
  Result set          Result set
  with unique         with duplicates
  rows only           included
Build-Up - 7 Steps
1
FoundationBasic SELECT Query Review
🤔
Concept: Understanding how to retrieve data from a single table using SELECT.
A SELECT query asks the database to show certain columns from a table. For example, SELECT name FROM employees; shows all employee names. This is the starting point before combining results from multiple queries.
Result
A list of names from the employees table.
Knowing how SELECT works is essential because UNION and UNION ALL combine the results of multiple SELECT queries.
2
FoundationCombining Two Queries with UNION
🤔
Concept: Using UNION to merge two SELECT queries and remove duplicate rows.
If you have two SELECT queries, like one listing employees from department A and another from department B, you can combine them with UNION: SELECT name FROM employees WHERE dept = 'A' UNION SELECT name FROM employees WHERE dept = 'B'; This shows all unique employee names from both departments.
Result
A list of employee names from both departments without duplicates.
UNION helps create a single list from multiple sources while avoiding repeated entries.
3
IntermediateDifference Between UNION and UNION ALL
🤔Before reading on: do you think UNION ALL removes duplicates like UNION, or keeps all rows including duplicates? Commit to your answer.
Concept: Understanding that UNION ALL keeps duplicates while UNION removes them.
UNION ALL combines results from queries but does not remove duplicates. For example, if an employee appears in both department lists, UNION ALL will show their name twice. This is faster because it skips the step of checking for duplicates.
Result
A combined list including all rows, even repeated ones.
Knowing when to use UNION ALL can improve performance and preserve all data, which is important for accurate counts or logs.
4
IntermediateColumn Compatibility in UNION Queries
🤔Before reading on: do you think the SELECT queries combined by UNION must have the same number of columns and types, or can they differ? Commit to your answer.
Concept: UNION requires each SELECT to have the same number of columns with compatible data types.
When using UNION or UNION ALL, each SELECT must return the same number of columns, and the columns must be similar types (like text with text, numbers with numbers). For example, SELECT name, age FROM employees UNION SELECT name, age FROM managers; works, but SELECT name FROM employees UNION SELECT salary FROM managers; may cause errors if types differ.
Result
Successful combination of queries with matching columns; error if columns mismatch.
Understanding column compatibility prevents errors and ensures meaningful combined results.
5
IntermediateOrdering Results After UNION
🤔
Concept: How to sort the combined result set using ORDER BY after UNION or UNION ALL.
You cannot put ORDER BY inside each SELECT in a UNION. Instead, place ORDER BY after the last SELECT to sort the entire combined result. For example: SELECT name FROM employees UNION SELECT name FROM managers ORDER BY name ASC; sorts all names alphabetically.
Result
A sorted list of unique names from both queries.
Knowing where to place ORDER BY helps control the final output order after combining queries.
6
AdvancedPerformance Considerations Between UNION Types
🤔Before reading on: do you think UNION or UNION ALL is generally faster? Commit to your answer.
Concept: UNION ALL is faster because it skips duplicate removal, which costs extra processing.
UNION removes duplicates by sorting or hashing results, which takes time and memory. UNION ALL simply appends results, so it is faster and uses fewer resources. Use UNION ALL when you know duplicates are not a problem or when duplicates are meaningful.
Result
Faster query execution with UNION ALL compared to UNION.
Understanding performance differences helps write efficient queries for large datasets.
7
ExpertSubtle Effects of NULLs in UNION Operations
🤔Before reading on: do you think NULL values are treated as equal or different when UNION removes duplicates? Commit to your answer.
Concept: UNION treats NULLs as equal when removing duplicates, so rows with NULL in the same columns are considered duplicates.
If two rows differ only by NULL values in some columns, UNION will keep only one of them. This can cause unexpected data loss if NULLs represent unknown or different values. UNION ALL keeps all rows including those with NULLs.
Result
UNION result set with one row for duplicates including NULLs; UNION ALL with all rows.
Knowing how NULLs affect duplicate removal prevents subtle bugs in data analysis.
Under the Hood
When you run a UNION query, the database executes each SELECT separately, then combines the results. For UNION, it sorts or hashes the combined rows to find and remove duplicates before returning the final list. UNION ALL skips this step and simply concatenates the results. Internally, this duplicate removal can involve temporary storage and extra CPU work.
Why designed this way?
UNION was designed to provide a simple way to merge query results while ensuring uniqueness, which is often needed in reporting and data analysis. UNION ALL was added later to offer a faster option when duplicates are acceptable or desired. This separation balances correctness and performance.
┌───────────────┐   ┌───────────────┐
│  SELECT #1    │   │  SELECT #2    │
└──────┬────────┘   └──────┬────────┘
       │                   │
       ▼                   ▼
  ┌─────────────────────────────┐
  │  Combine all rows together   │
  └──────────────┬──────────────┘
                 │
        ┌────────┴─────────┐
        │                  │
    UNION:           UNION ALL:
    Remove duplicates  Keep all rows
    (sort/hash step)   (no extra step)
        │                  │
        ▼                  ▼
  Final unique set    Final full set
Myth Busters - 4 Common Misconceptions
Quick: Does UNION ALL remove duplicate rows or keep them all? Commit to yes or no.
Common Belief:UNION ALL removes duplicates just like UNION.
Tap to reveal reality
Reality:UNION ALL keeps all rows, including duplicates, without filtering.
Why it matters:Using UNION ALL when duplicates matter can cause incorrect counts or repeated data in reports.
Quick: Can you put ORDER BY inside each SELECT in a UNION query? Commit to yes or no.
Common Belief:You can sort each SELECT individually inside a UNION query.
Tap to reveal reality
Reality:ORDER BY applies only once after all SELECTs are combined, not inside each SELECT.
Why it matters:Trying to order inside each SELECT causes syntax errors or unexpected results.
Quick: Do NULL values count as different when UNION removes duplicates? Commit to yes or no.
Common Belief:Rows with NULLs in the same columns are treated as different and both kept.
Tap to reveal reality
Reality:UNION treats NULLs as equal, so duplicates with NULLs are removed.
Why it matters:This can lead to losing rows that seem different but have NULLs, causing data loss.
Quick: Do the SELECT queries combined by UNION need to have the same number of columns? Commit to yes or no.
Common Belief:SELECT queries in UNION can have different numbers of columns.
Tap to reveal reality
Reality:All SELECT queries must have the same number of columns with compatible types.
Why it matters:Mismatched columns cause errors and prevent query execution.
Expert Zone
1
UNION's duplicate removal uses sorting or hashing internally, and the choice depends on the database engine and data size.
2
Column names in the final result come from the first SELECT query, which can confuse if later SELECTs have different aliases.
3
Using UNION ALL with DISTINCT on the combined result can mimic UNION but with more control over when duplicates are removed.
When NOT to use
Avoid UNION when you need to combine tables side-by-side; use JOINs instead. Also, if you want to keep duplicates for accurate counts or logs, prefer UNION ALL. For complex set operations like finding common rows, use INTERSECT or EXISTS queries.
Production Patterns
In real systems, UNION ALL is often used for log aggregation where duplicates matter, while UNION is used in reporting to avoid double counting. Developers also use UNION to merge partitioned data or results from different sources into a single view.
Connections
JOIN
JOIN combines tables side-by-side (horizontal), UNION stacks results vertically.
Understanding UNION alongside JOIN helps grasp different ways to combine data: adding rows versus adding columns.
Set Theory
UNION corresponds to the union operation in set theory, combining elements from sets.
Knowing set theory clarifies why UNION removes duplicates and how it relates to mathematical sets.
Data Deduplication in Storage Systems
Both UNION and data deduplication remove repeated data to save space or improve clarity.
Recognizing this connection helps understand the importance and cost of removing duplicates in different fields.
Common Pitfalls
#1Trying to combine SELECT queries with different numbers of columns.
Wrong approach:SELECT name FROM employees UNION SELECT name, age FROM managers;
Correct approach:SELECT name, NULL as age FROM employees UNION SELECT name, age FROM managers;
Root cause:Misunderstanding that UNION requires matching column counts and compatible types.
#2Using ORDER BY inside each SELECT in a UNION query.
Wrong approach:SELECT name FROM employees ORDER BY name UNION SELECT name FROM managers ORDER BY name;
Correct approach:SELECT name FROM employees UNION SELECT name FROM managers ORDER BY name;
Root cause:Not knowing that ORDER BY applies only once after UNION combines all results.
#3Expecting UNION ALL to remove duplicates like UNION.
Wrong approach:SELECT name FROM employees UNION ALL SELECT name FROM managers; -- expecting unique names only
Correct approach:SELECT name FROM employees UNION SELECT name FROM managers; -- removes duplicates
Root cause:Confusing the behavior of UNION ALL with UNION regarding duplicates.
Key Takeaways
UNION and UNION ALL combine results from multiple SELECT queries into one list, stacking rows vertically.
UNION removes duplicate rows to show unique results, while UNION ALL keeps all rows including duplicates.
All SELECT queries combined by UNION or UNION ALL must have the same number of columns with compatible data types.
ORDER BY applies only once after all SELECTs are combined, not inside each individual SELECT.
Choosing between UNION and UNION ALL affects query performance and result accuracy, especially with duplicates and NULL values.