0
0
SQLquery~15 mins

UNION ALL with duplicates in SQL - Deep Dive

Choose your learning style9 modes available
Overview - UNION ALL with duplicates
What is it?
UNION ALL is a SQL command that combines the results of two or more SELECT queries into one result set. Unlike UNION, UNION ALL does not remove duplicate rows; it keeps all rows from each query, including duplicates. This means if the same row appears in both queries, it will appear twice in the combined result. It is useful when you want to see every record without filtering duplicates.
Why it matters
Without UNION ALL, you might lose important repeated data because UNION removes duplicates. This can cause incorrect results in reports or data analysis where duplicates matter. UNION ALL solves this by preserving every row, making it essential for tasks like merging logs, combining datasets with overlapping entries, or counting all occurrences. Without it, you would have to write more complex queries or process data outside the database.
Where it fits
Before learning UNION ALL, you should understand basic SELECT queries and the concept of combining results with UNION. After mastering UNION ALL, you can explore advanced set operations like INTERSECT and EXCEPT, and learn how to optimize queries that handle large datasets with duplicates.
Mental Model
Core Idea
UNION ALL stacks query results on top of each other, keeping every row exactly as it appears, including duplicates.
Think of it like...
Imagine two stacks of papers from different departments. UNION ALL is like putting one stack on top of the other without checking for repeated pages, so all pages stay in the final pile.
┌─────────────┐   ┌─────────────┐
│ Query 1     │   │ Query 2     │
│ Row A       │   │ Row B       │
│ Row B       │   │ Row C       │
└─────┬───────┘   └─────┬───────┘
      │                 │
      └───────┬─────────┘
              │
       UNION ALL combines
              │
      ┌───────────────────┐
      │ Row A             │
      │ Row B             │
      │ Row B             │
      │ Row C             │
      └───────────────────┘
Build-Up - 7 Steps
1
FoundationBasic SELECT Queries
🤔
Concept: Understanding how to retrieve data from a single table using SELECT.
A SELECT query asks the database to give you rows from a table. For example, SELECT name FROM employees; returns all employee names.
Result
A list of names from the employees table.
Knowing how to write simple SELECT queries is the foundation for combining multiple queries later.
2
FoundationCombining Results with UNION
🤔
Concept: Using UNION to merge results from two queries while removing duplicates.
UNION takes two SELECT queries and merges their results into one list, but it removes any rows that appear more than once. For example, combining SELECT name FROM employees and SELECT name FROM managers will give a list of unique names.
Result
A combined list of unique names from both tables.
Understanding UNION helps you see how SQL handles duplicates when merging data.
3
IntermediateIntroducing UNION ALL
🤔Before reading on: do you think UNION ALL removes duplicates like UNION, or keeps all rows including duplicates? Commit to your answer.
Concept: UNION ALL combines results from multiple queries but keeps every row, including duplicates.
Unlike UNION, UNION ALL does not check for duplicates. It simply stacks all rows from each query. For example, if 'Alice' appears in both queries, UNION ALL will show 'Alice' twice.
Result
A combined list including all rows from both queries, duplicates included.
Knowing that UNION ALL preserves duplicates is key to choosing the right command for your data needs.
4
IntermediatePerformance Differences Between UNION and UNION ALL
🤔Before reading on: which do you think is faster, UNION or UNION ALL? Commit to your answer.
Concept: UNION ALL is faster because it skips the step of removing duplicates.
When combining large datasets, UNION must check every row to remove duplicates, which takes time. UNION ALL just appends results, so it runs faster and uses fewer resources.
Result
Queries using UNION ALL execute more quickly, especially on large data.
Understanding performance helps you write efficient queries and avoid unnecessary slowdowns.
5
IntermediateUse Cases for UNION ALL with Duplicates
🤔
Concept: Situations where keeping duplicates is necessary or beneficial.
Examples include combining logs from multiple sources where repeated events matter, counting total sales including repeated transactions, or merging datasets where duplicates represent real data points.
Result
Accurate combined data that reflects all occurrences, not just unique ones.
Recognizing when duplicates matter prevents data loss and incorrect analysis.
6
AdvancedHandling Column Compatibility in UNION ALL
🤔Before reading on: do you think UNION ALL requires the same number and types of columns in each SELECT? Commit to your answer.
Concept: UNION ALL requires each SELECT query to have the same number of columns with compatible data types.
If the queries have different columns or incompatible types, the database will return an error. You must align columns by selecting matching columns or using casts.
Result
Successful UNION ALL queries with matching columns or errors if mismatched.
Knowing this prevents frustrating errors and helps design queries that combine data correctly.
7
ExpertUNION ALL and Query Optimization Internals
🤔Before reading on: do you think the database engine processes UNION ALL by scanning all rows once or multiple times? Commit to your answer.
Concept: UNION ALL is optimized to simply concatenate results without sorting or deduplication, reducing CPU and memory usage.
Internally, the database executes each SELECT separately and streams results directly into the final output. This avoids extra steps like sorting or hashing needed for UNION.
Result
Faster query execution and lower resource consumption for UNION ALL compared to UNION.
Understanding internal processing helps in writing queries that scale well and use resources efficiently.
Under the Hood
When you run UNION ALL, the database executes each SELECT query independently. It then appends the results one after another without checking for duplicates. This means no extra sorting or filtering happens after fetching the rows. The combined result is streamed directly to the client or next operation.
Why designed this way?
UNION ALL was designed to provide a fast way to combine datasets when duplicates are meaningful or when performance is critical. Removing duplicates requires extra work like sorting or hashing, which slows queries. By skipping this, UNION ALL offers a simpler, faster alternative for many use cases.
┌───────────────┐   ┌───────────────┐
│ SELECT Query 1│   │ SELECT Query 2│
│ (Executes)   │   │ (Executes)   │
└───────┬───────┘   └───────┬───────┘
        │                   │
        └───────┬───────────┘
                │
        ┌───────▼───────────┐
        │ Append Results     │
        │ (No duplicate     │
        │  checking)        │
        └───────┬───────────┘
                │
        ┌───────▼───────────┐
        │ Final Result Set   │
        └───────────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does UNION ALL remove duplicate rows automatically? Commit to yes or no.
Common Belief:UNION ALL removes duplicates just like UNION does.
Tap to reveal reality
Reality:UNION ALL does not remove duplicates; it keeps every row from all queries.
Why it matters:Believing UNION ALL removes duplicates can cause unexpected repeated rows in reports or data exports.
Quick: Is UNION ALL always faster than UNION? Commit to yes or no.
Common Belief:UNION ALL is always faster regardless of data size or query complexity.
Tap to reveal reality
Reality:While UNION ALL is generally faster, if the queries are small or the database optimizes UNION well, the difference may be negligible.
Why it matters:Assuming UNION ALL is always better can lead to premature optimization or ignoring cases where deduplication is needed.
Quick: Can UNION ALL combine queries with different numbers of columns? Commit to yes or no.
Common Belief:UNION ALL can combine any SELECT queries regardless of column count or types.
Tap to reveal reality
Reality:UNION ALL requires the same number of columns with compatible data types in each SELECT.
Why it matters:Trying to combine mismatched queries causes errors and confusion.
Quick: Does UNION ALL guarantee the order of rows in the result? Commit to yes or no.
Common Belief:UNION ALL returns rows in the order they appear in the first query, then the second, and so on.
Tap to reveal reality
Reality:UNION ALL does not guarantee order unless an ORDER BY clause is used on the combined result.
Why it matters:Relying on implicit order can cause bugs or inconsistent outputs.
Expert Zone
1
UNION ALL can be combined with window functions to analyze duplicates without removing them.
2
Some databases optimize UNION ALL by parallelizing the execution of each SELECT query for faster results.
3
Using UNION ALL with large datasets requires careful indexing and query planning to avoid performance bottlenecks.
When NOT to use
Avoid UNION ALL when you need a unique set of rows; use UNION instead. For complex deduplication, consider using DISTINCT or analytic functions. If you need to merge data with different columns, use JOINs or subqueries instead.
Production Patterns
In production, UNION ALL is often used to merge logs from multiple servers, combine partitioned tables, or append new data batches. It is also common in ETL pipelines where duplicates are meaningful and must be preserved for auditing.
Connections
Set Theory
UNION ALL corresponds to the multiset union operation, which allows duplicates, unlike the standard set union.
Understanding multisets in math helps grasp why UNION ALL keeps duplicates and how it differs from pure set union.
Data Streaming
UNION ALL's append-only behavior is similar to streaming data pipelines that concatenate data chunks without filtering.
Knowing streaming concepts clarifies why UNION ALL is efficient and suitable for real-time data merging.
Inventory Counting
UNION ALL is like counting every item in stock from multiple warehouses, including duplicates, to get total quantity.
Relating to inventory helps understand why duplicates matter and must be preserved in some data operations.
Common Pitfalls
#1Expecting UNION ALL to remove duplicates automatically.
Wrong approach:SELECT name FROM employees UNION ALL SELECT name FROM managers; -- expecting unique names
Correct approach:SELECT name FROM employees UNION SELECT name FROM managers; -- removes duplicates
Root cause:Confusing UNION ALL with UNION leads to unexpected repeated rows.
#2Combining queries with different column counts using UNION ALL.
Wrong approach:SELECT name FROM employees UNION ALL SELECT name, department FROM managers;
Correct approach:SELECT name FROM employees UNION ALL SELECT name FROM managers;
Root cause:Not matching column numbers and types causes syntax errors.
#3Assuming UNION ALL preserves row order without ORDER BY.
Wrong approach:SELECT name FROM employees UNION ALL SELECT name FROM managers; -- expecting ordered output
Correct approach:SELECT name FROM (SELECT name FROM employees UNION ALL SELECT name FROM managers) AS combined ORDER BY name;
Root cause:Misunderstanding that SQL result order is undefined without explicit sorting.
Key Takeaways
UNION ALL combines results from multiple queries and keeps all rows, including duplicates.
It is faster than UNION because it skips the step of removing duplicates.
All SELECT queries combined with UNION ALL must have the same number of columns with compatible types.
UNION ALL does not guarantee the order of rows unless you use ORDER BY on the combined result.
Choosing between UNION and UNION ALL depends on whether duplicates are meaningful for your data task.