Overview - UNION ALL with duplicates

What is it?

UNION ALL is a SQL command that combines the results of two or more SELECT queries into one result set. Unlike UNION, UNION ALL does not remove duplicate rows; it keeps all rows from each query, including duplicates. This means if the same row appears in both queries, it will appear twice in the combined result. It is useful when you want to see every record without filtering duplicates.

Why it matters

Without UNION ALL, you might lose important repeated data because UNION removes duplicates. This can cause incorrect results in reports or data analysis where duplicates matter. UNION ALL solves this by preserving every row, making it essential for tasks like merging logs, combining datasets with overlapping entries, or counting all occurrences. Without it, you would have to write more complex queries or process data outside the database.

Where it fits

Before learning UNION ALL, you should understand basic SELECT queries and the concept of combining results with UNION. After mastering UNION ALL, you can explore advanced set operations like INTERSECT and EXCEPT, and learn how to optimize queries that handle large datasets with duplicates.

Mental Model

Core Idea

UNION ALL stacks query results on top of each other, keeping every row exactly as it appears, including duplicates.

Think of it like...

Imagine two stacks of papers from different departments. UNION ALL is like putting one stack on top of the other without checking for repeated pages, so all pages stay in the final pile.

┌─────────────┐   ┌─────────────┐
│ Query 1     │   │ Query 2     │
│ Row A       │   │ Row B       │
│ Row B       │   │ Row C       │
└─────┬───────┘   └─────┬───────┘
      │                 │
      └───────┬─────────┘
              │
       UNION ALL combines
              │
      ┌───────────────────┐
      │ Row A             │
      │ Row B             │
      │ Row B             │
      │ Row C             │
      └───────────────────┘

Build-Up - 7 Steps

1

FoundationBasic SELECT Queries

Concept: Understanding how to retrieve data from a single table using SELECT.

A SELECT query asks the database to give you rows from a table. For example, SELECT name FROM employees; returns all employee names.

Result

A list of names from the employees table.

Knowing how to write simple SELECT queries is the foundation for combining multiple queries later.

2

FoundationCombining Results with UNION

3

IntermediateIntroducing UNION ALL

4

IntermediatePerformance Differences Between UNION and UNION ALL

5

IntermediateUse Cases for UNION ALL with Duplicates

6

AdvancedHandling Column Compatibility in UNION ALL

7

ExpertUNION ALL and Query Optimization Internals

Under the Hood

When you run UNION ALL, the database executes each SELECT query independently. It then appends the results one after another without checking for duplicates. This means no extra sorting or filtering happens after fetching the rows. The combined result is streamed directly to the client or next operation.

Why designed this way?

UNION ALL was designed to provide a fast way to combine datasets when duplicates are meaningful or when performance is critical. Removing duplicates requires extra work like sorting or hashing, which slows queries. By skipping this, UNION ALL offers a simpler, faster alternative for many use cases.

┌───────────────┐   ┌───────────────┐
│ SELECT Query 1│   │ SELECT Query 2│
│ (Executes)   │   │ (Executes)   │
└───────┬───────┘   └───────┬───────┘
        │                   │
        └───────┬───────────┘
                │
        ┌───────▼───────────┐
        │ Append Results     │
        │ (No duplicate     │
        │  checking)        │
        └───────┬───────────┘
                │
        ┌───────▼───────────┐
        │ Final Result Set   │
        └───────────────────┘

Myth Busters - 4 Common Misconceptions

Quick: Does UNION ALL remove duplicate rows automatically? Commit to yes or no.

Common Belief:UNION ALL removes duplicates just like UNION does.

Tap to reveal reality

Quick: Is UNION ALL always faster than UNION? Commit to yes or no.

Common Belief:UNION ALL is always faster regardless of data size or query complexity.

Tap to reveal reality

Quick: Can UNION ALL combine queries with different numbers of columns? Commit to yes or no.

Common Belief:UNION ALL can combine any SELECT queries regardless of column count or types.

Tap to reveal reality

Quick: Does UNION ALL guarantee the order of rows in the result? Commit to yes or no.

Common Belief:UNION ALL returns rows in the order they appear in the first query, then the second, and so on.

Tap to reveal reality

Expert Zone

1

UNION ALL can be combined with window functions to analyze duplicates without removing them.

2

Some databases optimize UNION ALL by parallelizing the execution of each SELECT query for faster results.

3

Using UNION ALL with large datasets requires careful indexing and query planning to avoid performance bottlenecks.

When NOT to use

Avoid UNION ALL when you need a unique set of rows; use UNION instead. For complex deduplication, consider using DISTINCT or analytic functions. If you need to merge data with different columns, use JOINs or subqueries instead.

Production Patterns

In production, UNION ALL is often used to merge logs from multiple servers, combine partitioned tables, or append new data batches. It is also common in ETL pipelines where duplicates are meaningful and must be preserved for auditing.

Connections

Set Theory

UNION ALL corresponds to the multiset union operation, which allows duplicates, unlike the standard set union.

Understanding multisets in math helps grasp why UNION ALL keeps duplicates and how it differs from pure set union.

Data Streaming

UNION ALL's append-only behavior is similar to streaming data pipelines that concatenate data chunks without filtering.

Knowing streaming concepts clarifies why UNION ALL is efficient and suitable for real-time data merging.

Inventory Counting

UNION ALL is like counting every item in stock from multiple warehouses, including duplicates, to get total quantity.

Relating to inventory helps understand why duplicates matter and must be preserved in some data operations.

Common Pitfalls

#1Expecting UNION ALL to remove duplicates automatically.

Wrong approach:SELECT name FROM employees UNION ALL SELECT name FROM managers; -- expecting unique names

Correct approach:SELECT name FROM employees UNION SELECT name FROM managers; -- removes duplicates

Root cause:Confusing UNION ALL with UNION leads to unexpected repeated rows.

#2Combining queries with different column counts using UNION ALL.

Wrong approach:SELECT name FROM employees UNION ALL SELECT name, department FROM managers;

Correct approach:SELECT name FROM employees UNION ALL SELECT name FROM managers;

Root cause:Not matching column numbers and types causes syntax errors.

#3Assuming UNION ALL preserves row order without ORDER BY.

Wrong approach:SELECT name FROM employees UNION ALL SELECT name FROM managers; -- expecting ordered output

Correct approach:SELECT name FROM (SELECT name FROM employees UNION ALL SELECT name FROM managers) AS combined ORDER BY name;

Root cause:Misunderstanding that SQL result order is undefined without explicit sorting.

Key Takeaways

UNION ALL combines results from multiple queries and keeps all rows, including duplicates.

It is faster than UNION because it skips the step of removing duplicates.

All SELECT queries combined with UNION ALL must have the same number of columns with compatible types.

UNION ALL does not guarantee the order of rows unless you use ORDER BY on the combined result.

Choosing between UNION and UNION ALL depends on whether duplicates are meaningful for your data task.