0
0
PostgreSQLquery~15 mins

UNION and UNION ALL in PostgreSQL - Deep Dive

Choose your learning style9 modes available
Overview - UNION and UNION ALL
What is it?
UNION and UNION ALL are commands in SQL that combine results from two or more queries into a single result set. UNION removes duplicate rows, showing only unique records, while UNION ALL includes all rows, even duplicates. They help you merge data from different tables or queries easily.
Why it matters
Without UNION and UNION ALL, combining data from multiple sources would require complex manual merging or multiple queries, making data analysis slow and error-prone. These commands simplify data integration, saving time and reducing mistakes when working with related datasets.
Where it fits
Before learning UNION and UNION ALL, you should understand basic SELECT queries and how to retrieve data from tables. After mastering these, you can explore JOINs for combining data horizontally and advanced set operations like INTERSECT and EXCEPT.
Mental Model
Core Idea
UNION and UNION ALL stack query results vertically, with UNION removing duplicates and UNION ALL keeping all rows.
Think of it like...
Imagine you have two lists of names from different friends. UNION is like writing all names on one paper but crossing out any repeated names so each appears once. UNION ALL is like just putting both lists together without crossing anything out, so duplicates stay.
┌───────────────┐   ┌───────────────┐
│ Query 1 Result│   │ Query 2 Result│
└──────┬────────┘   └──────┬────────┘
       │                   │
       └───────┬───────────┘
               │
        ┌──────▼────────┐
        │ UNION / UNION ALL │
        └──────┬────────┘
               │
       ┌───────▼───────────┐
       │ Combined Result Set│
       └───────────────────┘
Build-Up - 7 Steps
1
FoundationBasic SELECT Queries Review
🤔
Concept: Understanding how to retrieve data from a single table using SELECT.
A SELECT query fetches rows from a table. For example, SELECT name FROM employees; returns all employee names.
Result
A list of names from the employees table.
Knowing how to write simple SELECT queries is essential before combining multiple query results.
2
FoundationCombining Two Queries with UNION
🤔
Concept: UNION merges results from two queries and removes duplicate rows.
If you have two queries like SELECT city FROM customers; and SELECT city FROM suppliers;, UNION combines their results but shows each city only once.
Result
A list of unique cities from both customers and suppliers.
Understanding that UNION removes duplicates helps you get a clean combined list without repeats.
3
IntermediateUsing UNION ALL to Keep Duplicates
🤔Before reading on: do you think UNION ALL removes duplicates like UNION, or keeps all rows including duplicates? Commit to your answer.
Concept: UNION ALL combines results from queries but keeps all rows, including duplicates.
Using UNION ALL with the same city queries returns all cities from both tables, even if some appear multiple times.
Result
A list of cities including duplicates if they appear in both tables.
Knowing when to use UNION ALL is important when you want to preserve all data, such as counting total occurrences.
4
IntermediateRequirements for UNION and UNION ALL
🤔Before reading on: do you think the queries combined by UNION must have the same number of columns and types, or can they differ? Commit to your answer.
Concept: Queries combined by UNION or UNION ALL must have the same number of columns with compatible data types.
If one query selects two columns, the other must also select two columns with matching types, like both selecting city names as text.
Result
Queries run successfully only if column counts and types match; otherwise, an error occurs.
Understanding this prevents errors and helps you design queries that can be combined.
5
IntermediateOrdering Results After UNION
🤔
Concept: You can sort the combined result set using ORDER BY after the last query.
For example, (SELECT city FROM customers UNION SELECT city FROM suppliers) ORDER BY city ASC; sorts the combined unique cities alphabetically.
Result
A sorted list of unique cities from both tables.
Knowing that ORDER BY applies to the whole combined result helps you control output presentation.
6
AdvancedPerformance Differences Between UNION and UNION ALL
🤔Before reading on: do you think UNION is faster, slower, or the same speed as UNION ALL? Commit to your answer.
Concept: UNION requires extra work to remove duplicates, making it slower than UNION ALL which just stacks results.
UNION performs a distinct operation internally to eliminate duplicates, while UNION ALL simply concatenates results without checks.
Result
UNION ALL queries generally run faster, especially on large datasets.
Understanding performance differences helps you choose the right command for efficiency.
7
ExpertSubtle Effects of NULLs in UNION Operations
🤔Before reading on: do you think NULL values are treated as equal or different when UNION removes duplicates? Commit to your answer.
Concept: In UNION, NULLs in the same column are treated as equal, so duplicates with NULLs are removed.
If two rows differ only by having NULL in a column, UNION considers them duplicates and keeps one. UNION ALL keeps both.
Result
UNION removes rows with NULL duplicates; UNION ALL keeps all rows including NULL duplicates.
Knowing how NULLs affect duplicate removal prevents unexpected missing rows in results.
Under the Hood
When you run UNION, the database executes each query separately, then combines the results into a temporary set. It then scans this set to remove duplicate rows by comparing all columns. UNION ALL skips this duplicate removal step and directly concatenates the results. Internally, this involves sorting or hashing to detect duplicates for UNION.
Why designed this way?
UNION was designed to provide a simple way to merge datasets while ensuring uniqueness, which is often needed in reporting and analysis. UNION ALL was added to allow faster merging when duplicates are acceptable or desired. This separation balances correctness and performance.
┌─────────────┐   ┌─────────────┐
│ Query 1     │   │ Query 2     │
└─────┬───────┘   └─────┬───────┘
      │                 │
      └───────┬─────────┘
              │
       ┌──────▼───────┐
       │ Combine Rows │
       └──────┬───────┘
              │
      ┌───────▼────────┐
      │ Remove Duplicates│  <-- Only for UNION
      └───────┬────────┘
              │
       ┌──────▼───────┐
       │ Final Result │
       └──────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does UNION ALL remove duplicate rows or keep them all? Commit to yes or no.
Common Belief:UNION ALL removes duplicates just like UNION.
Tap to reveal reality
Reality:UNION ALL keeps all rows including duplicates; it does not remove any duplicates.
Why it matters:Using UNION ALL when you expect unique rows can cause unexpected repeated data and incorrect analysis.
Quick: Can you use ORDER BY inside each query before UNION? Commit to yes or no.
Common Belief:You can put ORDER BY in each individual query combined by UNION to sort partial results.
Tap to reveal reality
Reality:ORDER BY applies only once after all UNION queries are combined, not inside each query unless wrapped in subqueries.
Why it matters:Misplacing ORDER BY can cause syntax errors or unexpected sorting behavior.
Quick: Do queries combined by UNION need to have the same number of columns? Commit to yes or no.
Common Belief:Queries combined by UNION can have different numbers of columns.
Tap to reveal reality
Reality:All queries combined by UNION or UNION ALL must have the same number of columns with compatible types.
Why it matters:Ignoring this causes errors and confusion when writing combined queries.
Quick: Are NULL values treated as different or the same when removing duplicates in UNION? Commit to your answer.
Common Belief:NULL values are treated as different, so duplicates with NULLs are kept.
Tap to reveal reality
Reality:UNION treats NULLs as equal, so duplicates with NULLs are removed.
Why it matters:This affects which rows appear in the final result and can cause missing data if misunderstood.
Expert Zone
1
UNION's duplicate removal uses sorting or hashing internally, and the choice affects performance depending on data size and indexing.
2
The order of rows in UNION or UNION ALL results is not guaranteed unless ORDER BY is used after the last query.
3
When combining large datasets, using UNION ALL and then filtering duplicates manually can sometimes be more efficient than UNION.
When NOT to use
Avoid UNION when you want to preserve duplicates for accurate counts or logs; use UNION ALL instead. Also, if you need to combine columns side-by-side, use JOINs rather than UNION. For filtering common or different rows, consider INTERSECT or EXCEPT.
Production Patterns
In real systems, UNION ALL is often used for log aggregation where duplicates matter, while UNION is used in reporting to get distinct lists. Queries are carefully designed to match column types and use ORDER BY after UNION for consistent output.
Connections
JOIN
Both combine data but JOIN merges columns horizontally, UNION stacks rows vertically.
Understanding UNION complements JOIN knowledge by showing two main ways to combine tables: stacking rows vs. adding columns.
Set Theory
UNION corresponds to the union operation in set theory, combining elements from two sets without duplicates.
Knowing set theory helps understand why UNION removes duplicates and how it relates to mathematical sets.
Data Deduplication in Storage Systems
UNION's duplicate removal is similar to deduplication techniques that remove repeated data to save space.
Recognizing this connection shows how database operations mirror broader data management strategies.
Common Pitfalls
#1Trying to combine queries with different numbers of columns.
Wrong approach:SELECT name FROM customers UNION SELECT name, city FROM suppliers;
Correct approach:SELECT name, city FROM customers UNION SELECT name, city FROM suppliers;
Root cause:Misunderstanding that UNION requires matching column counts and types.
#2Using ORDER BY inside each query before UNION instead of after.
Wrong approach:SELECT city FROM customers ORDER BY city UNION SELECT city FROM suppliers ORDER BY city;
Correct approach:(SELECT city FROM customers UNION SELECT city FROM suppliers) ORDER BY city;
Root cause:Not knowing that ORDER BY applies to the combined result, not individual queries.
#3Expecting UNION ALL to remove duplicates.
Wrong approach:SELECT city FROM customers UNION ALL SELECT city FROM suppliers;
Correct approach:SELECT city FROM customers UNION SELECT city FROM suppliers;
Root cause:Confusing UNION ALL with UNION regarding duplicate removal.
Key Takeaways
UNION and UNION ALL combine results from multiple queries by stacking rows vertically.
UNION removes duplicate rows to show unique results, while UNION ALL keeps all rows including duplicates.
All queries combined by UNION or UNION ALL must have the same number of columns with compatible data types.
ORDER BY applies only once after all UNION queries to sort the final combined result.
UNION is slower than UNION ALL because it removes duplicates, so choose based on whether duplicates matter.