Remove Duplicates in Power Query in Power BI: Simple Steps
In Power Query within Power BI, you remove duplicates by selecting the columns you want to check and then clicking
Remove Duplicates from the Home tab. This action keeps only unique rows based on the selected columns and deletes repeated ones.Syntax
To remove duplicates in Power Query, use the Table.Distinct function. It has this syntax:
Table.Distinct(table as table, optional columns as list, optional equationCriteria as any) as table
Explanation:
table: The table you want to remove duplicates from.columns(optional): List of columns to check for duplicates. If omitted, all columns are checked.equationCriteria(optional): How to compare values, usually omitted.
powerquery
Table.Distinct(Source, {"Column1", "Column2"})Example
This example shows how to remove duplicate rows based on the "Name" column in a table.
powerquery
let Source = Table.FromRecords({ [Name="Alice", Age=30], [Name="Bob", Age=25], [Name="Alice", Age=30], [Name="Charlie", Age=35] }), RemovedDuplicates = Table.Distinct(Source, {"Name"}) in RemovedDuplicates
Output
[
{"Name": "Alice", "Age": 30},
{"Name": "Bob", "Age": 25},
{"Name": "Charlie", "Age": 35}
]
Common Pitfalls
Common mistakes when removing duplicates in Power Query include:
- Not selecting the correct columns, which can remove rows you want to keep.
- Assuming
Remove Duplicatesaffects the original data source (it only changes the query output). - Forgetting that
Remove Duplicatesis case-sensitive by default.
Always double-check which columns are selected before removing duplicates.
powerquery
/* Wrong: Removing duplicates without specifying columns may remove rows unexpectedly */ let Source = Table.FromRecords({ [Name="Alice", Age=30], [Name="alice", Age=30], [Name="Bob", Age=25] }), RemovedDuplicates = Table.Distinct(Source) in RemovedDuplicates /* Right: Specify columns and handle case sensitivity if needed */ let Source = Table.FromRecords({ [Name="Alice", Age=30], [Name="alice", Age=30], [Name="Bob", Age=25] }), LowerCaseNames = Table.TransformColumns(Source, {"Name", Text.Lower}), RemovedDuplicates = Table.Distinct(LowerCaseNames, {"Name"}) in RemovedDuplicates
Quick Reference
| Action | Description | Power Query Step |
|---|---|---|
| Remove duplicates from all columns | Keeps unique rows based on all columns | Table.Distinct(Source) |
| Remove duplicates based on specific columns | Keeps unique rows based on selected columns | Table.Distinct(Source, {"Column1", "Column2"}) |
| Make text comparison case-insensitive | Convert text to lower or upper case before removing duplicates | Table.TransformColumns(Source, {"Column", Text.Lower}) |
| Remove duplicates via UI | Select columns, then click Remove Duplicates in Home tab | No code, use Power Query interface |
Key Takeaways
Use Table.Distinct to remove duplicates in Power Query by specifying columns to check.
Removing duplicates affects only the query output, not the original data source.
Be careful with case sensitivity; convert text to lower or upper case if needed.
You can remove duplicates easily using the Power Query UI by selecting columns and clicking Remove Duplicates.
Always verify which columns are selected to avoid removing important data.