0
0
Power-biHow-ToBeginner ยท 4 min read

Remove Duplicates in Power Query in Power BI: Simple Steps

In Power Query within Power BI, you remove duplicates by selecting the columns you want to check and then clicking Remove Duplicates from the Home tab. This action keeps only unique rows based on the selected columns and deletes repeated ones.
๐Ÿ“

Syntax

To remove duplicates in Power Query, use the Table.Distinct function. It has this syntax:

  • Table.Distinct(table as table, optional columns as list, optional equationCriteria as any) as table

Explanation:

  • table: The table you want to remove duplicates from.
  • columns (optional): List of columns to check for duplicates. If omitted, all columns are checked.
  • equationCriteria (optional): How to compare values, usually omitted.
powerquery
Table.Distinct(Source, {"Column1", "Column2"})
๐Ÿ’ป

Example

This example shows how to remove duplicate rows based on the "Name" column in a table.

powerquery
let
    Source = Table.FromRecords({
        [Name="Alice", Age=30],
        [Name="Bob", Age=25],
        [Name="Alice", Age=30],
        [Name="Charlie", Age=35]
    }),
    RemovedDuplicates = Table.Distinct(Source, {"Name"})
in
    RemovedDuplicates
Output
[ {"Name": "Alice", "Age": 30}, {"Name": "Bob", "Age": 25}, {"Name": "Charlie", "Age": 35} ]
โš ๏ธ

Common Pitfalls

Common mistakes when removing duplicates in Power Query include:

  • Not selecting the correct columns, which can remove rows you want to keep.
  • Assuming Remove Duplicates affects the original data source (it only changes the query output).
  • Forgetting that Remove Duplicates is case-sensitive by default.

Always double-check which columns are selected before removing duplicates.

powerquery
/* Wrong: Removing duplicates without specifying columns may remove rows unexpectedly */
let
    Source = Table.FromRecords({
        [Name="Alice", Age=30],
        [Name="alice", Age=30],
        [Name="Bob", Age=25]
    }),
    RemovedDuplicates = Table.Distinct(Source)
in
    RemovedDuplicates

/* Right: Specify columns and handle case sensitivity if needed */
let
    Source = Table.FromRecords({
        [Name="Alice", Age=30],
        [Name="alice", Age=30],
        [Name="Bob", Age=25]
    }),
    LowerCaseNames = Table.TransformColumns(Source, {"Name", Text.Lower}),
    RemovedDuplicates = Table.Distinct(LowerCaseNames, {"Name"})
in
    RemovedDuplicates
๐Ÿ“Š

Quick Reference

ActionDescriptionPower Query Step
Remove duplicates from all columnsKeeps unique rows based on all columnsTable.Distinct(Source)
Remove duplicates based on specific columnsKeeps unique rows based on selected columnsTable.Distinct(Source, {"Column1", "Column2"})
Make text comparison case-insensitiveConvert text to lower or upper case before removing duplicatesTable.TransformColumns(Source, {"Column", Text.Lower})
Remove duplicates via UISelect columns, then click Remove Duplicates in Home tabNo code, use Power Query interface
โœ…

Key Takeaways

Use Table.Distinct to remove duplicates in Power Query by specifying columns to check.
Removing duplicates affects only the query output, not the original data source.
Be careful with case sensitivity; convert text to lower or upper case if needed.
You can remove duplicates easily using the Power Query UI by selecting columns and clicking Remove Duplicates.
Always verify which columns are selected to avoid removing important data.