PowerShell · How-To · Beginner · 2 min read

PowerShell Script to Find Duplicate Lines in File

Use Get-Content filename.txt | Group-Object | Where-Object { $_.Count -gt 1 } | Select-Object -ExpandProperty Name to find duplicate lines in a file with PowerShell.
📋

Examples

Example 1
Input (one line each): apple, banana, apple, orange, banana
Output: apple, banana

Example 2
Input: line1, line2, line3, line4
Output: (none, since no line repeats)

Example 3
Input: test, test, test, unique, unique
Output: test, unique
🧠

How to Think About It

To find duplicate lines, read all lines from the file, group identical lines together, then select only those groups where the count of lines is more than one. This way, you identify which lines appear multiple times.
📐

Algorithm

1. Read all lines from the file.
2. Group lines by their content.
3. Filter groups to keep only those with more than one occurrence.
4. Extract the line content from these groups.
5. Output the duplicate lines.
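The numbered steps above can be written out one at a time, which makes each stage easy to inspect. This sketch creates its own sample file (the name dupes-sample.txt is just for illustration — substitute your own file):

```powershell
# Create a sample input file (hypothetical name, for illustration only).
'apple','banana','apple','orange','banana' | Set-Content 'dupes-sample.txt'

# Step 1: read all lines from the file.
$lines = Get-Content 'dupes-sample.txt'

# Step 2: group identical lines together.
$groups = $lines | Group-Object

# Step 3: keep only groups with more than one occurrence.
$duplicateGroups = $groups | Where-Object { $_.Count -gt 1 }

# Steps 4-5: extract and output the duplicate line text.
$duplicateGroups | Select-Object -ExpandProperty Name
```

Breaking the pipeline into variables like this is handy for debugging: you can print $groups at any point to see how the lines were bucketed.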
💻

Code

```powershell
Get-Content 'input.txt' | Group-Object | Where-Object { $_.Count -gt 1 } | Select-Object -ExpandProperty Name
```

Output
apple
banana
🔍

Dry Run

Let's trace the input lines 'apple', 'banana', 'apple', 'orange', 'banana' through the code.

Step 1: Read lines

Lines read: apple, banana, apple, orange, banana

Step 2: Group lines

Groups: apple (2), banana (2), orange (1)

Step 3: Filter duplicates

Duplicates: apple, banana

Line      Count
apple     2
banana    2
orange    1
💡

Why This Works

Step 1: Read file lines

The Get-Content command reads each line from the file as a string.

Step 2: Group identical lines

The Group-Object command groups lines that have the same text together.

Step 3: Filter groups with duplicates

Using Where-Object { $_.Count -gt 1 } keeps only groups where the line appears more than once.

Step 4: Extract duplicate lines

Finally, Select-Object -ExpandProperty Name outputs just the duplicate line text.
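A small variation on the last step: instead of expanding only the Name property, you can select both Count and Name to see how many times each duplicate appears. The sample file name below is hypothetical:

```powershell
# Build a sample file (hypothetical name) and list duplicates with their counts.
'apple','banana','apple','orange','banana' | Set-Content 'dupes-sample.txt'

Get-Content 'dupes-sample.txt' |
    Group-Object |
    Where-Object { $_.Count -gt 1 } |
    Select-Object Count, Name
# Outputs each duplicate line with its count: apple (2) and banana (2).
```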

🔄

Alternative Approaches

Using a hashtable to count lines
```powershell
$counts = @{}
Get-Content 'input.txt' | ForEach-Object {
    if ($counts.ContainsKey($_)) { $counts[$_]++ } else { $counts[$_] = 1 }
}
$counts.GetEnumerator() | Where-Object { $_.Value -gt 1 } | ForEach-Object { $_.Key }
```
This method manually counts lines and can be more flexible but requires more code.
Using Select-String with regex
```powershell
Select-String -Path 'input.txt' -Pattern '^(.+)$' -AllMatches |
    Group-Object -Property Line |
    Where-Object { $_.Count -gt 1 } |
    Select-Object -ExpandProperty Name
```
This approach uses regex matching but is more complex and less direct.

Complexity: O(n) time, O(n) space

Time Complexity

Reading all lines and grouping them requires one pass through the data, so time complexity is O(n) where n is the number of lines.

Space Complexity

Grouping stores all unique lines and their counts, so space complexity is O(n) in the worst case when all lines are unique.

Which Approach is Fastest?

Group-Object is concise and fast enough for most files. On very large files, manual counting with a hashtable is typically faster because it avoids building a group object per unique line, at the cost of more code.

Approach                         Time   Space   Best For
Group-Object with Where-Object   O(n)   O(n)    Simple and readable duplicate detection
Hashtable counting               O(n)   O(n)    More control, flexible processing
Select-String with regex         O(n)   O(n)    Complex patterns, less direct
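Rather than taking relative speed on faith, you can time the two main approaches on your own data with Measure-Command. This sketch generates a throwaway test file (the file name and line counts are arbitrary):

```powershell
# Generate a throwaway test file with many repeated lines (hypothetical name).
1..5000 | ForEach-Object { "line$($_ % 1000)" } | Set-Content 'bench-sample.txt'

# Time the Group-Object pipeline.
$groupTime = Measure-Command {
    Get-Content 'bench-sample.txt' | Group-Object |
        Where-Object { $_.Count -gt 1 } | Select-Object -ExpandProperty Name
}

# Time the hashtable approach. ($null + 1 evaluates to 1, so no key check is needed.)
$hashTime = Measure-Command {
    $counts = @{}
    Get-Content 'bench-sample.txt' | ForEach-Object { $counts[$_] = $counts[$_] + 1 }
    $counts.GetEnumerator() | Where-Object { $_.Value -gt 1 } | ForEach-Object { $_.Key }
}

"Group-Object: $($groupTime.TotalMilliseconds) ms"
"Hashtable:    $($hashTime.TotalMilliseconds) ms"
```

Results vary with file size and PowerShell version, so measure on inputs that resemble your real data.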
💡
Use Group-Object with Where-Object to quickly find duplicates in PowerShell.
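One detail worth knowing: Group-Object compares strings case-insensitively by default, so 'Apple' and 'apple' land in the same group. Pass the -CaseSensitive switch if differently cased lines should be treated as distinct. The sample file name below is hypothetical:

```powershell
# Mixed-case sample (hypothetical file name).
'Apple','apple','banana' | Set-Content 'case-sample.txt'

# Default: case-insensitive, so 'Apple' and 'apple' count as duplicates.
Get-Content 'case-sample.txt' | Group-Object |
    Where-Object { $_.Count -gt 1 } | Select-Object -ExpandProperty Name

# With -CaseSensitive they are distinct, so no duplicates are reported.
Get-Content 'case-sample.txt' | Group-Object -CaseSensitive |
    Where-Object { $_.Count -gt 1 } | Select-Object -ExpandProperty Name
```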
⚠️
Beginners often forget the Where-Object { $_.Count -gt 1 } filter, which outputs every unique line instead of only the duplicates.