PowerShell Script to Find Duplicate Lines in File
Use Get-Content filename.txt | Group-Object | Where-Object { $_.Count -gt 1 } | Select-Object -ExpandProperty Name to find duplicate lines in a file with PowerShell.
Examples
How to Think About It
Algorithm
Code
Get-Content 'input.txt' | Group-Object | Where-Object { $_.Count -gt 1 } | Select-Object -ExpandProperty Name
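If the occurrence count is also useful, the one-liner can be wrapped in a small function. The following is a minimal sketch, not part of the original script: the function name Get-DuplicateLine, the Line column label, and the temporary sample file are all assumptions made for illustration.

```powershell
# Hypothetical helper (name chosen for this example): returns each
# duplicated line together with the number of times it occurs.
function Get-DuplicateLine {
    param(
        [Parameter(Mandatory)]
        [string]$Path
    )

    Get-Content -LiteralPath $Path |
        Group-Object |
        Where-Object { $_.Count -gt 1 } |
        Select-Object @{ Name = 'Line'; Expression = { $_.Name } }, Count
}

# Usage with a throwaway sample file in the temp directory
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'duplicate-demo.txt'
Set-Content -LiteralPath $sample -Value 'apple', 'banana', 'apple', 'orange', 'banana'

$result = Get-DuplicateLine -Path $sample
$result
```

Keeping the count alongside the text makes it easy to sort the output by frequency later, for example with Sort-Object Count -Descending.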
Dry Run
Let's trace the input lines 'apple', 'banana', 'apple', 'orange', 'banana' through the code.
Read lines
Lines read: apple, banana, apple, orange, banana
Group lines
Groups: apple (2), banana (2), orange (1)
Filter duplicates
Duplicates: apple, banana
| Line | Count |
|---|---|
| apple | 2 |
| banana | 2 |
| orange | 1 |
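The trace above can be reproduced without a file by piping an in-memory array through the same stages. This is a sketch for experimentation; the variable names are illustrative and not part of the original script.

```powershell
# Same sample data as the dry run, held in memory instead of a file
$lines = 'apple', 'banana', 'apple', 'orange', 'banana'

# Group lines: one group per distinct line, each with a Count
$groups = $lines | Group-Object

# Filter duplicates: keep only groups that occur more than once
$duplicateGroups = $groups | Where-Object { $_.Count -gt 1 }

# Extract the text of each duplicated line
$duplicates = $duplicateGroups | Select-Object -ExpandProperty Name

# Show the intermediate counts, then the final result
$groups | ForEach-Object { '{0} ({1})' -f $_.Name, $_.Count }
$duplicates
```

The order in which groups are emitted can differ between PowerShell versions, so it is safer not to rely on it.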
Why This Works
Step 1: Read file lines
The Get-Content command reads each line from the file as a string.
Step 2: Group identical lines
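The Group-Object command groups lines that have the same text together. By default the comparison is case-insensitive, so 'Apple' and 'apple' land in the same group; add -CaseSensitive to treat them as different lines.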
Step 3: Filter groups with duplicates
Using Where-Object { $_.Count -gt 1 } keeps only groups where the line appears more than once.
Step 4: Extract duplicate lines
Finally, Select-Object -ExpandProperty Name outputs just the duplicate line text.
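One detail of the grouping step is worth checking on your own data: Group-Object compares strings case-insensitively unless -CaseSensitive is given. The sketch below uses made-up sample values to show how that changes which lines count as duplicates.

```powershell
# Sample data that differs only by letter case (illustrative values)
$lines = 'Apple', 'apple', 'APPLE', 'banana'

# Default grouping ignores case: all three spellings fall into one group
$insensitive = @($lines | Group-Object | Where-Object { $_.Count -gt 1 })

# With -CaseSensitive each spelling is its own group, so nothing repeats
$sensitive = @($lines | Group-Object -CaseSensitive | Where-Object { $_.Count -gt 1 })

"Case-insensitive duplicate groups: $($insensitive.Count)"
"Case-sensitive duplicate groups:   $($sensitive.Count)"
```

Which behavior is right depends on the file: for word lists the default is usually fine, while for identifiers or paths on case-sensitive systems -CaseSensitive is the safer choice.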
Alternative Approaches
Hashtable counting
$counts = @{}; Get-Content 'input.txt' | ForEach-Object { if ($counts.ContainsKey($_)) { $counts[$_]++ } else { $counts[$_] = 1 } }; $counts.GetEnumerator() | Where-Object { $_.Value -gt 1 } | ForEach-Object { $_.Key }
Select-String with regex
Select-String -Path 'input.txt' -Pattern '^(.+)$' -AllMatches | Group-Object -Property Line | Where-Object { $_.Count -gt 1 } | Select-Object -ExpandProperty Name
Complexity: O(n) time, O(n) space
Time Complexity
Reading all lines and grouping them requires one pass through the data, so time complexity is O(n) where n is the number of lines.
Space Complexity
Grouping stores all unique lines and their counts, so space complexity is O(n) in the worst case when all lines are unique.
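When only the duplicated text is needed and not the counts, a set of already-seen lines is enough. The following sketch swaps in a .NET HashSet for the grouping step; it is a different technique from the one described above, shown only to illustrate the O(n) space bound. Note that HashSet[string] compares case-sensitively by default, unlike Group-Object.

```powershell
# Sample data (illustrative); in practice the lines would come from Get-Content
$lines = 'apple', 'banana', 'apple', 'orange', 'banana', 'apple'

$seen     = [System.Collections.Generic.HashSet[string]]::new()
$reported = [System.Collections.Generic.HashSet[string]]::new()

$duplicates = foreach ($line in $lines) {
    # Add() returns $false when the value is already in the set.
    # A line is emitted the first time it repeats, and only once.
    if ((-not $seen.Add($line)) -and $reported.Add($line)) {
        $line
    }
}

$duplicates
```

Memory use still grows with the number of distinct lines, so the worst case remains O(n) when every line is unique.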
Which Approach is Fastest?
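Using Group-Object is concise and efficient enough for most files. Manual counting with a hashtable has the same O(n) complexity but is more verbose; actual timings depend on the PowerShell version and the size of the file, so measure on your own data if speed matters.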
| Approach | Time | Space | Best For |
|---|---|---|---|
| Group-Object with Where-Object | O(n) | O(n) | Simple and readable duplicate detection |
| Hashtable counting | O(n) | O(n) | More control, flexible processing |
| Select-String with regex | O(n) | O(n) | Complex patterns, less direct |
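To compare the approaches on a given machine, Measure-Command can time each one against the same file. This is a rough sketch under stated assumptions: the generated test file, its size, and the variable names are invented for the example, and the timings it prints will vary between runs and PowerShell versions.

```powershell
# Build a throwaway test file: 5,000 lines drawn from 1,000 distinct values
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'duplicate-timing.txt'
1..5000 | ForEach-Object { "line $($_ % 1000)" } | Set-Content -LiteralPath $path

# Time the Group-Object pipeline
$groupTime = Measure-Command {
    Get-Content -LiteralPath $path |
        Group-Object |
        Where-Object { $_.Count -gt 1 } |
        Select-Object -ExpandProperty Name |
        Out-Null
}

# Time the hashtable counting approach
$hashTime = Measure-Command {
    $counts = @{}
    foreach ($line in Get-Content -LiteralPath $path) {
        if ($counts.ContainsKey($line)) { $counts[$line]++ } else { $counts[$line] = 1 }
    }
    $counts.GetEnumerator() |
        Where-Object { $_.Value -gt 1 } |
        ForEach-Object { $_.Key } |
        Out-Null
}

'Group-Object: {0:N0} ms' -f $groupTime.TotalMilliseconds
'Hashtable:    {0:N0} ms' -f $hashTime.TotalMilliseconds
```

A single run is noisy, so repeating the measurement a few times and comparing typical values gives a fairer picture.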
Use Group-Object with Where-Object to quickly find duplicates in PowerShell.