PowerShell · How-To · Beginner · 2 min read

PowerShell Script to Find Duplicate Lines in File

Use Get-Content filename.txt | Group-Object | Where-Object { $_.Count -gt 1 } | Select-Object -ExpandProperty Name to find duplicate lines in a file with PowerShell.
📋

Examples

Example 1
Input (one line each): apple, banana, apple, orange, banana
Output: apple, banana

Example 2
Input: line1, line2, line3, line4
Output: (none, since no line repeats)

Example 3
Input: test, test, test, unique, unique
Output: test, unique
🧠

How to Think About It

To find duplicate lines, read all lines from the file, group identical lines together, then select only those groups where the count of lines is more than one. This way, you identify which lines appear multiple times.
📐

Algorithm

1. Read all lines from the file.
2. Group lines by their content.
3. Filter groups to keep only those with more than one occurrence.
4. Extract the line content from these groups.
5. Output the duplicate lines.
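The numbered steps above can be written out one at a time, which makes each stage easy to inspect. This sketch creates its own sample file (the name dupes-sample.txt is just for illustration — substitute your own file):

```powershell
# Create a sample input file (hypothetical name, for illustration only).
'apple','banana','apple','orange','banana' | Set-Content 'dupes-sample.txt'

# Step 1: read all lines from the file.
$lines = Get-Content 'dupes-sample.txt'

# Step 2: group identical lines together.
$groups = $lines | Group-Object

# Step 3: keep only groups with more than one occurrence.
$duplicateGroups = $groups | Where-Object { $_.Count -gt 1 }

# Steps 4-5: extract and output the duplicate line text.
$duplicateGroups | Select-Object -ExpandProperty Name
```

Breaking the pipeline into variables like this is handy for debugging: you can print $groups at any point to see how the lines were bucketed.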
💻

Code

```powershell
Get-Content 'input.txt' | Group-Object | Where-Object { $_.Count -gt 1 } | Select-Object -ExpandProperty Name
```

Output
apple
banana
🔍

Dry Run

Let's trace the input lines 'apple', 'banana', 'apple', 'orange', 'banana' through the code.

Step 1: Read lines

Lines read: apple, banana, apple, orange, banana

Step 2: Group lines

Groups: apple (2), banana (2), orange (1)

Step 3: Filter duplicates

Duplicates: apple, banana

Line      Count
apple     2
banana    2
orange    1
💡

Why This Works

Step 1: Read file lines

The Get-Content command reads each line from the file as a string.

Step 2: Group identical lines

The Group-Object command groups lines that have the same text together.

Step 3: Filter groups with duplicates

Using Where-Object { $_.Count -gt 1 } keeps only groups where the line appears more than once.

Step 4: Extract duplicate lines

Finally, Select-Object -ExpandProperty Name outputs just the duplicate line text.
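A small variation on the last step: instead of expanding only the Name property, you can select both Count and Name to see how many times each duplicate appears. The sample file name below is hypothetical:

```powershell
# Build a sample file (hypothetical name) and list duplicates with their counts.
'apple','banana','apple','orange','banana' | Set-Content 'dupes-sample.txt'

Get-Content 'dupes-sample.txt' |
    Group-Object |
    Where-Object { $_.Count -gt 1 } |
    Select-Object Count, Name
# Outputs each duplicate line with its count: apple (2) and banana (2).
```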

🔄

Alternative Approaches

Using a hashtable to count lines
```powershell
$counts = @{}
Get-Content 'input.txt' | ForEach-Object {
    if ($counts.ContainsKey($_)) { $counts[$_]++ } else { $counts[$_] = 1 }
}
$counts.GetEnumerator() | Where-Object { $_.Value -gt 1 } | ForEach-Object { $_.Key }
```
This method manually counts lines and can be more flexible but requires more code.
Using Select-String with regex
```powershell
Select-String -Path 'input.txt' -Pattern '^(.+)$' -AllMatches |
    Group-Object -Property Line |
    Where-Object { $_.Count -gt 1 } |
    Select-Object -ExpandProperty Name
```
This approach uses regex matching but is more complex and less direct.

Complexity: O(n) time, O(n) space

Time Complexity

Reading all lines and grouping them requires one pass through the data, so time complexity is O(n) where n is the number of lines.

Space Complexity

Grouping stores all unique lines and their counts, so space complexity is O(n) in the worst case when all lines are unique.

Which Approach is Fastest?

Group-Object is concise and fast enough for most files. On very large files, manual counting with a hashtable is typically faster because it avoids building a group object per unique line, at the cost of more code.

Approach                         Time   Space   Best For
Group-Object with Where-Object   O(n)   O(n)    Simple and readable duplicate detection
Hashtable counting               O(n)   O(n)    More control, flexible processing
Select-String with regex         O(n)   O(n)    Complex patterns, less direct
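Rather than taking relative speed on faith, you can time the two main approaches on your own data with Measure-Command. This sketch generates a throwaway test file (the file name and line counts are arbitrary):

```powershell
# Generate a throwaway test file with many repeated lines (hypothetical name).
1..5000 | ForEach-Object { "line$($_ % 1000)" } | Set-Content 'bench-sample.txt'

# Time the Group-Object pipeline.
$groupTime = Measure-Command {
    Get-Content 'bench-sample.txt' | Group-Object |
        Where-Object { $_.Count -gt 1 } | Select-Object -ExpandProperty Name
}

# Time the hashtable approach. ($null + 1 evaluates to 1, so no key check is needed.)
$hashTime = Measure-Command {
    $counts = @{}
    Get-Content 'bench-sample.txt' | ForEach-Object { $counts[$_] = $counts[$_] + 1 }
    $counts.GetEnumerator() | Where-Object { $_.Value -gt 1 } | ForEach-Object { $_.Key }
}

"Group-Object: $($groupTime.TotalMilliseconds) ms"
"Hashtable:    $($hashTime.TotalMilliseconds) ms"
```

Results vary with file size and PowerShell version, so measure on inputs that resemble your real data.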
💡
Use Group-Object with Where-Object to quickly find duplicates in PowerShell.
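One detail worth knowing: Group-Object compares strings case-insensitively by default, so 'Apple' and 'apple' land in the same group. Pass the -CaseSensitive switch if differently cased lines should be treated as distinct. The sample file name below is hypothetical:

```powershell
# Mixed-case sample (hypothetical file name).
'Apple','apple','banana' | Set-Content 'case-sample.txt'

# Default: case-insensitive, so 'Apple' and 'apple' count as duplicates.
Get-Content 'case-sample.txt' | Group-Object |
    Where-Object { $_.Count -gt 1 } | Select-Object -ExpandProperty Name

# With -CaseSensitive they are distinct, so no duplicates are reported.
Get-Content 'case-sample.txt' | Group-Object -CaseSensitive |
    Where-Object { $_.Count -gt 1 } | Select-Object -ExpandProperty Name
```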
⚠️
Beginners often forget the Where-Object { $_.Count -gt 1 } filter, which outputs every unique line instead of only the duplicates.