Bash Script to Remove Duplicate Characters from String
Use echo "$string" | fold -w1 | awk '!seen[$0]++' | tr -d '\n' in Bash to remove duplicate characters from a string while preserving order.
Examples
How to Think About It
Algorithm
Code

    #!/bin/bash
    input="banana"

    # Remove duplicate characters while preserving order
    result=$(echo "$input" | fold -w1 | awk '!seen[$0]++' | tr -d '\n')

    # Print the result
    printf "%s\n" "$result"

Dry Run
Let's trace the input 'banana' through the code.
Split string into characters
b a n a n a
Filter unique characters with awk
b (not seen before, keep)
a (not seen before, keep)
n (not seen before, keep)
a (seen before, skip)
n (seen before, skip)
a (seen before, skip)
Join characters back
ban
| Character | Seen Before? | Action |
|---|---|---|
| b | No | Keep |
| a | No | Keep |
| n | No | Keep |
| a | Yes | Skip |
| n | Yes | Skip |
| a | Yes | Skip |
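The trace above can be reproduced with a small script. This is a minimal sketch, not part of the original solution: trace_dedupe is a hypothetical helper name, and it uses printf instead of echo to feed the pipeline. It prints one row per character with the same Seen Before? and Action columns as the table.

```shell
#!/bin/bash
# trace_dedupe is a hypothetical helper name used only for this illustration.
trace_dedupe() {
  printf '%s\n' "$1" | fold -w1 | awk '
    {
      # seen[$0] is empty (treated as 0) the first time a character appears
      if (seen[$0]++) {
        printf "%s | Yes | Skip\n", $0
      } else {
        printf "%s | No | Keep\n", $0
      }
    }'
}

trace_dedupe "banana"
```

For "banana" this prints three Keep rows (b, a, n) followed by three Skip rows (a, n, a).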
Why This Works
Step 1: Splitting the string
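Using fold -w1 splits the string into one character per line so we can process each character separately. Some fold implementations count bytes rather than characters, so for multibyte UTF-8 input the grep -o . variant shown under Alternative Approaches may be the safer choice.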
Step 2: Filtering duplicates
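The awk '!seen[$0]++' command keeps only the first occurrence of each character. seen is an associative array keyed by the current line ($0). The post-increment returns the previous count, which is 0 the first time a character appears, so the negation is true and awk prints the line by default. On later occurrences the count is nonzero and the line is skipped.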
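To make the shorthand less cryptic, it can help to compare it with a written-out version. The sketch below is illustrative only: the sample lines (a b a c b) and the variable names short and long are assumptions made for the example.

```shell
#!/bin/bash
# Shorthand: the pattern is true only when the count was 0 before the increment.
short=$(printf '%s\n' a b a c b | awk '!seen[$0]++')

# Longhand equivalent, written out for illustration only.
long=$(printf '%s\n' a b a c b | awk '{
  if (seen[$0] == 0) {      # first time this line appears
    print $0
  }
  seen[$0] = seen[$0] + 1   # remember it for later lines
}')

printf '%s\n' "$short"
[ "$short" = "$long" ] && echo "same output"
```

Both forms print a, b and c on separate lines. The shorthand simply relies on awk's default action (print) when a pattern has no block.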
Step 3: Rejoining characters
Finally, tr -d '\n' removes newlines to join the characters back into a single string without duplicates.
Alternative Approaches
Using a Bash associative array (requires Bash 4+):

    #!/bin/bash
    input="banana"
    declare -A seen
    result=""
    for (( i=0; i<${#input}; i++ )); do
      c=${input:i:1}
      if [[ -z ${seen[$c]} ]]; then
        result+=$c
        seen[$c]=1
      fi
    done
    printf "%s\n" "$result"

Using grep and awk:

    #!/bin/bash
    input="banana"
    result=$(echo "$input" | grep -o . | awk '!a[$0]++' | tr -d '\n')
    printf "%s\n" "$result"

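If the associative array approach is needed in more than one place, it could be wrapped in a function. The following is a sketch under a few assumptions: dedupe_chars is a hypothetical name, the array is made local so repeated calls do not share state, and a version guard is added because associative arrays need Bash 4 or newer.

```shell
#!/bin/bash
# dedupe_chars is a hypothetical function name chosen for this sketch.
dedupe_chars() {
  local input=$1 result="" c i
  local -A seen=()                     # associative arrays need Bash 4 or newer
  for (( i = 0; i < ${#input}; i++ )); do
    c=${input:i:1}
    if [[ -z ${seen[$c]+x} ]]; then    # key not set yet: first occurrence
      result+=$c
      seen[$c]=1
    fi
  done
  printf '%s\n' "$result"
}

if (( BASH_VERSINFO[0] < 4 )); then
  echo "Bash 4+ required for associative arrays" >&2
  exit 1
fi

dedupe_chars "banana"        # prints: ban
dedupe_chars "hello world"   # prints: helo wrd
```

Testing with ${seen[$c]+x} checks whether the key exists rather than whether its value is empty, which is a slightly stricter variant of the -z test used above.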
Complexity: O(n) time, O(n) space
Time Complexity
The script processes each character once, so time grows linearly with string length.
Space Complexity
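The seen array holds one entry per distinct character, so it is bounded by the size of the character set. The result string can be as long as the input when every character is unique, which gives O(n) in the worst case.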
Which Approach is Fastest?
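The Bash associative array method avoids spawning external processes, so it tends to be faster for short strings and when called many times in a loop; it requires Bash 4+. For very long strings the awk pipeline usually scales better, because Bash's character-by-character loop is slow. The pipeline also works in shells without associative arrays, provided fold and awk are available.
| Approach | Time | Space | Best For |
|---|---|---|---|
| awk pipeline | O(n) | O(n) | Simple scripts, portability, long strings |
| Bash associative array | O(n) | O(n) | Short strings, no external commands |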
| grep and awk | O(n) | O(n) | Alternative splitting method |
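Relative speed depends on the machine, the Bash version and the input length, so it is worth measuring rather than assuming. The sketch below is one rough way to do that: the test string (2000 repetitions of "banana") and the function names pipeline and pure_bash are assumptions made for the example, and the timings it reports will differ between systems.

```shell
#!/bin/bash
# Hypothetical test string: 2000 repetitions of "banana" (12000 characters).
input=$(printf 'banana%.0s' {1..2000})

pipeline() {
  printf '%s\n' "$1" | fold -w1 | awk '!seen[$0]++' | tr -d '\n'
}

pure_bash() {
  local input=$1 result="" c i
  local -A seen=()
  for (( i = 0; i < ${#input}; i++ )); do
    c=${input:i:1}
    [[ -n ${seen[$c]+x} ]] && continue   # already seen: skip
    seen[$c]=1
    result+=$c
  done
  printf '%s' "$result"
}

# time is a Bash keyword; it reports to stderr, so the results are still captured.
time a=$(pipeline "$input")
time b=$(pure_bash "$input")

[[ $a == "$b" ]] && printf 'both give: %s\n' "$a"
```

Changing the repetition count shows how each approach behaves as the input grows.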
Use awk '!seen[$0]++' to filter unique lines or characters in Bash pipelines.