
Bash Script to Remove Duplicate Characters from String

Use printf '%s\n' "$string" | fold -w1 | awk '!seen[$0]++' | tr -d '\n' in Bash to remove duplicate characters from a string while preserving their order.

Examples

Input: hello
Output: helo

Input: banana
Output: ban

Input: aaaaa
Output: a

How to Think About It

To remove duplicate characters, examine the string one character at a time and keep only the first occurrence of each. Split the string into single characters, remember which ones you have already seen, and skip any repeats.

Algorithm

1. Get the input string.
2. Split the string into individual characters.
3. Keep track of the characters already seen.
4. For each character, keep it if it has not been seen before; otherwise, skip it.
5. Join the kept characters back into a string.
6. Return the resulting string without duplicates.
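The steps above can be wrapped in a reusable function and checked against the example inputs. This is a minimal sketch: the function name dedup_chars is a hypothetical helper, not part of the original script, and it simply runs the same pipeline shown in the Code section.

```bash
#!/bin/bash

# Hypothetical helper: print $1 with repeated characters removed,
# keeping the first occurrence of each character in its original order.
dedup_chars() {
  # Steps 1-2: put one character per line
  # Steps 3-4: awk keeps a line only the first time it is seen
  # Step 5: tr deletes the newlines to rejoin the kept characters
  printf '%s\n' "$1" | fold -w1 | awk '!seen[$0]++' | tr -d '\n'
  # Step 6: end the output with a single newline
  printf '\n'
}

# Run the helper on each example input
for word in hello banana aaaaa; do
  printf '%s -> %s\n' "$word" "$(dedup_chars "$word")"
done
```

Each line of output pairs an input with its deduplicated form, for example banana -> ban.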

Code

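```bash
#!/bin/bash

input="banana"

# Remove duplicate characters while preserving order:
#   printf - emit the string (safer than echo for inputs such as "-n")
#   fold   - put one character per line
#   awk    - keep only the first occurrence of each line
#   tr     - delete the newlines to rejoin the characters
result=$(printf '%s\n' "$input" | fold -w1 | awk '!seen[$0]++' | tr -d '\n')

# Print the result
printf "%s\n" "$result"
```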
Output: ban

Dry Run

Let's trace the input 'banana' through the code:

1. Split the string into characters: b a n a n a
2. Filter unique characters with awk: b, a, and n have not been seen before, so they are kept; the second a, the second n, and the third a have already been seen, so they are skipped.
3. Join the characters back: ban

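| Character | Seen before? | Action |
|-----------|--------------|--------|
| b         | No           | Keep   |
| a         | No           | Keep   |
| n         | No           | Keep   |
| a         | Yes          | Skip   |
| n         | Yes          | Skip   |
| a         | Yes          | Skip   |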
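The table above can be reproduced by asking awk to print its decision for each character instead of filtering. This is a small sketch; the "Yes/No" and "Keep/Skip" labels are chosen only to mirror the table.

```bash
#!/bin/bash

input="banana"

# Print each character with awk's decision instead of filtering.
# seen[$0]++ yields the old count (0, i.e. false, the first time), then increments it.
trace=$(printf '%s\n' "$input" | fold -w1 | awk '{
  if (seen[$0]++) print $0, "Yes", "Skip"
  else            print $0, "No", "Keep"
}')

printf '%s\n' "$trace"
```

The printed rows match the table row for row, starting with b No Keep and ending with a Yes Skip.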

Why This Works

Step 1: Splitting the string

Using fold -w1 splits the string into one character per line so we can process each character separately.
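To see exactly what the splitting step hands to awk, you can run fold on its own. A quick check, assuming plain ASCII input:

```bash
#!/bin/bash

input="banana"

# Split the string into one character per line.
chars=$(printf '%s\n' "$input" | fold -w1)

# Show the lines awk will receive: b, a, n, a, n, a
printf '%s\n' "$chars"
```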

Step 2: Filtering duplicates

The awk '!seen[$0]++' command keeps only the first occurrence of each character by tracking seen characters in an array.
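The one-liner works because awk treats a missing array element as 0 (false) and post-increment returns the old value before adding 1. Written out longhand, the filter produces the same output as the sketch below; the explicit membership test with `in` is one illustrative way to spell it out.

```bash
#!/bin/bash

input="banana"

# Longhand version of awk '!seen[$0]++':
# print a line only if it is not yet a key in the seen array, then record it.
result=$(printf '%s\n' "$input" | fold -w1 | awk '
  !($0 in seen) {
    seen[$0] = 1
    print
  }
' | tr -d '\n')

printf '%s\n' "$result"
```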

Step 3: Rejoining characters

Finally, tr -d '\n' removes newlines to join the characters back into a single string without duplicates.


Alternative Approaches

Using a Bash associative array
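```bash
#!/bin/bash
input="banana"
declare -A seen             # characters already kept (Bash 4+)
result=""
for (( i=0; i<${#input}; i++ )); do
  c=${input:i:1}            # current character
  if [[ -z ${seen[$c]} ]]; then
    result+=$c              # first occurrence: keep it
    seen[$c]=1              # remember it so later copies are skipped
  fi
done
printf "%s\n" "$result"
```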
This method uses pure Bash without external commands but requires Bash 4+ for associative arrays.
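If you are limited to Bash 3.x (for example, the /bin/bash that ships with macOS), associative arrays are unavailable. One possible workaround, offered here as an assumption rather than part of the original approach, is to test whether the character already appears in the result with a quoted pattern match. It stays pure Bash, but each check scans the result string, so the worst case grows to roughly O(n·k) for k distinct characters.

```bash
#!/bin/bash
# Works in Bash 3.x: no associative arrays needed.
input="banana"
result=""
for (( i=0; i<${#input}; i++ )); do
  c=${input:i:1}
  # Quoting "$c" makes glob characters such as * or ? match literally.
  if [[ $result != *"$c"* ]]; then
    result+=$c              # not in the result yet: keep it
  fi
done
printf '%s\n' "$result"
```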
Using grep and awk
```bash
#!/bin/bash
input="banana"
# grep -o . prints each matched character on its own line
result=$(printf '%s\n' "$input" | grep -o . | awk '!a[$0]++' | tr -d '\n')
printf "%s\n" "$result"
```
Similar to the main method, but uses `grep -o .` instead of `fold` to split the string into characters.

Complexity: O(n) time, O(n) space

Time Complexity

The script processes each character once, so time grows linearly with string length.

Space Complexity

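The seen table holds one entry per distinct character, so it never exceeds the number of unique characters; the result string can hold up to n characters, which gives O(n) space in the worst case.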

Which Approach is Fastest?

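Neither approach is fastest in every case. The Bash associative array method avoids starting external processes, so it usually wins on short strings, but it requires Bash 4+ and its interpreted loop tends to fall behind the awk pipeline as strings get longer. The awk pipeline is simpler and portable, since fold, awk, and tr are standard POSIX utilities.

| Approach               | Time | Space | Best for                                     |
|------------------------|------|-------|----------------------------------------------|
| awk pipeline           | O(n) | O(n)  | Simple scripts, portability, long strings    |
| Bash associative array | O(n) | O(n)  | Short strings, no external commands (Bash 4+) |
| grep and awk           | O(n) | O(n)  | Alternative splitting method                 |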
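Relative speed depends on string length, the Bash version, and the awk implementation, so it is worth measuring on your own system. The sketch below is a hypothetical harness, not a benchmark from the original: it builds a made-up 5,000-character test string, times the pipeline and the associative-array approach with Bash's time keyword, and checks that both give the same answer. No particular timings are implied.

```bash
#!/bin/bash
# Requires Bash 4+ for the associative-array version.

# Hypothetical test data: 5,000 characters cycling through a-z.
alphabet=abcdefghijklmnopqrstuvwxyz
input=""
for (( i=0; i<5000; i++ )); do
  input+=${alphabet:i%26:1}
done

# Approach 1: external pipeline (fold, awk, tr).
dedup_pipeline() {
  printf '%s\n' "$1" | fold -w1 | awk '!seen[$0]++' | tr -d '\n'
}

# Approach 2: pure Bash with an associative array.
dedup_bash() {
  local s=$1 out="" c i
  local -A seen
  for (( i=0; i<${#s}; i++ )); do
    c=${s:i:1}
    if [[ -z ${seen[$c]} ]]; then
      out+=$c
      seen[$c]=1
    fi
  done
  printf '%s' "$out"
}

# time reports to stderr; the braces keep the assignments in this shell.
time { a=$(dedup_pipeline "$input"); }
time { b=$(dedup_bash "$input"); }

if [[ $a == "$b" ]]; then
  echo "outputs match: $a"
else
  echo "outputs differ" >&2
fi
```

Try a few different string lengths to see where, if anywhere, the two approaches cross over on your machine.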
Tip: Use awk '!seen[$0]++' to filter unique lines or characters in Bash pipelines.
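The same awk idiom works on whole lines, not just single characters. For example, to drop repeated lines from a list while keeping the first occurrence of each (the list items here are made up for illustration):

```bash
#!/bin/bash

# Keep the first occurrence of each line, preserving the original order.
# Unlike sort -u, the input order is not changed.
unique=$(printf '%s\n' apple pear apple plum pear | awk '!seen[$0]++')
printf '%s\n' "$unique"
```

This prints apple, pear, and plum on separate lines.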
Common mistake: Forgetting to remove the newlines after filtering duplicates leaves one character per line instead of a single string.
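To see the mistake concretely, compare the pipeline with and without the final tr step:

```bash
#!/bin/bash

input="banana"

# Without tr: awk's output keeps one character per line.
broken=$(printf '%s\n' "$input" | fold -w1 | awk '!seen[$0]++')

# With tr -d '\n': the kept characters are joined into one string.
fixed=$(printf '%s\n' "$input" | fold -w1 | awk '!seen[$0]++' | tr -d '\n')

printf 'broken:\n%s\n' "$broken"
printf 'fixed: %s\n' "$fixed"
```

The broken version prints b, a, and n on three separate lines, while the fixed version prints ban.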