Bash Script to Find Word Frequency in a File
Use tr -cs '[:alnum:]' '\n' | sort | uniq -c | sort -nr in Bash to find and display word frequency in a file.
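How to Think About It
Once each word sits on its own line, counting word frequency becomes counting repeated lines, which sort and uniq already do.
Algorithm
1. Check that exactly one filename was passed; otherwise print a usage message and exit.
2. Replace every run of non-alphanumeric characters with a single newline, giving one word per line.
3. Sort the words so identical words are adjacent.
4. Count each group of identical words with uniq -c.
5. Sort by count in descending order and print each count with its word.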
Code
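```bash
#!/bin/bash
# Print each word in a file with its count, most frequent first.
if [ $# -ne 1 ]; then
  echo "Usage: $0 filename" >&2   # usage errors go to stderr
  exit 1
fi

# One word per line -> drop empty lines (left by leading punctuation/spaces) -> group -> count -> rank -> "count word"
tr -cs '[:alnum:]' '\n' < "$1" | grep -v '^$' | sort | uniq -c | sort -nr | awk '{print $1, $2}'
```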
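To count words coming from another command, or from several files at once, the same pipeline can be wrapped in a shell function that reads standard input. This is a minimal sketch; the function name word_freq is hypothetical. It also drops empty lines, which tr produces when the input starts with a space or punctuation mark (for example an opening quote).

```bash
#!/usr/bin/env bash
# Hypothetical helper: count words read from standard input.
word_freq() {
  tr -cs '[:alnum:]' '\n' |  # one word per line
    grep -v '^$' |           # drop the empty line a leading separator leaves
    sort |                   # put identical words next to each other
    uniq -c |                # count each run of identical words
    sort -nr |               # most frequent first
    awk '{print $1, $2}'     # print "count word" without uniq's padding
}

# Example input starting with a quote character.
printf '"the cat" and the hat\n' | word_freq
```

Because it reads standard input, several files can be combined with something like cat *.txt | word_freq.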
Dry Run
Let's trace the input 'hello world hello' through the code:
Replace non-alphanumeric characters with newlines
Input: 'hello world hello' -> Output: 'hello\nworld\nhello'
Sort words
Words: ['hello', 'world', 'hello'] sorted -> ['hello', 'hello', 'world']
Count adjacent duplicates with uniq -c, then sort by count
'hello' appears 2 times, 'world' appears 1 time, so 'hello' is listed first
| Word | Count |
|---|---|
| hello | 2 |
| world | 1 |
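To check the dry run yourself, the sketch below feeds the same sentence through each stage and keeps the intermediate results in variables (the variable names are only for illustration).

```bash
#!/usr/bin/env bash
input='hello world hello'

# Stage 1: one word per line -> hello, world, hello
split=$(printf '%s\n' "$input" | tr -cs '[:alnum:]' '\n')
printf 'After tr:\n%s\n\n' "$split"

# Stage 2: sorted so duplicates are adjacent -> hello, hello, world
sorted=$(printf '%s\n' "$split" | sort)
printf 'After sort:\n%s\n\n' "$sorted"

# Stage 3: counted, ranked by frequency, padding removed -> "2 hello", "1 world"
counts=$(printf '%s\n' "$sorted" | uniq -c | sort -nr | awk '{print $1, $2}')
printf 'Final:\n%s\n' "$counts"
```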
Why This Works
Step 1: Splitting words
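The tr -cs '[:alnum:]' '\n' command replaces every character that is not a letter or digit with a newline (-c complements the [:alnum:] set), and -s squeezes each run of replaced characters into a single newline, so each word ends up on its own line.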
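The effect of -s is easiest to see on text with several separators in a row. This sketch runs tr with and without it on the same string.

```bash
#!/usr/bin/env bash
text='one,  two'

# -c alone: each non-alphanumeric character (',', ' ', ' ' and the final
# newline) becomes its own newline, leaving empty lines between the words.
without_s=$(printf '%s\n' "$text" | tr -c '[:alnum:]' '\n')

# -cs: each run of replaced characters is squeezed into one newline.
with_s=$(printf '%s\n' "$text" | tr -cs '[:alnum:]' '\n')

printf '%s\n' '--- tr -c ---' "$without_s" '--- tr -cs ---' "$with_s"
```

Without -s, the empty lines would reach uniq -c and be counted as an empty "word".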
Step 2: Sorting words
Sorting places identical words next to each other, which uniq -c requires because it only counts consecutive duplicate lines.
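A quick way to see why the sort is needed: run uniq -c on the dry-run words with and without sorting first.

```bash
#!/usr/bin/env bash
words='hello
world
hello'

# Unsorted: the two "hello" lines are not adjacent, so each is counted separately.
unsorted=$(printf '%s\n' "$words" | uniq -c | awk '{print $1, $2}')

# Sorted first: identical words form one run, so uniq -c counts them together.
sorted=$(printf '%s\n' "$words" | sort | uniq -c | awk '{print $1, $2}')

printf '%s\n' '--- uniq -c only ---' "$unsorted" '--- sort | uniq -c ---' "$sorted"
```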
Step 3: Counting and sorting frequency
uniq -c prefixes each distinct word with its number of occurrences, and sort -nr sorts those lines numerically in descending order so the most frequent words come first.
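sort -nr compares only the counts, so the order of words with equal counts depends on sort's fallback comparison and the current locale. If you want ties listed alphabetically, one option (an addition, not part of the original script) is to give sort explicit keys; LC_ALL=C is set here only to make the alphabetical order byte-based and predictable.

```bash
#!/usr/bin/env bash
# Assumed tweak: break count ties by word.
#   -k1,1nr : first key is field 1, compared numerically, descending
#   -k2,2   : second key is field 2 (the word), ascending
counts='1 world
2 hello
1 apple'

ranked=$(printf '%s\n' "$counts" | LC_ALL=C sort -k1,1nr -k2,2)
printf '%s\n' "$ranked"
```

In the script, this sort could take the place of the final sort -nr.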
Alternative Approaches
awk counting:
```bash
awk '{for (i = 1; i <= NF; i++) freq[$i]++} END {for (word in freq) print freq[word], word}' filename | sort -nr
```
grep extraction:
```bash
grep -oE '\w+' filename | sort | uniq -c | sort -nr
```
Complexity: O(n log n) time, O(n) space
Time Complexity
Sorting the words dominates time at O(n log n), where n is the number of words.
Space Complexity
Extra space is needed to store all words and counts, so O(n) space is used.
Which Approach is Fastest?
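The tr | sort | uniq pipeline is simple and splits words reliably. The awk version counts words in an associative array, so it only has to sort the distinct words at the end and can be faster on very large files; however, by default awk splits fields on whitespace only, so punctuation stays attached and 'hello,' and 'hello' are counted as different words.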
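The splitting difference shows up with a single comma. This sketch counts the same text with the awk alternative and with the tr pipeline.

```bash
#!/usr/bin/env bash
text='hello, hello world'

# awk: fields are whitespace-separated, so "hello," and "hello" are different keys.
awk_counts=$(printf '%s\n' "$text" |
  awk '{for (i = 1; i <= NF; i++) freq[$i]++} END {for (word in freq) print freq[word], word}' |
  sort -nr)

# tr: every non-alphanumeric character separates words, so the comma is dropped.
tr_counts=$(printf '%s\n' "$text" | tr -cs '[:alnum:]' '\n' | sort | uniq -c | sort -nr |
  awk '{print $1, $2}')

printf '%s\n' '--- awk ---' "$awk_counts" '--- tr ---' "$tr_counts"
```

The awk version reports three words each seen once; the tr pipeline reports 'hello' twice.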
| Approach | Time | Space | Best For |
|---|---|---|---|
| tr + sort + uniq | O(n log n) | O(n) | General use, reliable word splitting |
| awk counting | O(n + u log u), u = distinct words | O(u) | Large files; whitespace-separated words |
| grep + sort + uniq | O(n log n) | O(n) | Quick extraction; needs grep with -o and \w (e.g. GNU grep) |
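The grep alternative and the tr pipeline also disagree on what counts as a word: \w matches letters, digits and the underscore, while [:alnum:] does not include the underscore. The sketch below assumes a grep that supports -o and \w, such as GNU grep.

```bash
#!/usr/bin/env bash
text='snake_case snake'

# grep -oE '\w+': underscore is a word character, so snake_case stays whole.
grep_words=$(printf '%s\n' "$text" | grep -oE '\w+')

# tr with [:alnum:]: underscore is a separator, so snake_case is split in two.
tr_words=$(printf '%s\n' "$text" | tr -cs '[:alnum:]' '\n')

printf '%s\n' '--- grep -oE ---' "$grep_words" '--- tr -cs ---' "$tr_words"
```

For source code or identifiers the grep behavior may be preferable; for prose the difference rarely matters.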
Tip: use tr -cs '[:alnum:]' '\n' to split text into words cleanly in Bash.