Python Program to Find Word Frequency in File
Given a text file, the goal is to count how many times each word appears in it. The core idea is a single pass over the words with a dictionary:

```python
word_counts = {}
for word in words:
    word_counts[word] = word_counts.get(word, 0) + 1
```
How to Think About It
Read the file's text into a string, normalize it to lowercase, split it into words, then count occurrences with a dictionary keyed by word.
Algorithm
1. Open the file and read its full contents into a string.
2. Convert the text to lowercase and split it into a list of words.
3. Create an empty dictionary freq.
4. For each word, set freq[word] = freq.get(word, 0) + 1.
5. Return freq.
Code
```python
def word_frequency(filename):
    with open(filename, 'r') as file:
        text = file.read()
    words = text.lower().split()
    freq = {}
    for word in words:
        freq[word] = freq.get(word, 0) + 1
    return freq

# Example usage
filename = 'sample.txt'
print(word_frequency(filename))
```
Dry Run
Let's trace the file content 'apple apple orange' through the code:
1. Read file content: `text = 'apple apple orange'`
2. Split and lowercase words: `words = ['apple', 'apple', 'orange']`
3. Initialize empty dictionary: `freq = {}`
4. Count first word 'apple': `freq = {'apple': 1}`
5. Count second word 'apple': `freq = {'apple': 2}`
6. Count third word 'orange': `freq = {'apple': 2, 'orange': 1}`
| Word | Frequency Dictionary |
|---|---|
| apple | {"apple": 1} |
| apple | {"apple": 2} |
| orange | {"apple": 2, "orange": 1} |
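The trace above can be reproduced directly, by running the same counting loop on the traced string instead of a file:

```python
# Reproduce the dry run on the traced content 'apple apple orange'.
text = 'apple apple orange'
words = text.lower().split()
freq = {}
for word in words:
    freq[word] = freq.get(word, 0) + 1
print(freq)  # {'apple': 2, 'orange': 1}
```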
Why This Works
Step 1: Reading the file
We open the file and read all its text into a string so we can process it.
Step 2: Splitting into words
We split the text on whitespace to get each word separately and convert the text to lowercase first so that 'Apple' and 'apple' are counted as the same word.
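A quick illustration of this step: `lower()` normalizes case, and `split()` with no argument breaks on any run of whitespace.

```python
# Mixed case and extra spaces both normalize away.
text = 'Apple apple  ORANGE'
words = text.lower().split()
print(words)  # ['apple', 'apple', 'orange']
```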
Step 3: Counting words
We use a dictionary where keys are words and values are counts, increasing the count each time we see the word.
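The counting pattern in isolation: `get(word, 0)` returns the current count, or 0 the first time a word is seen, so no key-existence check is needed.

```python
# Count occurrences in a tiny word list.
freq = {}
for word in ['a', 'b', 'a']:
    freq[word] = freq.get(word, 0) + 1
print(freq)  # {'a': 2, 'b': 1}
```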
Alternative Approaches
Using collections.Counter, which does the counting loop for you:

```python
from collections import Counter

def word_frequency(filename):
    with open(filename, 'r') as file:
        words = file.read().lower().split()
    return dict(Counter(words))

# Example usage
filename = 'sample.txt'
print(word_frequency(filename))
```
Using a regular expression to extract words, which strips punctuation that plain `split()` would leave attached:

```python
import re

def word_frequency(filename):
    with open(filename, 'r') as file:
        text = file.read().lower()
    words = re.findall(r'\b\w+\b', text)
    freq = {}
    for word in words:
        freq[word] = freq.get(word, 0) + 1
    return freq

# Example usage
filename = 'sample.txt'
print(word_frequency(filename))
```
Complexity: O(n) time, O(m) space, where n is the total number of words and m is the number of unique words.
Time Complexity
The program reads the file once and processes each word once, so time grows linearly with the number of words (n).
Space Complexity
The dictionary stores counts for each unique word (m), so space depends on the number of unique words.
Which Approach is Fastest?
Using collections.Counter is generally the fastest and most readable option, since its counting loop is implemented in C, while manual counting offers more control for custom logic.
| Approach | Time | Space | Best For |
|---|---|---|---|
| Manual dictionary counting | O(n) | O(m) | Learning basics and custom logic |
| collections.Counter | O(n) | O(m) | Fast, clean, and concise code |
| Regex splitting + manual count | O(n) | O(m) | Handling punctuation accurately |
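The speed claim can be checked with a rough timeit sketch. The synthetic in-memory word list below is an assumption for illustration (no file I/O, so only the counting step is measured); absolute timings will vary by machine.

```python
import timeit
from collections import Counter

# Hypothetical synthetic data: 30,000 words, 3 unique.
words = ['apple', 'orange', 'banana'] * 10_000

def manual_count(ws):
    # Same dictionary pattern as the main program.
    freq = {}
    for w in ws:
        freq[w] = freq.get(w, 0) + 1
    return freq

manual_time = timeit.timeit(lambda: manual_count(words), number=20)
counter_time = timeit.timeit(lambda: Counter(words), number=20)
print(f'manual: {manual_time:.4f}s, Counter: {counter_time:.4f}s')
```

Both approaches produce identical counts; only their speed differs.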