
Python Program to Find Word Frequency in File

You can find word frequency in a file by reading the file, splitting the text into words, and using a dictionary to count each word's occurrences: start with word_counts = {}, then for each word run word_counts[word] = word_counts.get(word, 0) + 1.
📋

Examples

Input: file content 'apple apple orange'
Output: {'apple': 2, 'orange': 1}
Input: file content 'hello world hello python'
Output: {'hello': 2, 'world': 1, 'python': 1}
Input: file content '' (empty file)
Output: {}
🧠

How to Think About It

To find word frequency in a file, first open the file and read its content as text. Then split the text into individual words on whitespace, converting it to lowercase so that 'Apple' and 'apple' count as the same word. Next, create a dictionary to keep track of how many times each word appears, and increase a word's count each time you see it. Finally, return or print the dictionary showing each word and its frequency.
📐

Algorithm

1. Open the file in read mode.
2. Read the entire content of the file as a string.
3. Split the string into words using whitespace.
4. Create an empty dictionary to store word counts.
5. For each word, convert it to lowercase and update its count in the dictionary.
6. Print or return the dictionary with word frequencies.
💻

Code

python
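def word_frequency(filename):
    # Read the whole file as one string (UTF-8 encoding assumed)
    with open(filename, 'r', encoding='utf-8') as file:
        text = file.read()
    # Lowercase so 'Apple' and 'apple' match, then split on any whitespace
    words = text.lower().split()
    freq = {}
    for word in words:
        # get() returns 0 the first time a word is seen
        freq[word] = freq.get(word, 0) + 1
    return freq

# Example usage: sample.txt is assumed to contain 'apple apple orange'
filename = 'sample.txt'
print(word_frequency(filename))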
Output
{'apple': 2, 'orange': 1}
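To check the function against the three examples without creating files by hand, a short script can write each input to a temporary file first. This is a sketch: it repeats the word_frequency logic so the block runs on its own, assumes UTF-8 files, and uses the standard tempfile and os modules; the helper name frequency_of_text is hypothetical.

```python
import os
import tempfile


def word_frequency(filename):
    # Same logic as the Code section: read, lowercase, split, count
    with open(filename, 'r', encoding='utf-8') as file:
        text = file.read()
    freq = {}
    for word in text.lower().split():
        freq[word] = freq.get(word, 0) + 1
    return freq


def frequency_of_text(content):
    # Hypothetical helper: write content to a temp file, count it, then delete it
    fd, path = tempfile.mkstemp(suffix='.txt', text=True)
    try:
        with os.fdopen(fd, 'w', encoding='utf-8') as tmp:
            tmp.write(content)
        return word_frequency(path)
    finally:
        os.remove(path)


examples = [
    ('apple apple orange', {'apple': 2, 'orange': 1}),
    ('hello world hello python', {'hello': 2, 'world': 1, 'python': 1}),
    ('', {}),
]

for content, expected in examples:
    result = frequency_of_text(content)
    print(repr(content), '->', result, 'OK' if result == expected else 'MISMATCH')
```

The empty-file case works because ''.split() returns an empty list, so the loop never runs and the empty dictionary is returned.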
🔍

Dry Run

Let's trace the file content 'apple apple orange' through the code:

1. Read file content: text = 'apple apple orange'
2. Split and lowercase words: words = ['apple', 'apple', 'orange']
3. Initialize empty dictionary: freq = {}
4. Count first word 'apple': freq = {'apple': 1}
5. Count second word 'apple': freq = {'apple': 2}
6. Count third word 'orange': freq = {'apple': 2, 'orange': 1}

Word | Frequency Dictionary
apple | {'apple': 1}
apple | {'apple': 2}
orange | {'apple': 2, 'orange': 1}
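The program can produce the same trace itself by recording a copy of the dictionary after each update. This sketch works on a string instead of a file so it runs without sample.txt; the function name trace_counts is hypothetical.

```python
def trace_counts(text):
    # Return (word, snapshot of freq) after each update, like the dry-run table
    freq = {}
    steps = []
    for word in text.lower().split():
        freq[word] = freq.get(word, 0) + 1
        steps.append((word, dict(freq)))  # copy, so later updates don't change it
    return steps


for word, snapshot in trace_counts('apple apple orange'):
    print(word, snapshot)
```

Running it prints one line per row of the dry-run table. Copying with dict(freq) matters: appending freq itself would make every row show the final counts.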
💡

Why This Works

Step 1: Reading the file

We open the file and read all its text into a string so we can process it.

Step 2: Splitting into words

We split the text on whitespace (spaces, tabs, and newlines) to get each word separately, and convert the text to lowercase so the same word is counted together regardless of case.
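To see what split() with no argument does, here is a small demonstration on a made-up string with mixed case, a double space, a tab, and a newline.

```python
text = "Apple  apple\torange\nORANGE"

# split() with no argument splits on runs of any whitespace and drops empty strings
print(text.split())          # ['Apple', 'apple', 'orange', 'ORANGE']

# Lowercasing first makes the same word look identical
print(text.lower().split())  # ['apple', 'apple', 'orange', 'orange']

# By contrast, split(' ') splits on every single space and leaves tabs and newlines alone
print(text.split(' '))       # ['Apple', '', 'apple\torange\nORANGE']
```

This is why the code calls split() without an argument: it handles any mix of whitespace and never produces empty "words".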

Step 3: Counting words

We use a dictionary where keys are words and values are counts, increasing the count each time we see the word.
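The expression freq.get(word, 0) + 1 is shorthand for checking whether the word is already a key. The explicit version below does the same thing and may be easier to follow at first; both function names are hypothetical.

```python
def count_words_explicit(words):
    freq = {}
    for word in words:
        if word in freq:
            freq[word] += 1   # seen before: increase its count
        else:
            freq[word] = 1    # first time: start at 1
    return freq


def count_words_get(words):
    freq = {}
    for word in words:
        # get() returns the current count, or 0 if the word is not a key yet
        freq[word] = freq.get(word, 0) + 1
    return freq


words = ['apple', 'apple', 'orange']
print(count_words_explicit(words))  # {'apple': 2, 'orange': 1}
print(count_words_get(words))       # {'apple': 2, 'orange': 1}
```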

🔄

Alternative Approaches

Using collections.Counter
python
from collections import Counter

def word_frequency(filename):
    with open(filename, 'r') as file:
        words = file.read().lower().split()
    return dict(Counter(words))

# Example usage
filename = 'sample.txt'
print(word_frequency(filename))
This method is shorter and uses Counter, a standard-library dict subclass built for counting, which improves readability and is often faster than a manual loop.
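Keeping the Counter itself, instead of converting it with dict, also gives access to its most_common method, which returns the n most frequent words. This sketch takes a string so it runs without a file (pass file.read() to use it on a file); the function name top_words is hypothetical.

```python
from collections import Counter


def top_words(text, n=3):
    # Lowercase and split, as in the file-based versions
    words = text.lower().split()
    # most_common(n) returns (word, count) pairs, highest count first;
    # words with equal counts keep the order in which they were first seen
    return Counter(words).most_common(n)


print(top_words('The cat and the hat and the bat', 2))  # [('the', 3), ('and', 2)]
```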
Using regular expressions to extract words
python
import re

def word_frequency(filename):
    with open(filename, 'r') as file:
        text = file.read().lower()
    words = re.findall(r'\b\w+\b', text)
    freq = {}
    for word in words:
        freq[word] = freq.get(word, 0) + 1
    return freq

# Example usage
filename = 'sample.txt'
print(word_frequency(filename))
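This approach handles punctuation better: re.findall keeps only runs of word characters, so 'apple,' and 'apple' are counted as the same word. Note that \w matches letters, digits, and underscores, so a contraction such as "don't" is counted as "don" and "t".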
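The difference is easy to see on a made-up sentence with punctuation and a contraction. The last pattern, \w+(?:'\w+)*, is one possible way (an assumption, not the only option) to keep apostrophes inside words.

```python
import re

text = "Hello, hello! Don't stop."

# Whitespace splitting keeps punctuation attached to words
print(text.lower().split())
# ['hello,', 'hello!', "don't", 'stop.']

# \b\w+\b keeps only runs of letters, digits, and underscores
print(re.findall(r'\b\w+\b', text.lower()))
# ['hello', 'hello', 'don', 't', 'stop']

# Allowing an apostrophe between word characters keeps contractions whole
print(re.findall(r"\w+(?:'\w+)*", text.lower()))
# ['hello', 'hello', "don't", 'stop']
```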

Complexity: O(n) time, O(m) space

Time Complexity

The program reads the file once and processes each word once, so time grows linearly with the number of words (n).

Space Complexity

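The dictionary stores one entry per unique word (m), so it needs O(m) space. Because read() loads the whole file and split() builds a list of all n words, the program as written also holds O(n) data in memory at once; processing the file one line at a time reduces this to O(m) plus the length of the longest line.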
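A sketch of the line-by-line version, assuming words never span a line break and the file is UTF-8; the function name word_frequency_streaming is hypothetical. Iterating over a file object yields one line at a time, so the full text is never held in memory.

```python
import os
import tempfile


def word_frequency_streaming(filename):
    freq = {}
    with open(filename, 'r', encoding='utf-8') as file:
        # Each iteration reads just one line from the file
        for line in file:
            for word in line.lower().split():
                freq[word] = freq.get(word, 0) + 1
    return freq


# Usage: write a small temporary file, count it, then delete it
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False, encoding='utf-8') as tmp:
    tmp.write('apple apple\norange\n')
    path = tmp.name

result = word_frequency_streaming(path)
print(result)  # {'apple': 2, 'orange': 1}
os.remove(path)
```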

Which Approach is Fastest?

Using collections.Counter is usually the fastest and most readable option, because in CPython its counting loop is implemented in C; manual counting gives you more control over how each word is processed.
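Timings depend on the Python version and the input, so it is worth measuring rather than assuming. A rough sketch with the standard timeit module, counting a synthetic in-memory text (the sample data and run count are arbitrary choices):

```python
import re
import timeit
from collections import Counter

# Synthetic input: 25,000 words from a small vocabulary (made-up data)
text = ' '.join(['apple', 'orange', 'banana', 'Apple', 'pear'] * 5_000)


def manual_count(text):
    freq = {}
    for word in text.lower().split():
        freq[word] = freq.get(word, 0) + 1
    return freq


def counter_count(text):
    return dict(Counter(text.lower().split()))


def regex_count(text):
    freq = {}
    for word in re.findall(r'\b\w+\b', text.lower()):
        freq[word] = freq.get(word, 0) + 1
    return freq


# All three should agree on this input before we compare speed
assert manual_count(text) == counter_count(text) == regex_count(text)

for func in (manual_count, counter_count, regex_count):
    seconds = timeit.timeit(lambda: func(text), number=10)
    print(f'{func.__name__}: {seconds:.3f} s for 10 runs')
```

On plain text like this the three agree; on text with punctuation the regex version gives different counts, so compare speed only between approaches that produce the results you want.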

Approach | Time | Space | Best For
Manual dictionary counting | O(n) | O(m) | Learning basics and custom logic
collections.Counter | O(n) | O(m) | Fast, clean, and concise code
Regex splitting + manual count | O(n) | O(m) | Handling punctuation accurately
💡
Always convert words to lowercase before counting to avoid treating the same word differently due to case.
⚠️
Beginners often forget to handle case sensitivity, causing 'Word' and 'word' to be counted separately.
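A quick way to see the problem is to count the same made-up string with and without lowercasing.

```python
from collections import Counter

text = 'Word word WORD'

# Without lowercasing, each spelling becomes its own key
print(dict(Counter(text.split())))          # {'Word': 1, 'word': 1, 'WORD': 1}

# With lowercasing, all three are counted as one word
print(dict(Counter(text.lower().split())))  # {'word': 3}
```

For text in languages other than English, str.casefold() is a stricter alternative to lower(); for example, 'ß'.casefold() returns 'ss'.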