PythonProgramBeginner · 2 min read

Python Program to Find Word Frequency in File

You can find word frequency in a file by reading the file, splitting the text into words, and using a dictionary to count each word's occurrences, like word_counts = {}; for word in words: word_counts[word] = word_counts.get(word, 0) + 1.

📋

Examples

InputFile content: 'apple apple orange'

Output{"apple": 2, "orange": 1}

InputFile content: 'hello world hello python'

Output{"hello": 2, "world": 1, "python": 1}

InputFile content: '' (empty file)

Output{}

🧠

How to Think About It

To find word frequency in a file, first open and read the file content as text. Then split the text into individual words by spaces or punctuation. Next, create a dictionary to keep track of how many times each word appears. For each word, increase its count in the dictionary. Finally, return or print this dictionary showing each word and its frequency.

📐

Algorithm

Open the file in read mode.

Read the entire content of the file as a string.

Split the string into words using whitespace.

Create an empty dictionary to store word counts.

For each word, convert it to lowercase and update its count in the dictionary.

Print or return the dictionary with word frequencies.

💻

Code

python

def word_frequency(filename):
    with open(filename, 'r') as file:
        text = file.read()
    words = text.lower().split()
    freq = {}
    for word in words:
        freq[word] = freq.get(word, 0) + 1
    return freq

# Example usage
filename = 'sample.txt'
print(word_frequency(filename))

Output

{"apple": 2, "orange": 1}

🔍

Dry Run

Let's trace the file content 'apple apple orange' through the code

Read file content

text = 'apple apple orange'

Split and lowercase words

words = ['apple', 'apple', 'orange']

Initialize empty dictionary

freq = {}

Count first word 'apple'

freq = {'apple': 1}

Count second word 'apple'

freq = {'apple': 2}

Count third word 'orange'

freq = {'apple': 2, 'orange': 1}

Word	Frequency Dictionary
apple	{"apple": 1}
apple	{"apple": 2}
orange	{"apple": 2, "orange": 1}

💡

Why This Works

Step 1: Reading the file

We open the file and read all its text into a string so we can process it.

Step 2: Splitting into words

We split the text by spaces to get each word separately and convert them to lowercase to count uniformly.

Step 3: Counting words

We use a dictionary where keys are words and values are counts, increasing the count each time we see the word.

🔄

Alternative Approaches

Using collections.Counter

python

from collections import Counter

def word_frequency(filename):
    with open(filename, 'r') as file:
        words = file.read().lower().split()
    return dict(Counter(words))

# Example usage
filename = 'sample.txt'
print(word_frequency(filename))

This method is shorter and uses a built-in class optimized for counting, improving readability and performance.

Using regular expressions to split words

python

import re

def word_frequency(filename):
    with open(filename, 'r') as file:
        text = file.read().lower()
    words = re.findall(r'\b\w+\b', text)
    freq = {}
    for word in words:
        freq[word] = freq.get(word, 0) + 1
    return freq

# Example usage
filename = 'sample.txt'
print(word_frequency(filename))

This approach handles punctuation better by extracting only word characters, giving more accurate counts.

⚡

Complexity: O(n) time, O(m) space

Time Complexity

The program reads the file once and processes each word once, so time grows linearly with the number of words (n).

Space Complexity

The dictionary stores counts for each unique word (m), so space depends on the number of unique words.

Which Approach is Fastest?

Using collections.Counter is generally fastest and most readable, while manual counting offers more control.

Approach	Time	Space	Best For
Manual dictionary counting	O(n)	O(m)	Learning basics and custom logic
collections.Counter	O(n)	O(m)	Fast, clean, and concise code
Regex splitting + manual count	O(n)	O(m)	Handling punctuation accurately

💡

Always convert words to lowercase before counting to avoid treating the same word differently due to case.

⚠️

Beginners often forget to handle case sensitivity, causing 'Word' and 'word' to be counted separately.