Word Frequency Analysis in Python - Time & Space Complexity
We want to know how the time to count words grows as the text gets longer.
How does the work change when we have more words to analyze?
Analyze the time complexity of the following code snippet.
```python
text = "this is a sample text with several words this is a test"
words = text.split()
word_counts = {}
for word in words:
    if word in word_counts:
        word_counts[word] += 1
    else:
        word_counts[word] = 1
```
This code counts how many times each word appears in the text.
- Primary operation: Looping through each word in the list.
- How many times: Once for every word in the text.
As the number of words grows, the loop runs more times, increasing work linearly.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 checks and updates |
| 100 | About 100 checks and updates |
| 1000 | About 1000 checks and updates |
Pattern observation: The work grows directly with the number of words.
Time Complexity: O(n)
This means the time to count words grows in direct proportion to the number of words. Space grows with the number of distinct words, since the dictionary stores one entry per unique word.
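The same linear-time counting can be written more idiomatically with `collections.Counter` from the standard library. This is a sketch of an equivalent version, not the original snippet; it does the same O(n) work in a single pass:

```python
from collections import Counter

text = "this is a sample text with several words this is a test"
words = text.split()

# Counter builds the same word -> count mapping in one O(n) pass.
word_counts = Counter(words)

print(word_counts["this"])  # "this" occurs twice in the sample text
```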
[X] Wrong: "Checking if a word is in the dictionary takes time proportional to the number of words counted so far."
[OK] Correct: Dictionary lookups use hashing, so each check takes roughly constant time on average, no matter how many words have been counted so far.
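Because each dictionary lookup is near constant time, the if/else in the loop can even be collapsed into a single `dict.get` call per word. This sketch is an equivalent O(n) variant of the snippet above:

```python
text = "this is a sample text with several words this is a test"
word_counts = {}
for word in text.split():
    # get() returns the current count, or 0 for an unseen word,
    # so one hash lookup replaces the separate membership check.
    word_counts[word] = word_counts.get(word, 0) + 1
```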
Understanding how counting words scales helps you explain how your code handles larger texts efficiently.
"What if we used a list instead of a dictionary to store counts? How would the time complexity change?"