value_counts() for frequency in Pandas - Time & Space Complexity
We want to understand how the time needed to count values grows as the data gets larger: how does the cost of counting the frequency of items in a pandas column scale as the number of rows increases?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

# A small Series of fruit names, with repeats
data = pd.Series(['apple', 'banana', 'apple', 'orange', 'banana', 'banana'])

# Count how often each unique value appears
freq = data.value_counts()  # banana: 3, apple: 2, orange: 1
```
This code counts how many times each unique fruit appears in the Series and returns the counts sorted from most to least frequent.
Identify the repeated work: the loops, recursion, or array traversals that scale with the input.
- Primary operation: pandas makes a single pass over the Series, updating a per-value tally in a hash table (an average constant-time lookup and increment per element).
- How many times: every element is visited exactly once, so n rows mean n updates.
As the Series gets longer, the counting work grows roughly in direct proportion to the number of rows.
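Conceptually, that single pass can be modeled in plain Python with a dictionary. This is only a sketch of the idea; pandas uses an optimized hash table internally, not this loop:

```python
def count_values(items):
    """One pass over the data: each element triggers one hash
    lookup and one increment, so total work is proportional
    to len(items)."""
    counts = {}
    for item in items:  # visits every element exactly once
        counts[item] = counts.get(item, 0) + 1
    return counts

fruits = ['apple', 'banana', 'apple', 'orange', 'banana', 'banana']
print(count_values(fruits))  # {'apple': 2, 'banana': 3, 'orange': 1}
```

The loop body does a fixed amount of work per element, which is exactly why the total grows linearly.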
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 checks |
| 100 | About 100 checks |
| 1000 | About 1000 checks |
Pattern observation: Doubling the data roughly doubles the work.
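One way to check this pattern directly is to count operations explicitly in a simple Python model of the counting loop (a sketch for illustration, not pandas' actual implementation):

```python
def count_with_ops(items):
    # Returns the frequency table plus the number of elements
    # examined -- one "operation" per element.
    counts, ops = {}, 0
    for item in items:
        counts[item] = counts.get(item, 0) + 1
        ops += 1
    return counts, ops

for n in (10, 100, 1000):
    data = ['apple', 'banana', 'orange'] * (n // 3) + ['apple'] * (n % 3)
    _, ops = count_with_ops(data)
    print(n, ops)  # ops always equals n: doubling n doubles the work
```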
Time Complexity: O(n)
This means the time to count frequencies grows in a straight line with the number of items. (Sorting the result, which `value_counts` does by default, adds roughly O(k log k) work for k unique values, but k is usually far smaller than n.)
[X] Wrong: "Counting values takes the same time no matter how many items there are."
[OK] Correct: The function must look at each item once, so more items mean more work.
Knowing how counting scales helps you explain performance when working with big data tables.
"What if the data had many repeated values versus mostly unique values? How would that affect the time complexity?"
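One way to start exploring that question: the pass over the data is the same length either way, so the time stays O(n), but mostly-unique data produces a far larger hash table and a larger result to sort. A quick sketch using Python's `Counter` (the data names below are illustrative):

```python
from collections import Counter

n = 1000
repeated = ['apple'] * n                   # many repeats: 1 unique value
unique = [f'fruit_{i}' for i in range(n)]  # mostly unique: n unique values

# Both require exactly one pass over n elements,
# but the size of the resulting table differs greatly.
print(len(Counter(repeated)))  # 1 entry
print(len(Counter(unique)))    # 1000 entries
```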