value_counts() for Distributions in Python Data Analysis: Time & Space Complexity
We want to understand how long it takes to count the occurrences of each unique value in a dataset using value_counts().
How does the time change when the data size grows?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

data = pd.Series(["apple", "banana", "apple", "orange", "banana", "banana"])
counts = data.value_counts()
print(counts)
```
This code counts how many times each unique fruit appears in the list.
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Scanning each item in the data once to count occurrences.
- How many times: Exactly once for each of the n items in the data.
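The single pass described above can be sketched with a plain dictionary. This is a simplified model of what value_counts() does conceptually, not pandas' actual hash-table implementation:

```python
def value_counts_sketch(items):
    """One pass over the data: one dict lookup/update per item."""
    counts = {}
    for item in items:  # runs exactly len(items) times -> O(n)
        counts[item] = counts.get(item, 0) + 1
    # Sort by count, descending, to mirror value_counts() output order.
    return dict(sorted(counts.items(), key=lambda kv: kv[1], reverse=True))

data = ["apple", "banana", "apple", "orange", "banana", "banana"]
print(value_counts_sketch(data))  # {'banana': 3, 'apple': 2, 'orange': 1}
```

The loop body is a constant-time dictionary update, so the total work is proportional to the number of items scanned.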
As the data size grows, the counting work grows in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 checks |
| 100 | About 100 checks |
| 1000 | About 1000 checks |
Pattern observation: The work grows in a straight line with the input size.
Time Complexity: O(n)
This means the time to count values grows directly with the number of items.
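A quick way to verify the linear pattern from the table is to count how many items the loop actually touches for different input sizes (a sketch using a plain dict as a stand-in for pandas internals):

```python
def count_with_ops(items):
    """Count occurrences in one pass; also return the number of items scanned."""
    counts, ops = {}, 0
    for item in items:
        counts[item] = counts.get(item, 0) + 1
        ops += 1  # one unit of work per item
    return counts, ops

for n in (10, 100, 1000):
    data = [f"fruit{i % 3}" for i in range(n)]
    _, ops = count_with_ops(data)
    print(f"n={n}: {ops} operations")  # operations grow exactly with n
```

Each row of output matches the table: doubling the input doubles the operations, the signature of O(n).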
[X] Wrong: "Counting unique values takes longer if there are many unique items than if there are few."
[OK] Correct: The dominant work is scanning all n items once, so the total time depends mostly on the data size n, not on how many unique values there are. The number of unique values mainly affects memory use and the final sorting of the results.
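To see why the boxed claim is wrong, compare two datasets of the same size n with very different numbers of unique values (a sketch using Python's collections.Counter as a stand-in for the counting pass):

```python
from collections import Counter

n = 1000
few_unique = ["apple", "banana"] * (n // 2)    # only 2 distinct values
many_unique = [f"item{i}" for i in range(n)]   # n distinct values

# Counter scans each list once: exactly n increments in both cases,
# so the counting pass costs the same regardless of how many values are unique.
print(len(Counter(few_unique)), len(Counter(many_unique)))  # 2 1000
```

Both calls walk exactly 1000 items; only the size of the resulting count table differs.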
Knowing how counting operations scale helps you explain performance clearly and shows you understand data processing basics.
"What if the data was already sorted? How would that affect the time complexity of value_counts()?"