nunique() for Cardinality in Python Data Analysis - Time & Space Complexity
We want to understand how the time to count unique values grows as the data size increases.
How does the work change when we have more rows to check for unique items?
Analyze the time complexity of the following code snippet.
import pandas as pd

# A Series with 8 values, 5 of them distinct
data = pd.Series([1, 2, 2, 3, 4, 4, 4, 5])
unique_count = data.nunique()  # count the distinct values (cardinality)
print(unique_count)  # 5
This code counts how many different values are in the data series.
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Checking each element to see if it is unique.
- How many times: Once for each item in the data series.
As the number of items grows, the amount of work grows at roughly the same rate.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 checks |
| 100 | About 100 checks |
| 1000 | About 1000 checks |
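The "one check per item" pattern in the table can be made concrete with a small sketch. This is not pandas' internal implementation, but a hash-set-based counter that works the same way in spirit: each element costs one set lookup/insert, so n items means n checks. The `checks` counter is added purely for illustration.

```python
def count_unique(values):
    """Count distinct values with a hash set, tallying one check per item."""
    seen = set()          # hash set: add/lookup is O(1) on average
    checks = 0
    for v in values:      # a single pass over all n items
        checks += 1
        seen.add(v)       # adding a duplicate changes nothing
    return len(seen), checks

unique, ops = count_unique([1, 2, 2, 3, 4, 4, 4, 5])
print(unique, ops)  # 5 unique values, 8 checks for 8 items
```

Doubling the input doubles `ops`, which is exactly the linear pattern the table shows.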
Pattern observation: The number of operations grows directly with the number of items.
Time Complexity: O(n)
This means the time to count unique values grows linearly with the number of items.
[X] Wrong: "Counting unique values is instant no matter how big the data is."
[OK] Correct: The function must look at each item at least once, so bigger data means more work.
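One way to see why the "instant" intuition fails: even when every value is identical, nunique() still has to scan all n items, because a different value could be hiding anywhere in the Series. A quick check (the sizes here are arbitrary, chosen just for illustration):

```python
import pandas as pd

# Both Series have exactly 1 unique value, but nunique() must still
# visit every element of the larger one to be sure of that.
small = pd.Series([7] * 10)
large = pd.Series([7] * 10_000)
print(small.nunique(), large.nunique())  # both print 1
```

The answers are identical, but the work done is not: the larger scan touches 1,000 times more elements.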
Understanding how counting unique items scales helps you explain data processing speed clearly and confidently.
"What if we used a sorted list before counting unique values? How would the time complexity change?"