nlargest() and nsmallest() in Pandas - Time & Space Complexity
When we use pandas functions like nlargest() and nsmallest(), we want to know how their running time changes as the data grows: how much longer does it take to find the top or bottom values when the data size increases?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

data = pd.DataFrame({
    'values': range(1000000, 0, -1)
})

# Find the 5 largest values
largest = data['values'].nlargest(5)

# Find the 5 smallest values
smallest = data['values'].nsmallest(5)
```
This code finds the top 5 largest and smallest numbers from a million rows.
Identify the loops, recursion, or array traversals that repeat:
- Primary operation: Scanning through all data values once to compare and select top or bottom n items.
- How many times: Exactly one full pass over the data (1 million items in this case).
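The single-pass idea can be sketched with Python's standard-library heapq module. This is a simplified illustration of the technique, not pandas' actual internal code: a small min-heap holds the k largest items seen so far, and each new value is compared against the heap's smallest entry.

```python
import heapq
import random

def top_k(values, k):
    """One-pass top-k selection, illustrating the idea behind nlargest()."""
    heap = []  # min-heap holding the k largest items seen so far
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            # New value beats the smallest of the current top k: swap it in.
            heapq.heapreplace(heap, v)
    return sorted(heap, reverse=True)

data = list(range(1, 1_000_001))
random.shuffle(data)
print(top_k(data, 5))  # [1000000, 999999, 999998, 999997, 999996]
```

Every element is visited exactly once, and the heap never grows beyond k items, which is why no full sort is needed.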
As the data size grows, the time to find the largest or smallest values grows roughly in a straight line.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 comparisons |
| 100 | About 100 comparisons |
| 1000 | About 1000 comparisons |
Pattern observation: Doubling the data roughly doubles the work needed.
Time Complexity: O(n)
This means the time to find the largest or smallest values grows linearly with the number of data points, as long as the number of items requested (here, 5) stays small and fixed.
[X] Wrong: "Finding the top 5 values means sorting the entire data, so it takes a long time like O(n log n)."
[OK] Correct: pandas does not sort all the data; it keeps track of only the n requested items while scanning, so a single pass through the data is enough.
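A quick sanity check confirms that nlargest() returns the same answer as a full descending sort, just without sorting all n rows:

```python
import pandas as pd

# nlargest(5) should match taking the head of a full descending sort.
s = pd.Series(range(1_000_000, 0, -1))
via_nlargest = s.nlargest(5).tolist()
via_sort = s.sort_values(ascending=False).head(5).tolist()

print(via_nlargest)              # [1000000, 999999, 999998, 999997, 999996]
print(via_nlargest == via_sort)  # True
```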
Understanding how nlargest() and nsmallest() work helps you explain efficient data selection in real tasks and shows that you can work with large datasets intelligently.
"What if we asked for the top half of the data instead of just 5 items? How would the time complexity change?"
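One way to explore this question empirically is to time the two cases, using heapq.nlargest as a stand-in for the pandas implementation (the internals differ, but the scaling behavior is comparable). Selecting n/2 items can no longer rely on a small, constant-size heap, so the cost approaches that of a full sort, roughly O(n log n):

```python
import heapq
import random
import time

# Compare selecting the top 5 vs. the top n/2 items from the same data.
n = 200_000
data = [random.random() for _ in range(n)]

start = time.perf_counter()
top_small = heapq.nlargest(5, data)       # constant number of items
t_small = time.perf_counter() - start

start = time.perf_counter()
top_half = heapq.nlargest(n // 2, data)   # half of all items
t_half = time.perf_counter() - start

# Selecting n/2 items behaves like a full sort, so it is noticeably
# slower than selecting a constant number of items.
print(len(top_small), len(top_half))
print(t_half > t_small)
```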