nlargest() and nsmallest() in Pandas - Time & Space Complexity

Time Complexity: nlargest() and nsmallest()
O(n)
Understanding Time Complexity

When we use pandas functions like nlargest() and nsmallest(), we want to know how their speed changes as the data grows.

We ask: How much longer does it take to find the top or bottom values when the data size increases?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

import pandas as pd

data = pd.DataFrame({
    'values': range(1000000, 0, -1)
})

# Find the 5 largest values
largest = data['values'].nlargest(5)

# Find the 5 smallest values
smallest = data['values'].nsmallest(5)

This code finds the 5 largest and the 5 smallest numbers in a column of one million rows.

Identify Repeating Operations

Identify the loops, recursion, and array traversals that repeat.

  • Primary operation: Scanning the data values and comparing them to select the top or bottom n items.
  • How many times: Each value is examined a roughly constant number of times when n is small and fixed, so the work is proportional to the number of rows (1 million items in this case).
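
The idea of selecting the top n items in a single scan can be sketched with a bounded min-heap. This is an illustration of the principle only, not a description of pandas's internal implementation, which may use a different selection algorithm. The function name `top_n_single_pass` is hypothetical and exists only for this example.

```python
import heapq

def top_n_single_pass(values, n):
    """Return the n largest values (descending) and the number of items visited."""
    heap = []      # min-heap holding the best n values seen so far
    visited = 0
    for v in values:
        visited += 1
        if len(heap) < n:
            heapq.heappush(heap, v)      # still filling the heap
        elif v > heap[0]:
            heapq.heapreplace(heap, v)   # evict the smallest of the current top n
    return sorted(heap, reverse=True), visited

result, visited = top_n_single_pass(range(1000, 0, -1), 5)
print(result)   # [1000, 999, 998, 997, 996]
print(visited)  # 1000
```

Each value is visited once, and the heap never holds more than n items, so for a small fixed n the work is proportional to the input size.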
How Execution Grows With Input

As the data size grows, the time to find the n largest or smallest values grows roughly linearly, provided n stays small and fixed.

Input Size (n)    Approx. Operations
10                About 10 comparisons
100               About 100 comparisons
1000              About 1000 comparisons

Pattern observation: Doubling the data roughly doubles the work needed.
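
One way to check this pattern empirically is to time nlargest() on inputs of increasing size. The sketch below is a rough measurement, not a benchmark: the helper name `time_nlargest`, the chosen sizes, and the use of random data are assumptions made for illustration, and actual timings depend on the machine and the pandas version.

```python
import time

import numpy as np
import pandas as pd

def time_nlargest(size, n=5, repeats=3):
    """Best-of-`repeats` wall-clock time in seconds for Series.nlargest(n)."""
    rng = np.random.default_rng(0)
    s = pd.Series(rng.random(size))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        s.nlargest(n)
        best = min(best, time.perf_counter() - start)
    return best

# Double the input size each step and compare the measured times.
timings = {size: time_nlargest(size) for size in (100_000, 200_000, 400_000)}
for size, seconds in timings.items():
    print(f"{size:>9,} rows: {seconds * 1000:.2f} ms")
```

If the growth is roughly linear, each doubling of the row count should roughly double the measured time, although fixed overhead can blur this for small inputs.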

Final Time Complexity

Time Complexity: O(n)

This means the time to find the largest or smallest values grows linearly with the number of data points, as long as the number of requested items stays small and fixed.

Common Mistake

[X] Wrong: "Finding the top 5 values means sorting the entire data, so it takes a long time like O(n log n)."

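[OK] Correct: pandas does not need to fully sort the data to return a handful of values. For small n, it can select the top or bottom n items with work roughly proportional to the data size and then order only those n items. This is why the pandas documentation describes nlargest() as faster than sort_values(ascending=False).head(n) when n is small relative to the size of the Series.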
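
A small example can confirm that the selection approach and the full-sort approach return the same values. The sample data here is made up for illustration.

```python
import pandas as pd

s = pd.Series([7, 2, 9, 4, 9, 1, 5])

# Selection: ask directly for the 3 largest values.
top3 = s.nlargest(3)

# Full sort: order every value, then keep the first 3.
top3_sorted = s.sort_values(ascending=False).head(3)

print(top3.tolist())         # [9, 9, 7]
print(top3_sorted.tolist())  # [9, 9, 7]

# nsmallest() works the same way from the other end.
bottom2 = s.nsmallest(2)
print(bottom2.tolist())      # [1, 2]
```

Both approaches give the same values. The difference is in how much work is done to get them: the full sort orders all the rows, while nlargest() only has to order the few rows it returns.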

Interview Connect

Understanding how nlargest() and nsmallest() work helps you explain efficient data selection in real tasks, showing you know how to handle big data smartly.

Self-Check

"What if we asked for the top half of the data instead of just 5 items? How would the time complexity change?"