Outlier detection with IQR in Pandas - Time & Space Complexity
We want to understand how the time needed to find outliers using IQR changes as the data grows.
How does the work increase when we have more data points?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

# Assume df is a DataFrame with a numeric column 'values'
Q1 = df['values'].quantile(0.25)
Q3 = df['values'].quantile(0.75)
IQR = Q3 - Q1
outliers = df[(df['values'] < (Q1 - 1.5 * IQR)) | (df['values'] > (Q3 + 1.5 * IQR))]
```
This code calculates the IQR and finds rows where values are outliers based on that range.
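To make the snippet concrete, here is a self-contained version run on a small toy DataFrame (the data values are an illustrative assumption, not part of the lesson):

```python
import pandas as pd

# Toy data with one obvious outlier (100) among values near 10-14.
df = pd.DataFrame({'values': [10, 12, 11, 13, 12, 11, 14, 100]})

Q1 = df['values'].quantile(0.25)
Q3 = df['values'].quantile(0.75)
IQR = Q3 - Q1

# Any value outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] is flagged as an outlier.
lower, upper = Q1 - 1.5 * IQR, Q3 + 1.5 * IQR
outliers = df[(df['values'] < lower) | (df['values'] > upper)]
print(outliers)  # the single row containing 100
```

Here Q1 = 11.0 and Q3 = 13.25, so the fences are roughly [7.6, 16.6] and only the value 100 falls outside them.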
Identify the operations that repeat: loops, recursion, or array traversals.
- Primary operation: Pandas `quantile` internally sorts the 'values' column (O(n log n)) to compute order statistics, plus an O(n) boolean filtering pass.
- How many times: Two quantile calls (each O(n log n)) and one filtering pass (O(n)); the sorting dominates.
As the number of data points grows, the time grows as O(n log n) due to sorting in quantiles.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 70 (2 sorts ~60 + 10 filter) |
| 100 | About 1,400 |
| 1000 | About 21,000 |
Pattern observation: Doubling n more than doubles the work due to the log n factor.
Time Complexity: O(n log n)
Dominated by sorting within pandas quantile calls.
[X] Wrong: "It's O(n) because we just scan the data a few times."
[OK] Correct: quantile requires sorting to find precise percentiles, costing O(n log n).
Understanding how data size affects outlier detection helps you explain your approach clearly and shows you think about efficiency in real data tasks.
"Pandas quantile uses sorting (O(n log n)). What if we used a linear-time selection algorithm (like quickselect) for exact quantiles? How would complexity change?"