
Outlier detection with IQR in Pandas - Step-by-Step Execution

Concept Flow - Outlier detection with IQR

Calculate Q1 (25th percentile)

↓

Calculate Q3 (75th percentile)

↓

Compute IQR = Q3 - Q1

↓

Determine lower bound = Q1 - 1.5*IQR

↓

Determine upper bound = Q3 + 1.5*IQR

↓

Flag data points < lower bound or > upper bound as outliers

↓

Done

Step by step, we find the range that covers the middle 50% of the data (the IQR), then mark points that lie more than 1.5 times that range beyond the quartiles as outliers.
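The flow above can be sketched as one small helper. This is a minimal illustration, not a definitive implementation: the function name `iqr_outlier_mask` and the parameter `k` are assumptions introduced here, with `k` defaulting to the 1.5 multiplier used in the flow.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True where a value lies outside the IQR bounds.

    `iqr_outlier_mask` and `k` are hypothetical names used only for this sketch.
    """
    q1 = values.quantile(0.25)   # 25th percentile
    q3 = values.quantile(0.75)   # 75th percentile
    iqr = q3 - q1                # width of the middle 50% of the data
    lower_bound = q1 - k * iqr
    upper_bound = q3 + k * iqr
    # Strict comparisons: a value exactly on a bound is not flagged
    return (values < lower_bound) | (values > upper_bound)


sample = pd.Series([10, 12, 14, 15, 18, 19, 20, 100])
mask = iqr_outlier_mask(sample)
flagged = sample[mask]
print(flagged.tolist())  # [100]
```

Returning a mask rather than the outliers themselves keeps both uses open: `sample[mask]` lists the outliers, while `sample[~mask]` gives the data with them removed.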

Execution Sample

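import pandas as pd

data = pd.Series([10, 12, 14, 15, 18, 19, 20, 100])
# Quartiles (pandas interpolates linearly between data points by default)
Q1 = data.quantile(0.25)
Q3 = data.quantile(0.75)
IQR = Q3 - Q1
# Bounds sit 1.5 * IQR beyond each quartile
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
# Boolean indexing keeps only the values outside the bounds
outliers = data[(data < lower_bound) | (data > upper_bound)]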

This code finds outliers in a pandas Series using the IQR method.

Execution Table

Step	Action	Value/Calculation	Result
1	Calculate Q1 (25th percentile)	data.quantile(0.25)	13.5
2	Calculate Q3 (75th percentile)	data.quantile(0.75)	19.25
3	Compute IQR	Q3 - Q1	19.25 - 13.5 = 5.75
4	Calculate lower bound	Q1 - 1.5 * IQR	13.5 - 8.625 = 4.875
5	Calculate upper bound	Q3 + 1.5 * IQR	19.25 + 8.625 = 27.875
6	Identify outliers	Values < 4.875 or > 27.875	Only 100 is > 27.875, so it is an outlier
7	Output outliers	data[(data < 4.875) | (data > 27.875)]	100
8	End	No more steps	Outlier detection complete

💡 All data points checked; only 100 lies outside bounds, marked as outlier.
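The quartile values depend on how pandas interpolates between sorted data points. As a hedged sketch of that arithmetic, the snippet below reproduces the default linear interpolation by hand and compares it with `Series.quantile`. The helper `linear_quantile` is a hypothetical name used only here; for this series it gives Q1 = 13.5 and Q3 = 19.25.

```python
import pandas as pd

data = pd.Series([10, 12, 14, 15, 18, 19, 20, 100])
sorted_values = data.sort_values().tolist()
n = len(sorted_values)


def linear_quantile(q: float) -> float:
    """Hand-rolled linear interpolation (hypothetical helper for illustration)."""
    position = (n - 1) * q                     # fractional index into the sorted data
    lower_index = int(position)                # data point just below the position
    upper_index = min(lower_index + 1, n - 1)  # data point just above, clamped to the end
    fraction = position - lower_index
    low, high = sorted_values[lower_index], sorted_values[upper_index]
    return low + fraction * (high - low)


manual_q1 = linear_quantile(0.25)  # position 1.75 -> 12 + 0.75 * (14 - 12)
manual_q3 = linear_quantile(0.75)  # position 5.25 -> 19 + 0.25 * (20 - 19)

print(manual_q1, data.quantile(0.25))  # 13.5 13.5
print(manual_q3, data.quantile(0.75))  # 19.25 19.25
```

Other interpolation choices (for example `interpolation="lower"` or `"midpoint"` in `Series.quantile`) give different quartiles, and therefore different bounds, on small data sets like this one.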

Variable Tracker

Variable	Start	After Step 1	After Step 2	After Step 3	After Step 4	After Step 5	After Step 6	Final
data	[10,12,14,15,18,19,20,100]	[10,12,14,15,18,19,20,100]	[10,12,14,15,18,19,20,100]	[10,12,14,15,18,19,20,100]	[10,12,14,15,18,19,20,100]	[10,12,14,15,18,19,20,100]	[10,12,14,15,18,19,20,100]	[10,12,14,15,18,19,20,100]
Q1	N/A	13.5	13.5	13.5	13.5	13.5	13.5	13.5
Q3	N/A	N/A	19.25	19.25	19.25	19.25	19.25	19.25
IQR	N/A	N/A	N/A	5.75	5.75	5.75	5.75	5.75
lower_bound	N/A	N/A	N/A	N/A	4.875	4.875	4.875	4.875
upper_bound	N/A	N/A	N/A	N/A	N/A	27.875	27.875	27.875
outliers	N/A	N/A	N/A	N/A	N/A	N/A	[100]	[100]

Key Moments - 3 Insights

Why do we multiply IQR by 1.5 to find bounds?

Why is 100 an outlier but 20 is not, even though 20 is far from most data?

Can outliers be below the lower bound?
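The last two questions can be explored with a small experiment. In the sketch below, the value -50 is a hypothetical addition made only for illustration; it is not part of the original data. With it included, the bounds work out to 1.5 and 29.5, so -50 falls below the lower bound, 100 falls above the upper bound, and 20 stays inside.

```python
import pandas as pd

# -50 is a hypothetical low value added for illustration; it is not in the original data
extended = pd.Series([-50, 10, 12, 14, 15, 18, 19, 20, 100])

q1 = extended.quantile(0.25)   # 12.0
q3 = extended.quantile(0.75)   # 19.0
iqr = q3 - q1                  # 7.0
lower_bound = q1 - 1.5 * iqr   # 1.5
upper_bound = q3 + 1.5 * iqr   # 29.5

low_outliers = extended[extended < lower_bound]
high_outliers = extended[extended > upper_bound]

print(low_outliers.tolist())             # [-50]
print(high_outliers.tolist())            # [100]
print(lower_bound <= 20 <= upper_bound)  # True: 20 is not an outlier
```

The bounds are measured from the quartiles, not from the mean or from neighbouring points, which is why 20 passes even though it is the largest of the ordinary values.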

Visual Quiz - 3 Questions

Test your understanding

Look at the Execution Table at step 3. What is the IQR value?

A. 13.5

B. 19.25

C. 5.75

D. 27.875

Concept Snapshot

Outlier detection with IQR:
1. Calculate Q1 (25th percentile) and Q3 (75th percentile).
2. Compute IQR = Q3 - Q1.
3. Define lower bound = Q1 - 1.5*IQR and upper bound = Q3 + 1.5*IQR.
4. Data points outside these bounds are outliers.
Use pandas quantile() and boolean indexing to find outliers.
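The same recipe carries over to a DataFrame column. The sketch below is one possible way to apply it, assuming a hypothetical column named `value` and a hypothetical flag column `is_outlier`; it uses `Series.between`, which includes both endpoints by default, so a value exactly on a bound is kept.

```python
import pandas as pd

# "value" and "is_outlier" are hypothetical column names used for illustration
df = pd.DataFrame({"value": [10, 12, 14, 15, 18, 19, 20, 100]})

# Passing a list of quantiles returns a Series; unpacking yields its values in order
q1, q3 = df["value"].quantile([0.25, 0.75])
iqr = q3 - q1

# True for rows inside [lower bound, upper bound]
inside = df["value"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

df["is_outlier"] = ~inside   # flag outliers without dropping any rows
cleaned = df[inside]         # or keep only the non-outlier rows

print(df[df["is_outlier"]]["value"].tolist())  # [100]
print(cleaned["value"].tolist())               # [10, 12, 14, 15, 18, 19, 20]
```

Flagging rows instead of deleting them keeps the original data available for review before deciding whether an outlier is an error or a genuine extreme value.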

Full Transcript

Outlier detection with IQR starts by finding the middle 50% of the data, the range between Q1 and Q3. We calculate the interquartile range (IQR) by subtracting Q1 from Q3. Then we set the lower bound by subtracting 1.5 times the IQR from Q1 and the upper bound by adding 1.5 times the IQR to Q3. Any data points outside these bounds are considered outliers. In the example, the series contains the value 100, which lies beyond the upper bound and is flagged as an outlier. This method helps identify values that are unusually high or low compared to the bulk of the data.