
Outlier detection with IQR in Pandas - Step-by-Step Execution

Concept Flow - Outlier detection with IQR
Calculate Q1 (25th percentile)
Calculate Q3 (75th percentile)
Compute IQR = Q3 - Q1
Determine lower bound = Q1 - 1.5*IQR
Determine upper bound = Q3 + 1.5*IQR
Flag data points < lower bound or > upper bound as outliers
Done
Step-by-step, we find the middle 50% range of data, then mark points far outside this range as outliers.
Execution Sample
Pandas
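import pandas as pd

data = pd.Series([10, 12, 14, 15, 18, 19, 20, 100])
Q1 = data.quantile(0.25)  # 25th percentile
Q3 = data.quantile(0.75)  # 75th percentile
IQR = Q3 - Q1

# Bounds: 1.5 * IQR beyond each quartile
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
# Keep only the values outside the bounds
outliers = data[(data < lower_bound) | (data > upper_bound)]
This code finds outliers in a pandas Series using the IQR method.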
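By default, pandas computes `quantile` with linear interpolation, so Q1 and Q3 need not be values that appear in the data. The sketch below recomputes both quartiles by hand to show where the numbers come from. The helper name `linear_quantile` is made up for illustration and is not part of pandas.

```python
import math
import pandas as pd

data = pd.Series([10, 12, 14, 15, 18, 19, 20, 100])

def linear_quantile(values, q):
    """Hypothetical helper: quantile by linear interpolation between ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q   # fractional index into the sorted data
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])

q1_manual = linear_quantile(data, 0.25)  # position 1.75, between 12 and 14
q3_manual = linear_quantile(data, 0.75)  # position 5.25, between 19 and 20

# Each line prints the manual value next to the pandas value.
print(q1_manual, data.quantile(0.25))
print(q3_manual, data.quantile(0.75))
```

With eight values, the 25th percentile sits three quarters of the way from 12 to 14, and the 75th percentile sits one quarter of the way from 19 to 20.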
Execution Table
| Step | Action | Value/Calculation | Result |
|------|--------|-------------------|--------|
| 1 | Calculate Q1 (25th percentile) | data.quantile(0.25) | 13.5 |
| 2 | Calculate Q3 (75th percentile) | data.quantile(0.75) | 19.25 |
| 3 | Compute IQR | Q3 - Q1 | 5.75 |
| 4 | Calculate lower bound | Q1 - 1.5 * IQR | 13.5 - 8.625 = 4.875 |
| 5 | Calculate upper bound | Q3 + 1.5 * IQR | 19.25 + 8.625 = 27.875 |
| 6 | Identify outliers | Values < 4.875 or > 27.875 | Only 100 is > 27.875, so it is an outlier |
| 7 | Output outliers | data[(data < 4.875) \| (data > 27.875)] | 100 |
| 8 | End | No more steps | Outlier detection complete |
💡 All data points checked; only 100 lies outside bounds, marked as outlier.
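To make the per-point check visible, the following sketch builds a small table with one row per value and a flag for each side of the range. The column names (`below_lower`, `above_upper`, `is_outlier`) are chosen here for illustration only.

```python
import pandas as pd

data = pd.Series([10, 12, 14, 15, 18, 19, 20, 100])

q1 = data.quantile(0.25)
q3 = data.quantile(0.75)
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# One row per data point, showing which side (if any) it falls outside.
checks = pd.DataFrame({
    "value": data,
    "below_lower": data < lower_bound,
    "above_upper": data > upper_bound,
})
checks["is_outlier"] = checks["below_lower"] | checks["above_upper"]

print(f"bounds: [{lower_bound}, {upper_bound}]")
print(checks)
```

The row for 100 is the only one where `is_outlier` is true; every other value fails both comparisons.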
Variable Tracker
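| Variable | Start | After Step 1 | After Step 2 | After Step 3 | After Step 4 | After Step 5 | After Step 6 | Final |
|----------|-------|--------------|--------------|--------------|--------------|--------------|--------------|-------|
| data | [10, 12, 14, 15, 18, 19, 20, 100] | (unchanged) | (unchanged) | (unchanged) | (unchanged) | (unchanged) | (unchanged) | [10, 12, 14, 15, 18, 19, 20, 100] |
| Q1 | N/A | 13.5 | 13.5 | 13.5 | 13.5 | 13.5 | 13.5 | 13.5 |
| Q3 | N/A | N/A | 19.25 | 19.25 | 19.25 | 19.25 | 19.25 | 19.25 |
| IQR | N/A | N/A | N/A | 5.75 | 5.75 | 5.75 | 5.75 | 5.75 |
| lower_bound | N/A | N/A | N/A | N/A | 4.875 | 4.875 | 4.875 | 4.875 |
| upper_bound | N/A | N/A | N/A | N/A | N/A | 27.875 | 27.875 | 27.875 |
| outliers | N/A | N/A | N/A | N/A | N/A | N/A | [100] | [100] |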
Key Moments - 3 Insights
Why do we multiply IQR by 1.5 to find bounds?
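The 1.5 multiplier is a convention (Tukey's rule) that extends the range beyond the middle 50% so that only unusually distant points are flagged. For roughly normal data, the bounds sit about 2.7 standard deviations from the mean, so fewer than 1% of points are flagged. See execution_table steps 4 and 5, where the bounds are calculated using 1.5*IQR.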
Why is 100 an outlier but 20 is not, even though 20 is far from most data?
Because 20 is below the upper bound (27.875), it is not flagged. Only values above 27.875 are outliers, as shown in execution_table step 6.
Can outliers be below the lower bound?
Yes, any value below the lower bound (4.875 here) is also an outlier. In this data, no value is below 4.875, so there are no low outliers.
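As an illustration of outliers on both sides, the sketch below uses a hypothetical dataset (not the one from the example) that adds one very low value, and reports low and high outliers separately.

```python
import pandas as pd

# Hypothetical data: similar to the example, plus one very low value.
data = pd.Series([-50, 10, 12, 14, 15, 18, 19, 20, 100])

q1 = data.quantile(0.25)
q3 = data.quantile(0.75)
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Split the outliers by which bound they violate.
low_outliers = data[data < lower_bound]
high_outliers = data[data > upper_bound]

print("low:", low_outliers.tolist())
print("high:", high_outliers.tolist())
```

Here -50 falls below the lower bound and 100 falls above the upper bound, so each list contains one value.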
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table at step 3. What is the IQR value?
A. 13.5
B. 19.25
C. 5.75
D. 27.875
💡 Hint
Check the 'Result' column at step 3 in execution_table.
At which step do we identify the outlier values?
A. Step 6
B. Step 4
C. Step 2
D. Step 8
💡 Hint
Look for the step where values outside bounds are found in execution_table.
If the data had a value 3 instead of 10, what would happen to the lower bound?
A. Lower bound would increase
B. Lower bound would decrease
C. Lower bound stays the same
D. Lower bound becomes zero
💡 Hint
The bounds depend only on Q1 and Q3. Check whether replacing 10 with 3 moves either quartile; compare with Q1 and lower_bound in variable_tracker.
Concept Snapshot
Outlier detection with IQR:
1. Calculate Q1 (25th percentile) and Q3 (75th percentile).
2. Compute IQR = Q3 - Q1.
3. Define lower bound = Q1 - 1.5*IQR and upper bound = Q3 + 1.5*IQR.
4. Data points outside these bounds are outliers.
Use pandas quantile() and boolean indexing to find outliers.
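The four steps above can be wrapped into a small reusable function. This is a minimal sketch: the name `iqr_outliers` and the `k` parameter are assumptions made for illustration, and `Series.between` is used as a compact way to express "inside both bounds".

```python
import pandas as pd

def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Hypothetical helper: return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    series = series.dropna()                  # NaN would otherwise be flagged
    q1, q3 = series.quantile([0.25, 0.75])    # both quartiles in one call
    iqr = q3 - q1
    inside = series.between(q1 - k * iqr, q3 + k * iqr)  # inclusive on both ends
    return series[~inside]

data = pd.Series([10, 12, 14, 15, 18, 19, 20, 100])
print(iqr_outliers(data).tolist())
```

Because `between` is inclusive, a value exactly equal to a bound is not flagged, which matches the strict `<` and `>` comparisons in the snapshot.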
Full Transcript
Outlier detection with IQR starts from the middle 50% of the data, the range between Q1 and Q3. We calculate the interquartile range (IQR) by subtracting Q1 from Q3. Then we set the lower bound by subtracting 1.5 times the IQR from Q1, and the upper bound by adding 1.5 times the IQR to Q3. Any data point outside these bounds is considered an outlier. In the example, the value 100 lies beyond the upper bound and is flagged as an outlier. This method identifies values that are unusually high or low compared to the bulk of the data.