
Outlier detection with IQR in Pandas

Introduction

Outliers are data points that lie far from the rest of the data. Detecting them helps you understand and clean a dataset. The interquartile range (IQR) rule is a simple way to flag them.

Typical situations where this is useful:
Checking for unusual sales numbers in a store's monthly data.
Analyzing students' test scores to find very high or very low results.
Cleaning sensor data that might contain errors or spikes.
Preparing data for machine learning to avoid misleading results.
Syntax
Pandas
Q1 = df['column'].quantile(0.25)
Q3 = df['column'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
outliers = df[(df['column'] < lower_bound) | (df['column'] > upper_bound)]

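Q1 is the 25th percentile and Q3 is the 75th percentile of the column. The IQR, Q3 - Q1, is the spread of the middle 50% of the values.

Outliers are values more than 1.5 times the IQR below Q1 or more than 1.5 times the IQR above Q3.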

Examples
Detect outliers in the 'age' column of a DataFrame.
Pandas
Q1 = df['age'].quantile(0.25)
Q3 = df['age'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
outliers = df[(df['age'] < lower_bound) | (df['age'] > upper_bound)]
Find outliers in 'salary' data to spot unusually low or high salaries.
Pandas
Q1 = df['salary'].quantile(0.25)
Q3 = df['salary'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
outliers = df[(df['salary'] < lower_bound) | (df['salary'] > upper_bound)]
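The two examples above differ only in the column name, so the steps can be wrapped in a small helper. The sketch below is one way to do that. The function name `iqr_outliers`, the DataFrame `people`, and its values are hypothetical, chosen only for illustration.

```python
import pandas as pd


def iqr_outliers(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return the rows whose value in `column` lies outside the 1.5 * IQR bounds."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    return df[(df[column] < lower_bound) | (df[column] > upper_bound)]


# Made-up data for illustration only.
people = pd.DataFrame({
    "age": [23, 25, 27, 29, 31, 33, 35, 37, 95],
    "salary": [1000, 42000, 44000, 46000, 48000, 50000, 52000, 54000, 56000],
})

age_outliers = iqr_outliers(people, "age")        # the row with age 95
salary_outliers = iqr_outliers(people, "salary")  # the row with salary 1000
print(age_outliers)
print(salary_outliers)
```

Each column gets its own bounds, so a row can be an outlier in one column and ordinary in another.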
Sample Program

This code finds outliers in the 'score' column. The value 200 is much higher than the rest and is flagged as an outlier.

Pandas
import pandas as pd

data = {'score': [55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 200]}
df = pd.DataFrame(data)

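# Quartiles and interquartile range of the 'score' column
Q1 = df['score'].quantile(0.25)
Q3 = df['score'].quantile(0.75)
IQR = Q3 - Q1

# Values beyond these bounds count as outliers
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

# Keep only the rows that fall outside the bounds
outliers = df[(df['score'] < lower_bound) | (df['score'] > upper_bound)]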
print(outliers)
Output
    score
10    200
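To see why 200 is flagged, it can help to print the intermediate values from the sample program. The sketch below repeats the same calculation. The values in the comments assume pandas' default linear interpolation for `quantile`.

```python
import pandas as pd

df = pd.DataFrame({'score': [55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 200]})

Q1 = df['score'].quantile(0.25)   # 67.5, halfway between 65 and 70
Q3 = df['score'].quantile(0.75)   # 92.5, halfway between 90 and 95
IQR = Q3 - Q1                     # 25.0
lower_bound = Q1 - 1.5 * IQR      # 30.0
upper_bound = Q3 + 1.5 * IQR      # 130.0

print(f"Q1={Q1}, Q3={Q3}, IQR={IQR}")
print(f"lower_bound={lower_bound}, upper_bound={upper_bound}")
```

Every score from 55 to 100 lies between 30 and 130, so only 200 falls outside the bounds.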
Important Notes

The IQR rule works best on roughly symmetric data. On strongly skewed data it tends to flag many ordinary values in the long tail.
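The effect of skew can be illustrated with a small made-up series in which each value doubles the previous one. In this sketch the helper `iqr_flags` is a hypothetical name, and applying the rule on a log scale is only one possible workaround, shown here as an assumption and not as a recommendation from this tutorial.

```python
import numpy as np
import pandas as pd

# Made-up, strongly right-skewed data: each value doubles the previous one.
values = pd.Series([1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024])


def iqr_flags(s: pd.Series) -> pd.Series:
    """Boolean mask marking values outside the 1.5 * IQR bounds."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)


# On the raw scale the two largest values are flagged,
# although they follow the same doubling pattern as the rest.
raw_flagged = values[iqr_flags(values)].tolist()

# Possible workaround (an assumption): apply the rule on a log scale,
# where these values are evenly spaced and nothing is flagged.
log_flagged = values[iqr_flags(np.log2(values))].tolist()

print(raw_flagged)
print(log_flagged)
```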

Adjusting the 1.5 multiplier changes the sensitivity: a larger multiplier flags fewer values, and a smaller one flags more.
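As a rough illustration of the multiplier's effect, the sketch below applies two multipliers to the same made-up scores. The variable name `k` and the data are assumptions for this example. The stricter value 3.0 is a commonly used alternative, not something this tutorial prescribes.

```python
import pandas as pd

# Made-up scores with one moderate (140) and one extreme (200) high value.
scores = pd.Series([55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 140, 200])

q1 = scores.quantile(0.25)
q3 = scores.quantile(0.75)
iqr = q3 - q1

flagged = {}
for k in (1.5, 3.0):  # k is the multiplier applied to the IQR
    mask = (scores < q1 - k * iqr) | (scores > q3 + k * iqr)
    flagged[k] = scores[mask].tolist()
    print(k, flagged[k])
```

With 1.5 both 140 and 200 are flagged. With 3.0 only 200 remains.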

Summary

The IQR rule flags values that lie far from the middle 50% of the data.

Outliers are values more than 1.5 times the IQR below Q1 or more than 1.5 times the IQR above Q3.

Detecting outliers helps clean and understand data better.