How to Use the IQR Method for Outliers in Python
Use the IQR (Interquartile Range) method in Python by calculating the first quartile (Q1) and third quartile (Q3), then finding the IQR as Q3 - Q1. Outliers are values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR.
Syntax
The IQR method involves these steps:
- Calculate Q1 (25th percentile) and Q3 (75th percentile) of your data.
- Compute IQR = Q3 - Q1.
- Define lower bound = Q1 - 1.5 * IQR.
- Define upper bound = Q3 + 1.5 * IQR.
- Identify outliers as values outside these bounds.
python
import numpy as np

# `data` is assumed to be a list or array of numbers defined elsewhere
Q1 = np.percentile(data, 25)
Q3 = np.percentile(data, 75)
IQR = Q3 - Q1

lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

outliers = [x for x in data if x < lower_bound or x > upper_bound]
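The steps above can also be wrapped in small reusable functions. The following is a minimal sketch, not a NumPy API: the names `iqr_bounds` and `iqr_outliers` and the `k` parameter (the 1.5 multiplier) are illustrative assumptions.

```python
import numpy as np


def iqr_bounds(data, k=1.5):
    """Return (lower_bound, upper_bound) for the IQR rule.

    Hypothetical helper: `k` is the multiplier applied to the IQR.
    """
    q1, q3 = np.percentile(data, [25, 75])  # both quartiles in one call
    iqr = q3 - q1
    return float(q1 - k * iqr), float(q3 + k * iqr)


def iqr_outliers(data, k=1.5):
    """Return the values of `data` that fall outside the IQR bounds."""
    lower, upper = iqr_bounds(data, k)
    values = np.asarray(data)
    mask = (values < lower) | (values > upper)
    return values[mask].tolist()


sample = [10, 12, 14, 15, 18, 19, 20, 22, 23, 24, 100]
print(iqr_bounds(sample))    # (2.5, 34.5)
print(iqr_outliers(sample))  # [100]
```

Keeping the multiplier as a parameter makes it easy to try a stricter or looser rule without rewriting the bounds calculation.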
Example
This example shows how to find outliers in a list of numbers using the IQR method in Python.
python
import numpy as np

data = [10, 12, 14, 15, 18, 19, 20, 22, 23, 24, 100]

Q1 = np.percentile(data, 25)
Q3 = np.percentile(data, 75)
IQR = Q3 - Q1

lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

outliers = [x for x in data if x < lower_bound or x > upper_bound]

print("Q1:", Q1)
print("Q3:", Q3)
print("IQR:", IQR)
print("Lower bound:", lower_bound)
print("Upper bound:", upper_bound)
print("Outliers:", outliers)
Output
Q1: 14.5
Q3: 22.5
IQR: 8.0
Lower bound: 2.5
Upper bound: 34.5
Outliers: [100]
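To see where the quartile values come from, the sketch below reproduces the linear interpolation that `np.percentile` uses by default. The helper `linear_percentile` is a hypothetical name written for illustration; other interpolation methods can give slightly different quartiles for the same data.

```python
import numpy as np


def linear_percentile(sorted_values, p):
    """Percentile by linear interpolation between the two nearest ranks."""
    position = (len(sorted_values) - 1) * p / 100  # fractional index
    low = int(position)
    high = min(low + 1, len(sorted_values) - 1)
    fraction = position - low
    return sorted_values[low] + fraction * (sorted_values[high] - sorted_values[low])


data = sorted([10, 12, 14, 15, 18, 19, 20, 22, 23, 24, 100])

q1 = linear_percentile(data, 25)  # position 2.5: halfway between 14 and 15
q3 = linear_percentile(data, 75)  # position 7.5: halfway between 22 and 23
print(q1, q3)  # 14.5 22.5

# The manual result matches NumPy's default behaviour.
print(np.isclose(q1, np.percentile(data, 25)))  # True
print(np.isclose(q3, np.percentile(data, 75)))  # True
```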
Common Pitfalls
Common mistakes when using the IQR method include:
- Not using the correct percentiles (Q1 = 25th, Q3 = 75th).
- Forgetting to multiply IQR by 1.5 when calculating bounds.
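- Applying the method to non-numeric data, or computing quartiles by hand on unsorted data (np.percentile sorts internally, so its input does not need to be sorted first).
- Misinterpreting outliers as errors instead of potentially important data points.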
python
import numpy as np

data = [10, 12, 14, 15, 18, 19, 20, 22, 23, 24, 100]

# Wrong: Using 50th percentile instead of 25th and 75th
Q1_wrong = np.percentile(data, 50)
Q3_wrong = np.percentile(data, 50)
IQR_wrong = Q3_wrong - Q1_wrong  # always 0, so both bounds collapse to the median

# Correct way
Q1 = np.percentile(data, 25)
Q3 = np.percentile(data, 75)
IQR = Q3 - Q1
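Two of the pitfalls above can be illustrated together. This sketch, under the assumption that the data fits in a NumPy array, shows that shuffled input gives the same quartiles as sorted input, and that outliers can be flagged with a boolean mask for review instead of being deleted outright. The variable names are illustrative.

```python
import numpy as np

# Same values as the example list, deliberately shuffled
data = np.array([23, 100, 10, 19, 14, 24, 12, 22, 15, 20, 18])

q1, q3 = np.percentile(data, [25, 75])
q1_sorted, q3_sorted = np.percentile(np.sort(data), [25, 75])
print(q1 == q1_sorted, q3 == q3_sorted)  # True True

iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Flag outliers rather than dropping them, so they can be inspected later.
is_outlier = (data < lower_bound) | (data > upper_bound)
print(data[is_outlier])   # [100]
print(data[~is_outlier])  # [23 10 19 14 24 12 22 15 20 18]
```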
Quick Reference
Remember these key points for the IQR method:
- Q1: 25th percentile
- Q3: 75th percentile
- IQR: Q3 - Q1
- Lower bound: Q1 - 1.5 * IQR
- Upper bound: Q3 + 1.5 * IQR
- Outliers: Values outside lower and upper bounds
Key Takeaways
Calculate Q1 and Q3 using the 25th and 75th percentiles of your data.
Compute IQR as the difference between Q3 and Q1.
Outliers are data points below Q1 - 1.5*IQR or above Q3 + 1.5*IQR.
Use numpy.percentile for easy percentile calculations in Python.
Check your data type and distribution before applying the IQR method.
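The last takeaway, checking the data type before applying the method, might look like the sketch below. The helper `to_numeric_array` is a hypothetical name, and the sample values (including the missing entry) are made up for illustration. It also shows that `np.percentile` propagates NaN, while `np.nanpercentile` ignores NaN entries.

```python
import numpy as np


def to_numeric_array(data):
    """Hypothetical helper: convert to float, failing loudly on non-numeric input."""
    return np.asarray(data, dtype=float)  # raises ValueError for strings like "abc"


values = to_numeric_array([10, 12, 14, np.nan, 18, 100])

# A single NaN makes the plain percentile NaN, so the bounds would be NaN too.
print(np.isnan(np.percentile(values, 25)))  # True

# np.nanpercentile computes the quartiles from the non-NaN entries only.
q1, q3 = np.nanpercentile(values, [25, 75])
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Comparisons with NaN are False, so missing entries are not flagged here.
outliers = values[(values < lower_bound) | (values > upper_bound)]
print(outliers.tolist())  # [100.0]
```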