Outlier detection with IQR in Pandas - Cheat Sheet & Quick Revision

Recall & Review
beginner
What does IQR stand for in data analysis?
IQR stands for Interquartile Range. It measures the spread of the middle 50% of the data: the distance between the 25th percentile (Q1) and the 75th percentile (Q3).
beginner
How do you calculate the IQR from a dataset?
IQR = Q3 - Q1, where Q1 is the 25th percentile and Q3 is the 75th percentile of the data.
intermediate
Why is IQR useful for detecting outliers?
The IQR rule flags a data point as an outlier when it falls below Q1 - 1.5*IQR or above Q3 + 1.5*IQR, that is, when it lies unusually far from the middle 50% of the data.
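As a minimal sketch of the two cutoffs described above, the helper below computes them from given quartiles. The function name `iqr_fences` and the parameter `k` are hypothetical names chosen for illustration; 1.5 is the multiplier used in this cheat sheet.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cutoffs for the IQR rule.

    iqr_fences and k are illustrative names, not part of pandas.
    """
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# With Q1 = 10 and Q3 = 20 the IQR is 10, so the cutoffs sit 15 units
# outside each quartile.
lower, upper = iqr_fences(10, 20)
print(lower, upper)  # -5.0 35.0
```

Any value below `lower` or above `upper` would be treated as an outlier under this rule.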
beginner
Show the pandas code to calculate Q1, Q3, and IQR for a DataFrame column named 'data'.
Q1 = df['data'].quantile(0.25)
Q3 = df['data'].quantile(0.75)
IQR = Q3 - Q1
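To make the three lines above runnable end to end, here is a small sketch on made-up sample values (the numbers in `df` are an assumption for illustration only). It relies on the default linear interpolation of `Series.quantile`.

```python
import pandas as pd

# Hypothetical sample data; 100 is deliberately far from the rest.
df = pd.DataFrame({"data": [10, 12, 13, 14, 15, 16, 18, 100]})

Q1 = df["data"].quantile(0.25)  # 25th percentile
Q3 = df["data"].quantile(0.75)  # 75th percentile
IQR = Q3 - Q1                   # spread of the middle 50%

print(Q1, Q3, IQR)
```

Note that the extreme value 100 barely moves Q1 and Q3, because quartiles depend on rank positions rather than on the size of the largest value.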
intermediate
How do you filter out outliers using IQR in pandas?
Keep only the rows whose values are >= Q1 - 1.5*IQR and <= Q3 + 1.5*IQR. Example:
filtered_df = df[(df['data'] >= Q1 - 1.5*IQR) & (df['data'] <= Q3 + 1.5*IQR)]
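The filter above can be sketched as a complete script. The sample values and the names `lower`, `upper`, `is_inlier`, and `outliers` are assumptions for illustration. `Series.between` is used here as an equivalent of the two comparisons, since it includes both endpoints by default.

```python
import pandas as pd

# Hypothetical sample data with one low and one high extreme value.
df = pd.DataFrame({"data": [-40, 21, 22, 23, 24, 25, 26, 27, 95]})

Q1 = df["data"].quantile(0.25)
Q3 = df["data"].quantile(0.75)
IQR = Q3 - Q1

lower = Q1 - 1.5 * IQR
upper = Q3 + 1.5 * IQR

# True for rows inside [lower, upper], matching the >= and <= conditions.
is_inlier = df["data"].between(lower, upper)

filtered_df = df[is_inlier]   # rows kept
outliers = df[~is_inlier]     # rows flagged as outliers

print(outliers)
```

Keeping the boolean mask in its own variable makes it easy to inspect the flagged rows before deciding whether to drop them.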
What does the IQR measure in a dataset?
A. The range between minimum and maximum values
B. The average value
C. The total number of data points
D. The middle 50% spread of the data
Which formula identifies outliers using IQR?
A. Values < Q1 - 1.5*IQR or > Q3 + 1.5*IQR
B. Values < Q1 + 1.5*IQR or > Q3 - 1.5*IQR
C. Values < mean - 2*std or > mean + 2*std
D. Values < minimum or > maximum
In pandas, how do you get the 25th percentile of a column 'data'?
A. df['data'].mean()
B. df['data'].quantile(0.25)
C. df['data'].median()
D. df['data'].max()
What is the purpose of filtering data using IQR in pandas?
A. To remove missing values
B. To select only the largest values
C. To remove outliers
D. To sort the data
If Q1 = 10 and Q3 = 20, what is the IQR?
A. 10
B. 30
C. 15
D. 20
Explain how to detect outliers using the IQR method in pandas.
Think about the steps from calculating quartiles to filtering data.
Why is the IQR method preferred over using min and max values for outlier detection?
Consider how extreme values affect range versus IQR.
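One way to explore this question is to compare both measures before and after adding a single extreme value. The sketch below uses made-up numbers, and the helper names `value_range` and `iqr` are hypothetical.

```python
import pandas as pd


def value_range(s):
    """Max minus min: driven entirely by the two most extreme values."""
    return s.max() - s.min()


def iqr(s):
    """Q3 minus Q1: driven by the middle 50% of the values."""
    return s.quantile(0.75) - s.quantile(0.25)


# Hypothetical sample, then the same sample with one extreme value added.
clean = pd.Series([10, 12, 13, 14, 15, 16, 18])
with_extreme = pd.concat([clean, pd.Series([1000])], ignore_index=True)

print(value_range(clean), value_range(with_extreme))
print(iqr(clean), iqr(with_extreme))
```

In this sample the range grows from 8 to 990 after one value is added, while the IQR changes by less than one unit. That difference is why cutoffs built from the min and max cannot flag the very points that define them.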