Challenge - 5 Problems
IQR Outlier Master
❓ Predict Output
Difficulty: intermediate
What is the output of this IQR outlier detection code?
Given the DataFrame df below, what does the outliers variable contain after running the code?
Pandas
import pandas as pd

df = pd.DataFrame({'values': [10, 12, 14, 15, 18, 19, 20, 100]})

# Quartiles and interquartile range
Q1 = df['values'].quantile(0.25)
Q3 = df['values'].quantile(0.75)
IQR = Q3 - Q1

# Rows lying more than 1.5 * IQR outside the quartiles
outliers = df[(df['values'] < Q1 - 1.5 * IQR) | (df['values'] > Q3 + 1.5 * IQR)]
print(outliers)
💡 Hint
Calculate Q1 and Q3, then find the IQR. Check which values fall outside the range Q1 - 1.5*IQR to Q3 + 1.5*IQR.
Explanation
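pandas interpolates linearly between data points by default, so Q1 = 13.5 and Q3 = 19.25, giving IQR = 19.25 - 13.5 = 5.75. The lower bound is 13.5 - 1.5*5.75 = 4.875 and the upper bound is 19.25 + 1.5*5.75 = 27.875. Only the value 100 is above 27.875, so it is the only outlier, and the printed DataFrame contains the single row with index 7 and value 100.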
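The arithmetic can be checked directly. The following is a minimal sketch, assuming the default linear interpolation of Series.quantile; the lowercase variable names are illustrative and not part of the challenge code.

```python
import pandas as pd

values = pd.Series([10, 12, 14, 15, 18, 19, 20, 100])

# Quartiles with pandas' default (linear) interpolation
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Fences at 1.5 * IQR beyond each quartile
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Values outside the fences are flagged as outliers
outliers = values[(values < lower) | (values > upper)]
print(q1, q3, iqr, lower, upper)
print(outliers.tolist())
```

Because the quartiles are interpolated, they need not be values that appear in the data.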
❓ Data Output
Difficulty: intermediate
How many outliers are detected by this IQR method?
Using the DataFrame df and the IQR method below, how many rows are identified as outliers?
Pandas
import pandas as pd

df = pd.DataFrame({'scores': [55, 60, 65, 70, 75, 80, 85, 90, 95, 200]})

# Quartiles and interquartile range
Q1 = df['scores'].quantile(0.25)
Q3 = df['scores'].quantile(0.75)
IQR = Q3 - Q1

# Rows lying more than 1.5 * IQR outside the quartiles
outliers = df[(df['scores'] < Q1 - 1.5 * IQR) | (df['scores'] > Q3 + 1.5 * IQR)]
print(len(outliers))
💡 Hint
Calculate the IQR and check which values fall outside the bounds.
Explanation
With pandas' default linear interpolation, Q1 is 66.25 and Q3 is 88.75, so IQR is 22.5. Lower bound is 66.25 - 1.5*22.5 = 32.5, upper bound is 88.75 + 1.5*22.5 = 122.5. Only 200 is outside this range, so the code prints 1.
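Quartile values depend on the interpolation rule. As a rough sketch, the snippet below compares pandas' default "linear" rule (Q1 = 66.25, Q3 = 88.75) with the "midpoint" rule (Q1 = 67.5, Q3 = 87.5). The helper name iqr_outlier_count is hypothetical and introduced only for illustration.

```python
import pandas as pd

scores = pd.Series([55, 60, 65, 70, 75, 80, 85, 90, 95, 200])

def iqr_outlier_count(series, interpolation="linear"):
    """Return (q1, q3, number of values outside the 1.5 * IQR fences)."""
    q1 = series.quantile(0.25, interpolation=interpolation)
    q3 = series.quantile(0.75, interpolation=interpolation)
    iqr = q3 - q1
    mask = (series < q1 - 1.5 * iqr) | (series > q3 + 1.5 * iqr)
    return q1, q3, int(mask.sum())

linear = iqr_outlier_count(scores)                # pandas default
midpoint = iqr_outlier_count(scores, "midpoint")  # average of the two neighbours
print(linear)
print(midpoint)
```

For this data the fences move slightly between the two rules, but the outlier count is 1 under both.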
❓ Visualization
Difficulty: advanced
Which boxplot correctly shows the outliers detected by IQR?
You run this code to detect outliers and plot a boxplot. Which option shows the correct boxplot visualization?
Pandas
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'data': [5, 7, 8, 9, 10, 12, 15, 100]})

# Draw a boxplot; whiskers use matplotlib's default 1.5 * IQR rule
plt.boxplot(df['data'])
plt.show()
💡 Hint
Outliers appear as points outside the whiskers. The value 100 is much larger than others.
Explanation
The boxplot shows one outlier point at 100, above the upper whisker, which ends at 15. The lower whisker extends down to 5, so there are no outliers below.
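The whisker ends and outlier points can be worked out without opening a plot window. This sketch reproduces matplotlib's default whis=1.5 rule with NumPy: each whisker stops at the most extreme data point that still lies inside the 1.5 * IQR fences. The variable names are illustrative.

```python
import numpy as np

data = np.array([5, 7, 8, 9, 10, 12, 15, 100])

# Quartiles and fences
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme points that are still inside the fences
inside = data[(data >= lower_fence) & (data <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Points beyond the fences are drawn individually as outliers
fliers = data[(data < lower_fence) | (data > upper_fence)]
print(whisker_low, whisker_high, fliers.tolist())
```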
🔧 Debug
Difficulty: advanced
What error does this IQR outlier detection code raise?
Identify the error raised by this code snippet:
Pandas
import pandas as pd

df = pd.DataFrame({'vals': [1, 2, 3, 4, 5]})

Q1 = df['vals'].quantile(0.25)
Q3 = df['vals'].quantile(0.75)
IQR = Q3 - Q1

outliers = df[(df['vals'] < Q1 - 1.5 * IQR) | (df['vals'] > Q3 + 1.5 * IQR)]
print(outlier)
💡 Hint
Check variable names carefully in the print statement.
Explanation
The variable is named 'outliers' but the code tries to print 'outlier', which is undefined, so Python raises a NameError.
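The failure can be reproduced by wrapping the faulty print in a try/except. This is a minimal sketch. It also shows that, once the name is corrected, the result is an empty DataFrame because none of the values 1 to 5 falls outside the bounds.

```python
import pandas as pd

df = pd.DataFrame({'vals': [1, 2, 3, 4, 5]})

Q1 = df['vals'].quantile(0.25)
Q3 = df['vals'].quantile(0.75)
IQR = Q3 - Q1
outliers = df[(df['vals'] < Q1 - 1.5 * IQR) | (df['vals'] > Q3 + 1.5 * IQR)]

# The misspelled name is looked up at run time and is not defined
try:
    print(outlier)
except NameError as exc:
    error_name = type(exc).__name__
    print(error_name, exc)

# With the correct name there is simply nothing to report
print(outliers.empty)
```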
🚀 Application
Difficulty: expert
Which option correctly filters out outliers using IQR in a DataFrame with multiple columns?
You have a DataFrame df with columns A and B. You want to remove rows where A or B have outliers based on IQR. Which code correctly does this?
Pandas
import pandas as pd

df = pd.DataFrame({'A': [10, 12, 14, 100, 15], 'B': [20, 22, 23, 24, 200]})
💡 Hint
Use logical AND to keep rows where both columns are within the IQR bounds.
Explanation
Option A correctly applies the IQR filter to each column and combines the conditions with AND, keeping only rows where neither column is an outlier. The variant that negates the outlier condition is logically equivalent but more verbose. The variant that uses a DataFrame-wide comparison returns NaNs and does not filter rows properly. The variant that uses OR with >= and <= is true for every row, so it removes nothing and is incorrect.
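Since the answer options are not reproduced here, the following is only a sketch of the AND-based approach the explanation describes, not the exact text of any option. The helper within_iqr and its parameter k are hypothetical names introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 12, 14, 100, 15], 'B': [20, 22, 23, 24, 200]})

def within_iqr(series, k=1.5):
    """Boolean mask: True where a value lies inside the k * IQR fences."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return series.between(q1 - k * iqr, q3 + k * iqr)

# Keep a row only if BOTH columns are within their own bounds
keep = within_iqr(df['A']) & within_iqr(df['B'])
cleaned = df[keep]
print(cleaned)
```

Each column gets its own bounds. The row with A = 100 and the row with B = 200 are dropped, leaving the first three rows.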