Challenge - 5 Problems
Box Plot Mastery
❓ Predict Output
Difficulty: intermediate
Output of a simple box plot data summary
What is the output of the following code snippet that creates a box plot summary using pandas?
Pandas
import pandas as pd
import numpy as np

data = pd.DataFrame({'values': np.array([1, 2, 2, 3, 4, 5, 6, 7, 8, 9])})
summary = data['values'].describe()
summary
💡 Hint
Look carefully at the quartile values and the mean calculated by the pandas describe() method.
Explanation
The pandas describe() method reports count, mean, std, min, 25%, 50%, 75%, and max. With the default linear interpolation, the 25% quartile of this data is 2.25 and the 75% quartile is 6.75; the median (50%) is 4.5 and the mean is 4.7.
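The quartile values can be checked by hand. The following is a minimal sketch, assuming the default linear interpolation that pandas uses for quantiles; the names `sorted_vals`, `pos`, and `q1_manual` are illustrative and not part of the original snippet.

```python
import numpy as np
import pandas as pd

values = pd.Series([1, 2, 2, 3, 4, 5, 6, 7, 8, 9])
summary = values.describe()

# Reproduce the 25% quantile with linear interpolation between
# the two sorted values that surround the fractional position.
sorted_vals = np.sort(values.to_numpy())
pos = 0.25 * (len(sorted_vals) - 1)            # fractional index into the sorted data
below, above = int(np.floor(pos)), int(np.ceil(pos))
q1_manual = sorted_vals[below] + (pos - below) * (sorted_vals[above] - sorted_vals[below])

print(summary)
print("manual Q1:", q1_manual)
```

The hand-computed value should agree with the `25%` row of the `describe()` output.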
❓ Data Output
Difficulty: intermediate
Number of outliers detected in a box plot
Given the following data, how many outliers will be detected using the IQR method in a box plot?
Pandas
import pandas as pd
import numpy as np

data = pd.Series([10, 12, 12, 13, 12, 11, 14, 15, 100, 101, 102])
Q1 = data.quantile(0.25)
Q3 = data.quantile(0.75)
IQR = Q3 - Q1
outliers = data[(data < (Q1 - 1.5 * IQR)) | (data > (Q3 + 1.5 * IQR))]
outliers.count()
💡 Hint
Calculate Q1 and Q3, then look for values more than 1.5 times the IQR below Q1 or above Q3.
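Explanation
Outliers are values greater than Q3 + 1.5*IQR or less than Q1 - 1.5*IQR. Here Q1 = 12 and Q3 = 57.5 (the three large values pull Q3 up), so IQR = 45.5 and the bounds are -56.25 and 125.75. Although 100, 101, and 102 look extreme, none of them exceeds the upper bound, so the count is 0.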
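Printing the intermediate quantities makes the count easy to verify. This is a minimal sketch reusing the data from the question; the names `lower_fence` and `upper_fence` are illustrative.

```python
import pandas as pd

data = pd.Series([10, 12, 12, 13, 12, 11, 14, 15, 100, 101, 102])

q1 = data.quantile(0.25)
q3 = data.quantile(0.75)
iqr = q3 - q1

# Fences used by the 1.5 * IQR rule
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = data[(data < lower_fence) | (data > upper_fence)]

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"fences: [{lower_fence}, {upper_fence}]")
print("number of outliers:", outliers.count())
```

Because Q3 falls between 15 and 100 in the sorted data, the IQR is large and the upper fence lands above 102.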
❓ Visualization
Difficulty: advanced
Interpreting box plot visualization with multiple categories
You have this code to create box plots for two groups. Which statement correctly describes the visualization output?
Pandas
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)
data = pd.DataFrame({
    'group': ['A'] * 50 + ['B'] * 50,
    'value': np.concatenate([
        np.random.normal(0, 1, 50),
        np.random.normal(1, 1.5, 50),
    ]),
})
data.boxplot(by='group', column='value')
plt.show()
💡 Hint
Look at the mean and standard deviation used to generate each group.
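Explanation
Group A is drawn from a normal distribution with mean 0 and standard deviation 1, and group B from one with mean 1 and standard deviation 1.5. Group B's box is therefore expected to sit higher (larger median) and to be more spread out than group A's.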
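The same comparison can be made numerically, without reading it off the plot. This is a minimal sketch that regenerates the data from the question and summarizes each group; the `stats` table and its `iqr` column are additions for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
data = pd.DataFrame({
    'group': ['A'] * 50 + ['B'] * 50,
    'value': np.concatenate([
        np.random.normal(0, 1, 50),     # group A: mean 0, std 1
        np.random.normal(1, 1.5, 50),   # group B: mean 1, std 1.5
    ]),
})

# Per-group summary: the 50% column is the median drawn inside each box
stats = data.groupby('group')['value'].describe()
stats['iqr'] = stats['75%'] - stats['25%']    # height of each box

print(stats[['50%', 'iqr', 'std']])
```

The `50%` column corresponds to the line inside each box, and `iqr` to the box height.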
🧠 Conceptual
Difficulty: advanced
Understanding whiskers in box plots
In a box plot created by pandas, what do the whiskers represent by default?
💡 Hint
Think about how box plots identify outliers.
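Explanation
Each whisker extends to the most extreme data point that lies within 1.5 times the interquartile range (IQR) of the box: no lower than Q1 - 1.5*IQR and no higher than Q3 + 1.5*IQR. Points beyond the whiskers are drawn individually as outliers.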
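The whisker rule can be worked through on a small example. This is a minimal sketch that computes the whisker ends by hand with the default factor of 1.5; the sample data and all variable names are made up for illustration, and it does not call the plotting code itself.

```python
import pandas as pd

# Hypothetical sample with one obvious extreme value
data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

q1 = data.quantile(0.25)
q3 = data.quantile(0.75)
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme observations inside the fences,
# not at the fence values themselves.
inside = data[(data >= lower_fence) & (data <= upper_fence)]
whisker_low = inside.min()
whisker_high = inside.max()

# Anything outside the fences would be drawn as an individual point
outliers = data[(data < lower_fence) | (data > upper_fence)].tolist()

print("whiskers:", whisker_low, whisker_high)
print("outliers:", outliers)
```

Note that the upper whisker ends at an actual observation (9), while the fence itself (14.5) is never drawn.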
🔧 Debug
Difficulty: expert
Identify the error in box plot code with missing data
What error will this code raise when trying to plot a box plot with missing values?
Pandas
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = pd.Series([1, 2, np.nan, 4, 5])
data.plot.box()
plt.show()
💡 Hint
Check how pandas handles NaN values in plotting functions.
Explanation
Pandas automatically ignores NaN values when plotting box plots, so no error occurs and the plot is generated correctly.
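The same NaN-skipping behaviour shows up in the summary statistics a box is built from. This is a minimal sketch, not the plotting code itself: it compares quantiles computed on the Series with the missing value against quantiles computed after an explicit `dropna()`; the names `clean`, `with_nan`, and `without_nan` are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, 2, np.nan, 4, 5])
clean = data.dropna()

# Series.count and Series.quantile skip NaN, so both versions agree
with_nan = data.quantile([0.25, 0.5, 0.75])
without_nan = clean.quantile([0.25, 0.5, 0.75])

print("non-missing values:", data.count())
print(with_nan)
print("same as after dropna():", with_nan.equals(without_nan))
```

If the missing values need to be handled differently (for example, filled rather than dropped), that has to happen before plotting.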