Box plots in Pandas - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output (intermediate)
Output of a simple box plot data summary
Which option matches the summary statistics that describe() returns for the following pandas code? (Options are written as dictionaries for brevity; describe() itself returns a Series.)
Pandas
import pandas as pd
import numpy as np

data = pd.DataFrame({'values': np.array([1, 2, 2, 3, 4, 5, 6, 7, 8, 9])})
summary = data['values'].describe()
summary
A. {'count': 10.0, 'mean': 4.7, 'std': 2.750757, 'min': 1.0, '25%': 2.5, '50%': 4.5, '75%': 7.0, 'max': 9.0}
B. {'count': 10.0, 'mean': 5.0, 'std': 3.0, 'min': 1.0, '25%': 3.0, '50%': 5.0, '75%': 7.0, 'max': 9.0}
C. {'count': 10.0, 'mean': 4.7, 'std': 2.750757, 'min': 1.0, '25%': 2.25, '50%': 4.5, '75%': 6.75, 'max': 9.0}
D. {'count': 10.0, 'mean': 4.7, 'std': 2.750757, 'min': 1.0, '25%': 2.25, '50%': 5.0, '75%': 6.75, 'max': 9.0}
💡 Hint
Check the mean and the quartiles. The describe() method computes quartiles by linear interpolation between sorted values and reports the sample standard deviation.
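To make the interpolation idea concrete, here is a minimal sketch on a hypothetical sample that is deliberately different from the challenge data. The helper `linear_quantile` is an illustrative name, not a pandas function; it assumes the default linear interpolation that `describe()` and `quantile()` use.

```python
import pandas as pd

# Hypothetical sample, deliberately different from the challenge data.
s = pd.Series([10, 20, 30, 40, 50, 60])

def linear_quantile(series, q):
    """Quantile by linear interpolation between the two nearest sorted values."""
    ordered = series.sort_values().to_numpy()
    pos = q * (len(ordered) - 1)              # fractional position in the sorted data
    lower = int(pos)                          # index just below that position
    upper = min(lower + 1, len(ordered) - 1)  # index just above, clamped to the end
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

summary = s.describe()
q1_manual = linear_quantile(s, 0.25)
q3_manual = linear_quantile(s, 0.75)

# The hand-rolled quartiles should agree with the ones describe() reports.
print(summary["25%"], q1_manual)
print(summary["75%"], q3_manual)
```

The same fractional-position arithmetic can be applied by hand to any small sample before comparing it against the answer options.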
Data Output (intermediate)
Number of outliers detected in a box plot
Given the following data, how many outliers will be detected using the IQR method in a box plot?
Pandas
import pandas as pd
import numpy as np

data = pd.Series([10, 12, 12, 13, 12, 11, 14, 15, 100, 101, 102])
Q1 = data.quantile(0.25)
Q3 = data.quantile(0.75)
IQR = Q3 - Q1
outliers = data[(data < (Q1 - 1.5 * IQR)) | (data > (Q3 + 1.5 * IQR))]
outliers.count()
A. 2
B. 3
C. 1
D. 0
💡 Hint
Calculate Q1, Q3, then find values outside 1.5 times IQR range.
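The fence calculation in the hint can be wrapped in a small reusable function. This is a sketch on a hypothetical sample, not the challenge data; the function name `iqr_outliers` and its `k` parameter are assumptions introduced for illustration.

```python
import pandas as pd

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return values[(values < lower_fence) | (values > upper_fence)]

# Hypothetical sample, deliberately different from the challenge data.
sample = pd.Series([4, 5, 5, 6, 6, 7, 7, 8, 30])
flagged = iqr_outliers(sample)
print(flagged.tolist())
```

Note that the fences are derived from the data's own quartiles, so it is worth printing Q1, Q3, and both fences rather than guessing which values look extreme.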
Visualization (advanced)
Interpreting box plot visualization with multiple categories
You have this code to create box plots for two groups. Which statement correctly describes the visualization output?
Pandas
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
np.random.seed(0)  # fixed seed so the plot is reproducible
data = pd.DataFrame({
    'group': ['A'] * 50 + ['B'] * 50,
    'value': np.concatenate([
        np.random.normal(0, 1, 50),    # group A: mean 0, std 1
        np.random.normal(1, 1.5, 50),  # group B: mean 1, std 1.5
    ]),
})
data.boxplot(column='value', by='group')
plt.show()
A. Group B has a higher median and wider spread than group A.
B. Group A has a higher median and wider spread than group B.
C. Both groups have the same median but different spreads.
D. Group B has a lower median and narrower spread than group A.
💡 Hint
Look at the mean and standard deviation used to generate each group.
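A grouped box plot can also be cross-checked numerically: the line inside each box is the group's median, and the box height is its interquartile range. The sketch below uses hypothetical hand-picked groups (not the challenge data), and the helper name `iqr` is an assumption introduced for illustration.

```python
import pandas as pd

# Hypothetical groups with hand-picked values (not the challenge data).
df = pd.DataFrame({
    "group": ["X"] * 5 + ["Y"] * 5,
    "value": [1, 2, 3, 4, 5, 10, 20, 30, 40, 50],
})

def iqr(values):
    """Interquartile range: the height of the box in a box plot."""
    return values.quantile(0.75) - values.quantile(0.25)

# One row per group, with the median and the IQR side by side.
stats = df.groupby("group")["value"].agg(["median", iqr])
print(stats)
```

Comparing the two rows of `stats` gives the same information as comparing the two boxes by eye.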
Conceptual (advanced)
Understanding whiskers in box plots
In a box plot created by pandas, what do the whiskers represent by default?
A. The minimum and maximum values in the data.
B. The most extreme data points within 1.5 times the IQR from the quartiles.
C. The 1st and 3rd quartiles.
D. The mean plus or minus one standard deviation.
💡 Hint
Think about how box plots identify outliers.
Debug (expert)
Identify the error in box plot code with missing data
What happens when this code tries to draw a box plot from a Series that contains a missing value?
Pandas
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = pd.Series([1, 2, np.nan, 4, 5])
data.plot.box()
plt.show()
A. AttributeError because plot.box() is not a valid method.
B. ValueError due to NaN values in the data.
C. TypeError because NaN is not a number.
D. No error; the box plot will ignore NaN values and plot correctly.
💡 Hint
Check how pandas handles NaN values in plotting functions.
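As background for the hint, this sketch shows how pandas reductions treat missing values on a hypothetical Series. It covers only the statistics side (`count`, `median`, `dropna`) and makes no claim about the plotting call in the challenge.

```python
import numpy as np
import pandas as pd

s = pd.Series([3.0, np.nan, 7.0, 9.0])  # hypothetical sample with one gap

n_valid = s.count()                      # counts only non-missing values
med_default = s.median()                 # skipna=True is the default
med_strict = s.median(skipna=False)      # NaN once missing values are kept
cleaned = s.dropna()                     # explicit removal before further work

print(n_valid, med_default, med_strict, len(cleaned))
```

Calling `dropna()` explicitly is a reasonable habit when it is unclear how a downstream function treats missing values.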