Challenge - 5 Problems
Binning Master
❓ Predict Output
intermediate
Output of pandas cut with custom bins
What is the output of the following code snippet?
import pandas as pd
import numpy as np
values = np.array([1, 5, 10, 15, 20])
bins = [0, 5, 10, 15]
categories = pd.cut(values, bins)
print(categories)
💡 Hint
Remember that pd.cut assigns each value to the interval between two consecutive edges in the bins list. By default, each interval excludes its left edge and includes its right edge.
Explanation
The bins are (0, 5], (5, 10], and (10, 15]. The value 1 falls into (0, 5]; 5 also falls into (0, 5] because the right edge is included; 10 falls into (5, 10]; 15 falls into (10, 15]; and 20 lies above the last edge, so it becomes NaN.
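The assignment can be checked value by value. This minimal sketch reuses the snippet from the question and also prints the Categorical's `codes` attribute, where each entry is the index of the assigned interval and -1 marks a value that fell outside every bin.

```python
import numpy as np
import pandas as pd

values = np.array([1, 5, 10, 15, 20])
bins = [0, 5, 10, 15]
categories = pd.cut(values, bins)

# Pair each value with the interval pd.cut assigned; 20 is above the last edge.
for value, interval in zip(values, categories):
    print(value, "->", interval)

# Integer bin index per value; -1 marks a value outside every bin (shown as NaN).
print(categories.codes)
```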
❓ data_output
intermediate
Number of bins created by qcut
Given the following code, how many unique bins will be created?
import pandas as pd
import numpy as np
values = np.array([1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10])
categories = pd.qcut(values, 4)
unique_bins = categories.unique()
print(len(unique_bins))
💡 Hint
qcut places bin edges at quantiles of the data so that each bin holds roughly the same number of values. Repeated values only matter if they make two of those edges identical.
Explanation
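The quartile edges of these 11 values are 1, 2.5, 5, 7.5, and 10. All five edges are distinct, so qcut creates 4 bins, each of them non-empty, and the code prints 4. The repeated value 2 does not cause any merging: qcut never merges bins on its own. Only duplicate bin edges reduce the count, and by default they make qcut raise a ValueError; passing duplicates='drop' removes the repeated edges instead, leaving fewer bins.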
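The quartile edges can be computed directly to confirm the count. This sketch uses np.quantile, whose default linear interpolation should match the quantiles qcut computes here, and then uses a hypothetical array `skewed` (not part of the question) whose quartile edges do repeat, to show the default ValueError and the effect of duplicates='drop'.

```python
import numpy as np
import pandas as pd

values = np.array([1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Quartile edges with linear interpolation (np.quantile's default).
edges = np.quantile(values, [0, 0.25, 0.5, 0.75, 1])
print(edges)

categories = pd.qcut(values, 4)
print(len(categories.unique()))                           # distinct bins used
print(pd.Series(categories).value_counts(sort=False))     # values per bin

# Hypothetical data where several quartile edges coincide (all equal to 1).
skewed = np.array([1, 1, 1, 1, 1, 1, 2, 3])
try:
    pd.qcut(skewed, 4)  # default duplicates="raise"
    error = None
except ValueError as err:
    error = err
    print("ValueError:", err)

# Dropping the repeated edges leaves fewer bins than requested.
dropped = pd.qcut(skewed, 4, duplicates="drop")
print(len(dropped.categories))
```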
❓ visualization
advanced
Visualizing binning effect on data distribution
Which option shows the correct histogram with bins created by pd.cut for the data below?
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(0)
data = np.random.normal(loc=50, scale=10, size=1000)
bins = [20, 40, 60, 80]
categories = pd.cut(data, bins)  # computed, but not used by the plot below
# plt.hist bins the raw data itself using the same edges
plt.hist(data, bins=bins, edgecolor='black')
plt.title('Histogram with bins [20, 40, 60, 80]')
plt.show()
💡 Hint
pd.cut by default includes the right edge of intervals and excludes the left edge.
Explanation
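The bins define three intervals, (20, 40], (40, 60], and (60, 80], so the correct histogram has 3 bars. The middle bar is by far the tallest, because (40, 60] spans one standard deviation on either side of the mean of 50 and therefore holds roughly two thirds of the values. Note that the plot is drawn by plt.hist, not from categories: matplotlib uses half-open bins [20, 40) and [40, 60) with a closed last bin [60, 80]. For continuous data a value landing exactly on an edge is practically impossible, so the bar heights match the pd.cut counts.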
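To check the bar heights without drawing anything, the pd.cut counts can be compared with np.histogram, which is what plt.hist uses to compute its bars. This sketch assumes the same seed and data as the question.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
data = np.random.normal(loc=50, scale=10, size=1000)
bins = [20, 40, 60, 80]

# pd.cut: right-closed intervals (20, 40], (40, 60], (60, 80], in category order.
cut_counts = pd.Series(pd.cut(data, bins)).value_counts(sort=False).to_numpy()

# np.histogram (used by plt.hist): [20, 40), [40, 60), [60, 80].
hist_counts, _ = np.histogram(data, bins=bins)

print(cut_counts, hist_counts)
# Values outside [20, 80] are left out of both.
print(len(data) - hist_counts.sum(), "values fall outside the bin range")
```

With continuous data the two edge conventions almost never disagree; they would only differ for values exactly equal to 20, 40, or 60.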
🧠 Conceptual
advanced
Effect of right parameter in pd.cut
What is the effect of setting right=False in pd.cut when binning data?
💡 Hint
By default, pd.cut includes the right edge. Changing right changes which edge is included.
Explanation
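Setting right=False makes the intervals left-inclusive and right-exclusive, for example [0, 5) instead of (0, 5]. A value equal to a bin's left edge now belongs to that bin, while a value equal to its right edge moves to the next bin, or becomes NaN if it equals the last edge.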
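The difference is easiest to see on values that sit exactly on bin edges. In this sketch the values are chosen, for illustration, to equal the edges themselves.

```python
import pandas as pd

values = [0, 5, 10, 15]  # illustrative values that sit exactly on the edges
bins = [0, 5, 10, 15]

right_closed = pd.cut(values, bins)              # (0, 5], (5, 10], (10, 15]
left_closed = pd.cut(values, bins, right=False)  # [0, 5), [5, 10), [10, 15)

# 0 is outside the right-closed bins; 15 is outside the left-closed bins.
for v, r, l in zip(values, right_closed, left_closed):
    print(f"{v:>2}  right=True: {r}  right=False: {l}")
```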
🔧 Debug
expert
Identify the error in binning code
What error will the following code raise?
import pandas as pd
values = [1, 2, 3, 4, 5]
bins = [0, 2, 4]
categories = pd.cut(values, bins, labels=['Low', 'Medium', 'High'])
💡 Hint
Check the length of labels compared to bins.
Explanation
The three edges in bins define only two intervals, (0, 2] and (2, 4], so labels must contain exactly two entries, one fewer than the number of edges. Passing three labels makes pd.cut raise a ValueError.
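There are two straightforward fixes: pass one label per interval, or add an edge so there are three intervals. In the sketch below, the extra edge 6 is a hypothetical, illustrative choice that also brings the value 5 inside the bins.

```python
import pandas as pd

values = [1, 2, 3, 4, 5]

# The original call: three labels for two intervals raises ValueError.
try:
    pd.cut(values, [0, 2, 4], labels=["Low", "Medium", "High"])
    error = None
except ValueError as err:
    error = err
    print("ValueError:", err)

# Fix 1: one label per interval. 5 is above the last edge, so it becomes NaN.
two_bins = pd.cut(values, [0, 2, 4], labels=["Low", "Medium"])

# Fix 2: add a hypothetical fourth edge (6) so three labels fit three intervals.
three_bins = pd.cut(values, [0, 2, 4, 6], labels=["Low", "Medium", "High"])

print(list(two_bins))
print(list(three_bins))
```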