Challenge - 5 Problems
Binning Mastery
❓ Predict Output
intermediate
Output of pandas cut() with custom bins
What is the output of the following code snippet?
Pandas
import pandas as pd
import numpy as np

values = pd.Series([1, 7, 5, 4, 6, 3])
bins = [0, 3, 6, 9]
categories = pd.cut(values, bins)
print(categories.value_counts().to_dict())
💡 Hint
Remember that cut() bins are right-inclusive by default.
Explanation
The bins are (0, 3], (3, 6], and (6, 9]. Values 1 and 3 fall into (0, 3], values 4, 5, and 6 fall into (3, 6], and value 7 falls into (6, 9], so the counts are 2, 3, and 1 respectively.
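As a minimal sketch of the right-inclusive behaviour described above, the snippet below counts values per bin and then repeats the cut with `right=False` for contrast. The names `counts_by_label` and `left_closed_counts` are illustrative and not part of the original challenge.

```python
import pandas as pd

values = pd.Series([1, 7, 5, 4, 6, 3])
bins = [0, 3, 6, 9]

# Default (right=True): each bin is (left, right], so 3 belongs to (0, 3]
categories = pd.cut(values, bins)
counts = categories.value_counts().sort_index()
counts_by_label = {str(interval): int(n) for interval, n in counts.items()}
print(counts_by_label)  # {'(0, 3]': 2, '(3, 6]': 3, '(6, 9]': 1}

# right=False: each bin is [left, right), so 3 moves to [3, 6) and 6 to [6, 9)
left_closed = pd.cut(values, bins, right=False)
left_counts = left_closed.value_counts().sort_index()
left_closed_counts = {str(interval): int(n) for interval, n in left_counts.items()}
print(left_closed_counts)  # {'[0, 3)': 1, '[3, 6)': 3, '[6, 9)': 2}
```

Only the values that sit exactly on a bin edge (3 and 6 here) change bins when the closed side is flipped.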
❓ Data Output
intermediate
Number of bins created by qcut() with duplicates
Given the following code, how many unique bins does qcut() create?
Pandas
import pandas as pd

values = pd.Series([1, 2, 2, 3, 4, 5, 6, 7, 8, 9])
categories = pd.qcut(values, 4, duplicates='drop')
print(categories.cat.categories)
💡 Hint
qcut aims for bins with equal counts. With duplicates='drop' it removes repeated bin edges, so check whether any of the quantile edges actually coincide here.
Explanation
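Although the value 2 appears twice, the quartile edges computed from this data (1, 2.25, 4.5, 6.75, 9) are all distinct, so duplicates='drop' has nothing to remove and qcut creates 4 bins. Bins are only lost when a repeated value is frequent enough that two or more quantile edges land on the same number.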
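One way to check how many bins qcut produces is to look at the quantile edges directly. The sketch below does that for the series in the question, then contrasts it with a made-up series (`repeated`, not from the original challenge) in which one value is frequent enough for two edges to coincide.

```python
import pandas as pd

values = pd.Series([1, 2, 2, 3, 4, 5, 6, 7, 8, 9])

# Quartile edges, using the default linear interpolation of Series.quantile
edges = values.quantile([0, 0.25, 0.5, 0.75, 1.0]).tolist()
print(edges)

# Count the bins qcut actually creates for the series in the question
n_bins = len(pd.qcut(values, 4, duplicates="drop").cat.categories)
print(n_bins)

# Hypothetical contrast: the value 1 fills the whole first quarter,
# so the 0% and 25% edges are both 1 and one of them is dropped
repeated = pd.Series([1, 1, 1, 1, 1, 2, 3, 4, 5, 6])
n_bins_repeated = len(pd.qcut(repeated, 4, duplicates="drop").cat.categories)
print(n_bins_repeated)
```

Without `duplicates="drop"`, the second call would raise a ValueError about non-unique bin edges instead of returning fewer bins.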
🔧 Debug
advanced
Identify the error in this cut() usage
What error does the following code raise?
Pandas
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5])
bins = [0, 2, 4]
categories = pd.cut(values, bins, labels=["Low", "Medium", "High"])
print(categories)
💡 Hint
Check the number of labels vs number of bins.
Explanation
pd.cut raises a ValueError because the number of labels must be exactly one less than the number of bin edges. Here, bins has length 3 and therefore defines 2 intervals, so labels must have length 2, not 3.
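A short sketch of the failure and one possible fix, assuming the intended labels for the two intervals are "Low" and "Medium" (the original does not say which label should be dropped):

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5])
bins = [0, 2, 4]  # three edges define two intervals: (0, 2] and (2, 4]

# Three labels for two intervals: pd.cut rejects this with a ValueError
try:
    pd.cut(values, bins, labels=["Low", "Medium", "High"])
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Assumed fix: supply one label per interval
fixed = pd.cut(values, bins, labels=["Low", "Medium"])
print(fixed.tolist())
```

Note that the value 5 lies above the last edge, so it becomes NaN in `fixed`. Adding a fourth edge such as 6 would be another way to make three labels valid.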
❓ Visualization
advanced
Interpreting qcut() bin edges from output
Given this code, which option correctly describes the bin edges printed?
Pandas
import pandas as pd

values = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
categories = pd.qcut(values, 5)
print(categories.cat.categories)
💡 Hint
qcut creates bins with an equal number of data points; the edges are quantiles of the data.
Explanation
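qcut divides the data into equal-sized groups by rank, so the edges are sample quantiles rather than values chosen for equal width. For this series the interior edges are the 20th, 40th, 60th, and 80th percentiles (28, 46, 64, 82), and each bin holds 2 values. They look evenly spaced only because the data itself is evenly spaced; the lowest edge is placed slightly below 10 so that the minimum is included.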
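To see that the edges really are quantiles, the sketch below compares the right edge of each qcut bin with the percentiles computed by `Series.quantile`. The variable names are illustrative.

```python
import pandas as pd

values = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
categories = pd.qcut(values, 5)

# Right edge of each bin, as stored on the categorical's interval index
right_edges = [float(interval.right) for interval in categories.cat.categories]
print(right_edges)

# The same numbers computed directly as percentiles of the data
quantile_edges = values.quantile([0.2, 0.4, 0.6, 0.8, 1.0]).tolist()
print(quantile_edges)

# Every bin receives the same number of observations
bin_counts = categories.value_counts().sort_index().tolist()
print(bin_counts)
```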
🚀 Application
expert
Choosing binning method for skewed data
You have a dataset with highly skewed income values. You want to create 4 bins that each contain roughly the same number of people. Which method and parameters should you use?
💡 Hint
Think about how to get bins with equal counts despite skew.
Explanation
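Use pd.qcut with q=4. qcut places its edges at quantiles, so each bin receives roughly the same number of observations regardless of skew, whereas pd.cut with equal-width bins would put most people in the lowest bin. Adding duplicates='drop' prevents an error when heavily repeated values make two quantile edges coincide, at the cost of returning fewer than 4 bins in that case.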
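The difference is easy to see on a small example. The incomes below are made up for illustration (in thousands, with one extreme value) and are not from the original challenge.

```python
import pandas as pd

# Hypothetical, right-skewed incomes (illustrative values only)
income = pd.Series([18, 22, 25, 27, 30, 33, 38, 45, 60, 90, 150, 900])

# Quantile-based bins: edges follow the data, counts stay balanced
equal_count = pd.qcut(income, q=4, duplicates="drop")
qcut_counts = equal_count.value_counts().sort_index().tolist()
print(qcut_counts)  # [3, 3, 3, 3]

# Equal-width bins: the single extreme value stretches the range,
# so nearly everyone lands in the first bin
equal_width = pd.cut(income, bins=4)
cut_counts = equal_width.value_counts().sort_index().tolist()
print(cut_counts)
```

If readable names are needed, `labels=["Q1", "Q2", "Q3", "Q4"]` can be passed to `pd.qcut`, but only when no edges are dropped, since the label count must match the final number of bins.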