
Binning continuous variables in Data Analysis Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of pandas cut with custom bins
What is the output of the following code snippet?
import pandas as pd
import numpy as np

values = np.array([1, 5, 10, 15, 20])
bins = [0, 5, 10, 15]
categories = pd.cut(values, bins)
print(categories)
A
[(0, 5], (0, 5], (5, 10], (10, 15], NaN]
Categories (3, interval[int64, right]): [(0, 5] < (5, 10] < (10, 15]]
B
[(0, 5], NaN, (5, 10], (10, 15], NaN]
Categories (3, interval[int64, right]): [(0, 5] < (5, 10] < (10, 15]]
C
[NaN, (0, 5], (5, 10], (10, 15], NaN]
Categories (3, interval[int64, right]): [(0, 5] < (5, 10] < (10, 15]]
D
[NaN, (0, 5], (5, 10], (10, 15], (15, 20]]
Categories (4, interval[int64, right]): [(0, 5] < (5, 10] < (10, 15] < (15, 20]]
💡 Hint
Remember that pd.cut assigns each value to the interval formed by consecutive edges in the bins list. By default, each interval excludes its left edge and includes its right edge, and any value outside the overall range of the edges becomes NaN.
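Here is a small sketch of the default edge rule on different, illustrative data (not the problem's values). A value on the lowest edge or beyond the highest edge is left unbinned, and `include_lowest=True` pulls the lowest edge into the first interval. In `codes`, -1 marks a value that falls in no bin.

```python
import pandas as pd

# Edges 0, 3, 6, 9 define three right-closed intervals: (0, 3], (3, 6], (6, 9].
values = [0, 3, 4, 9, 12]
edges = [0, 3, 6, 9]

default_bins = pd.cut(values, edges)
print(default_bins)
# 0 sits on the excluded left edge of the first interval and 12 lies past the
# last edge, so both are NaN (code -1).
print(default_bins.codes)

# include_lowest=True widens the first interval so the lowest edge is kept.
lowest_bins = pd.cut(values, edges, include_lowest=True)
print(lowest_bins.codes)
```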
Data Output
intermediate
Number of bins created by qcut
Given the following code, how many unique bins will be created?
import pandas as pd
import numpy as np

values = np.array([1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10])
categories = pd.qcut(values, 4)
unique_bins = categories.unique()
print(len(unique_bins))
A. 3
B. 6
C. 5
D. 4
💡 Hint
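qcut places the bin edges at sample quantiles so that each bin holds roughly the same number of values. Repeated values change the bin count only when they make two quantile edges coincide; pandas then raises a ValueError by default, while duplicates='drop' merges the coinciding edges into fewer bins.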
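To see when repeated values matter, here is a sketch with deliberately repetitive, made-up data where several quartile edges coincide. With the default `duplicates='raise'`, `pd.qcut` refuses to build the bins. With `duplicates='drop'`, it merges the coinciding edges and returns fewer bins than requested.

```python
import numpy as np
import pandas as pd

# Heavily repeated data: three of the five quartile edges land on the value 1.
values = np.array([1, 1, 1, 1, 1, 2, 3, 4])

try:
    pd.qcut(values, 4)  # default duplicates="raise"
    raised = False
except ValueError as err:
    raised = True
    print("qcut raised:", err)

# duplicates="drop" keeps only the unique edges, so only 2 bins remain.
binned = pd.qcut(values, 4, duplicates="drop")
print(len(binned.categories))
print(binned.value_counts())
```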
Visualization
advanced
Visualizing binning effect on data distribution
Which option correctly describes the histogram drawn by the code below, with each bar labeled by the interval that pd.cut creates from the same edges?
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)
data = np.random.normal(loc=50, scale=10, size=1000)
bins = [20, 40, 60, 80]
categories = pd.cut(data, bins)
plt.hist(data, bins=bins, edgecolor='black')
plt.title('Histogram with bins [20, 40, 60, 80]')
plt.show()
A. Histogram with 4 bars showing counts of data in intervals [20,40), [40,60), [60,80), [80,100)
B. Histogram with 3 bars showing counts of data in intervals (20,40], (40,60], (60,80]
C. Histogram with 3 bars showing counts of data in intervals [20,40), [40,60), [60,80)
D. Histogram with 4 bars showing counts of data in intervals (20,40], (40,60], (60,80], (80,100]
💡 Hint
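By default, pd.cut includes the right edge of each interval and excludes the left edge, and four edges define three intervals. plt.hist draws its bars from the same edges but bins the data with numpy's [a, b) convention (the last bin is closed); with continuous data like this, the two conventions almost never change the counts.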
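The two edge conventions can be compared directly on a few hand-picked values that sit exactly on the edges (illustrative data only). Matplotlib's `hist` bins its input with `numpy.histogram`, so here `np.histogram` stands in for the plot's counts.

```python
import numpy as np
import pandas as pd

# Values placed exactly on the edges to expose the two conventions.
data = np.array([20, 40, 40, 60, 80])
edges = [20, 40, 60, 80]

# pd.cut: right-closed (a, b]; 20 falls on the excluded lowest edge -> NaN,
# and value_counts() drops NaN by default.
cut_counts = pd.cut(data, edges).value_counts().to_numpy()

# numpy.histogram (used by plt.hist): [a, b) bins, last bin [60, 80] closed.
hist_counts, _ = np.histogram(data, bins=edges)

print(cut_counts)   # counts for (20, 40], (40, 60], (60, 80]
print(hist_counts)  # counts for [20, 40), [40, 60), [60, 80]
```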
🧠 Conceptual
advanced
Effect of right parameter in pd.cut
What is the effect of setting right=False in pd.cut when binning data?
A. Bins include the right edge and exclude the left edge of intervals.
B. Bins include both edges of intervals.
C. Bins include the left edge and exclude the right edge of intervals.
D. Bins exclude both edges of intervals.
💡 Hint
By default, pd.cut includes the right edge. Changing right changes which edge is included.
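This is a minimal comparison of the default and `right=False` on three illustrative values that sit on the edges. A `-1` in `codes` marks a value that lands in no interval, and the `closed` attribute of the categories reports which side is included.

```python
import pandas as pd

values = [0, 5, 10]
edges = [0, 5, 10]

right_closed = pd.cut(values, edges)              # default right=True: (0, 5], (5, 10]
left_closed = pd.cut(values, edges, right=False)  # [0, 5), [5, 10)

print(right_closed.categories)
print(right_closed.codes)  # 0 falls outside (0, 5] -> -1
print(left_closed.categories)
print(left_closed.codes)   # 10 falls outside [5, 10) -> -1
```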
🔧 Debug
expert
Identify the error in binning code
What error will the following code raise?
import pandas as pd
values = [1, 2, 3, 4, 5]
bins = [0, 2, 4]
categories = pd.cut(values, bins, labels=['Low', 'Medium', 'High'])
A. No error, code runs successfully
B. TypeError: 'list' object is not callable
C. IndexError: list index out of range
D. ValueError: Bin labels must be one fewer than the number of bin edges
💡 Hint
Check the length of labels compared to bins.
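One way to keep labels and edges in step is sketched below. An extra edge (6) is added purely for illustration so that three labels describe three intervals. The explicit length check is an optional guard of our own, not part of the pandas API.

```python
import pandas as pd

values = [1, 2, 3, 4, 5]
edges = [0, 2, 4, 6]                 # 4 edges -> 3 intervals (illustrative)
labels = ["Low", "Medium", "High"]   # one label per interval

# pd.cut needs exactly len(edges) - 1 labels; checking up front (our own guard)
# makes the mismatch obvious before binning.
if len(labels) != len(edges) - 1:
    raise ValueError(f"expected {len(edges) - 1} labels, got {len(labels)}")

binned = pd.cut(values, edges, labels=labels)
print(list(binned))  # ['Low', 'Low', 'Medium', 'Medium', 'High']
```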