
Binning continuous variables in ML Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Why use binning for continuous variables?

Which of the following is the main reason to use binning on continuous variables in machine learning?

A. To reduce the effect of outliers by grouping values into intervals
B. To make the model training slower and more complex
C. To convert categorical variables into numerical values
D. To increase the number of unique values in the dataset
💡 Hint

Think about how grouping values can help handle extreme values.
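
To make the hint concrete, here is a minimal sketch of how an extreme value behaves once a feature is binned. The ages, bin edges, and labels are made up for illustration and are not part of the challenge.

```python
import numpy as np
import pandas as pd

# Hypothetical ages; 400 stands in for a data-entry outlier
ages = [22, 25, 31, 38, 45, 400]

# Assumed edges: the top bin is open-ended, so any extreme value lands in it
edges = [0, 30, 40, np.inf]
labels = ["low", "mid", "high"]

binned = pd.cut(ages, bins=edges, labels=labels)
print(list(binned))
```

After binning, 45 and 400 carry the same label, so a model that sees only the bin can no longer be pulled around by the magnitude of 400.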

Predict Output
intermediate
Output of binning with pandas cut

What is the output of the following Python code?

import pandas as pd
values = [1, 5, 10, 15, 20]
bins = [0, 5, 10, 15, 20]
categories = pd.cut(values, bins)
print(categories.tolist())
A. [Interval(0, 5, closed='right'), Interval(5, 10, closed='right'), Interval(10, 15, closed='right'), Interval(15, 20, closed='right'), Interval(15, 20, closed='right')]
B. [Interval(0, 5, closed='right'), Interval(5, 10, closed='right'), Interval(10, 15, closed='right'), Interval(15, 20, closed='right'), NaN]
C. [Interval(0, 5, closed='right'), Interval(0, 5, closed='right'), Interval(5, 10, closed='right'), Interval(10, 15, closed='right'), Interval(15, 20, closed='right')]
D. [Interval(0, 5, closed='right'), Interval(0, 5, closed='right'), Interval(5, 10, closed='right'), Interval(10, 15, closed='right'), NaN]
💡 Hint

Check which bin each value falls into based on the intervals.
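
The sketch below uses different data from the question to show how `pd.cut` treats values that sit exactly on a bin edge. It compares the default right-closed intervals with the `right=False` and `include_lowest=True` options; the values and edges are illustrative assumptions.

```python
import pandas as pd

values = [0, 2, 4, 8]
edges = [0, 4, 8]

# .codes gives each value's bin position; -1 marks a value outside every bin
default_codes = pd.cut(values, edges).codes.tolist()               # (0, 4], (4, 8]
left_closed_codes = pd.cut(values, edges, right=False).codes.tolist()  # [0, 4), [4, 8)
with_lowest_codes = pd.cut(values, edges, include_lowest=True).codes.tolist()

print(default_codes)      # [-1, 0, 0, 1]
print(left_closed_codes)  # [0, 0, 1, -1]
print(with_lowest_codes)  # [0, 0, 0, 1]
```

With the default settings a value equal to an edge belongs to the interval that ends there, and a value equal to the lowest edge is left unbinned unless `include_lowest=True` is set.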

Model Choice
advanced
Choosing binning method for skewed data

You have a highly skewed continuous feature. Which binning method best preserves information for a decision tree model?

A. Equal-frequency binning (bins with the same number of samples)
B. Random binning (bins assigned randomly)
C. Equal-width binning (bins spanning the same range of values)
D. No binning, use raw continuous values only
💡 Hint

Consider how to balance data distribution across bins.
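
One way to explore the hint is to count how many samples each strategy puts in each bin. The sketch below does this on synthetic log-normal data; the distribution, seed, and bin count are assumptions chosen only to produce a right-skewed feature.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed feature (assumed for illustration)
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

# Equal-width: four bins spanning equal ranges of x
equal_width = pd.Series(pd.cut(x, bins=4)).value_counts(sort=False)

# Equal-frequency: four bins cut at the quartiles of x
equal_freq = pd.Series(pd.qcut(x, q=4)).value_counts(sort=False)

print(equal_width)
print(equal_freq)
```

On data like this, equal-width bins tend to crowd most samples into the lowest bin, while quantile-based bins each hold a quarter of the samples.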

Metrics
advanced
Effect of binning on model accuracy

After binning a continuous variable into 4 bins, you train a logistic regression model. Which metric is most appropriate for checking whether binning improved model performance?

A. Mean squared error on training data
B. Accuracy score on validation data
C. Number of bins created
D. Training time in seconds
💡 Hint

Think about how to measure model quality on unseen data.
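
As a rough sketch of the comparison the hint describes, the code below scores a logistic regression on a held-out split, once with the raw feature and once with the feature binned into 4 quantile bins. It assumes scikit-learn is installed; the synthetic data, the labeling rule, and the split size are all invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import KBinsDiscretizer

# Synthetic feature and label (assumed): positive when |x| is small
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 1))
y = (np.abs(X[:, 0]) < 0.7).astype(int)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Baseline: raw continuous feature
raw_model = LogisticRegression().fit(X_train, y_train)
raw_acc = accuracy_score(y_val, raw_model.predict(X_val))

# Binned: fit the bin edges on the training split only, then reuse them
binner = KBinsDiscretizer(n_bins=4, encode="onehot-dense", strategy="quantile")
binned_model = LogisticRegression().fit(binner.fit_transform(X_train), y_train)
binned_acc = accuracy_score(y_val, binned_model.predict(binner.transform(X_val)))

print(f"raw: {raw_acc:.3f}  binned: {binned_acc:.3f}")
```

Fitting the discretizer on the training split and only transforming the validation split keeps the held-out score an honest estimate.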

🔧 Debug
expert
Debugging binning code with pandas qcut

What error does this code raise?

import pandas as pd
values = [1, 2, 2, 2, 3]
bins = pd.qcut(values, q=4)
print(bins)
A. TypeError: qcut expects a DataFrame, not list
B. IndexError: list index out of range
C. No error, prints 4 equal-frequency bins
D. ValueError: Bin edges must be unique
💡 Hint

Check if the data has enough unique values for the requested bins.
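
To follow the hint on a different dataset, the sketch below computes the quantile edges that a 4-way split would need and then shows the `duplicates="drop"` option of `pd.qcut`, which merges coinciding edges. The values are made up for illustration.

```python
import numpy as np
import pandas as pd

values = [1, 1, 1, 1, 5, 9]  # mostly repeated values (assumed data)

# Edges a 4-way quantile split would need: min, quartiles, max
edges = np.quantile(values, [0, 0.25, 0.5, 0.75, 1.0])
print(edges)  # several edges coincide

# duplicates="drop" keeps only the distinct edges, so fewer bins come back
binned = pd.qcut(values, q=4, duplicates="drop")
print(binned.categories)
```

Here only three distinct edges remain, so the result has two bins rather than the four requested.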