Binning with cut() and qcut() in Pandas - Step-by-Step Execution

Concept Flow - Binning with cut() and qcut()
Start with numeric data -> Choose binning method -> Define bins -> Assign bins -> Analyze binned data
We start with numeric data, choose either cut() for fixed bins or qcut() for quantile-based bins, assign data to bins, then analyze the results.
Execution Sample
Pandas
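import pandas as pd

ages = [22, 25, 47, 35, 46, 55, 21, 23, 37, 52]

# Fixed bin edges; with the default right=True the intervals are (20, 30], (30, 40], ...
bins = [20, 30, 40, 50, 60]
labels = ['20-29', '30-39', '40-49', '50-59']

# cut(): assign each age to one of the fixed ranges
age_bins = pd.cut(ages, bins=bins, labels=labels)

# qcut(): split the ages into four quantile-based groups
quantile_bins = pd.qcut(ages, q=4, labels=['Q1', 'Q2', 'Q3', 'Q4'])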
This code bins ages into fixed ranges using cut(), and into four quantile groups using qcut().
Execution Table
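| Step | Data Point | cut() Bin | qcut() Quantile Bin |
|------|------------|-----------|---------------------|
| 1 | 22 | 20-29 | Q1 |
| 2 | 25 | 20-29 | Q2 |
| 3 | 47 | 40-49 | Q4 |
| 4 | 35 | 30-39 | Q2 |
| 5 | 46 | 40-49 | Q3 |
| 6 | 55 | 50-59 | Q4 |
| 7 | 21 | 20-29 | Q1 |
| 8 | 23 | 20-29 | Q1 |
| 9 | 37 | 30-39 | Q3 |
| 10 | 52 | 50-59 | Q4 |
| Exit | All data processed | - | - |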
💡 All data points assigned to bins; execution ends.
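The table can be rebuilt programmatically as a quick check. The following is a minimal sketch that reuses the values from the execution sample; the DataFrame and its column names (`age`, `cut_bin`, `qcut_bin`) are illustrative choices, not part of the original example.

```python
import pandas as pd

ages = [22, 25, 47, 35, 46, 55, 21, 23, 37, 52]
bins = [20, 30, 40, 50, 60]
labels = ['20-29', '30-39', '40-49', '50-59']

# One row per data point, mirroring the execution table.
# Column names here are hypothetical, chosen for readability.
table = pd.DataFrame({
    'age': ages,
    'cut_bin': pd.cut(ages, bins=bins, labels=labels),
    'qcut_bin': pd.qcut(ages, q=4, labels=['Q1', 'Q2', 'Q3', 'Q4']),
})
table.index = range(1, len(ages) + 1)  # step numbers start at 1
print(table)
```

Note that pandas assigns every bin in a single vectorized call; the step numbers only serve to line the output up with the table.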
Variable Tracker
| Step | ages (still to process) | age_bins | quantile_bins |
|------|-------------------------|----------|---------------|
| Start | [22, 25, 47, 35, 46, 55, 21, 23, 37, 52] | [] | [] |
| After 1 | [25, 47, 35, 46, 55, 21, 23, 37, 52] | ['20-29'] | ['Q1'] |
| After 2 | [47, 35, 46, 55, 21, 23, 37, 52] | ['20-29', '20-29'] | ['Q1', 'Q2'] |
| After 3 | [35, 46, 55, 21, 23, 37, 52] | ['20-29', '20-29', '40-49'] | ['Q1', 'Q2', 'Q4'] |
| After 4 | [46, 55, 21, 23, 37, 52] | ['20-29', '20-29', '40-49', '30-39'] | ['Q1', 'Q2', 'Q4', 'Q2'] |
| After 5 | [55, 21, 23, 37, 52] | ['20-29', '20-29', '40-49', '30-39', '40-49'] | ['Q1', 'Q2', 'Q4', 'Q2', 'Q3'] |
| After 6 | [21, 23, 37, 52] | ['20-29', '20-29', '40-49', '30-39', '40-49', '50-59'] | ['Q1', 'Q2', 'Q4', 'Q2', 'Q3', 'Q4'] |
| After 7 | [23, 37, 52] | ['20-29', '20-29', '40-49', '30-39', '40-49', '50-59', '20-29'] | ['Q1', 'Q2', 'Q4', 'Q2', 'Q3', 'Q4', 'Q1'] |
| After 8 | [37, 52] | ['20-29', '20-29', '40-49', '30-39', '40-49', '50-59', '20-29', '20-29'] | ['Q1', 'Q2', 'Q4', 'Q2', 'Q3', 'Q4', 'Q1', 'Q1'] |
| After 9 | [52] | ['20-29', '20-29', '40-49', '30-39', '40-49', '50-59', '20-29', '20-29', '30-39'] | ['Q1', 'Q2', 'Q4', 'Q2', 'Q3', 'Q4', 'Q1', 'Q1', 'Q3'] |
| After 10 | [] | ['20-29', '20-29', '40-49', '30-39', '40-49', '50-59', '20-29', '20-29', '30-39', '50-59'] | ['Q1', 'Q2', 'Q4', 'Q2', 'Q3', 'Q4', 'Q1', 'Q1', 'Q3', 'Q4'] |
| Final | [] | ['20-29', '20-29', '40-49', '30-39', '40-49', '50-59', '20-29', '20-29', '30-39', '50-59'] | ['Q1', 'Q2', 'Q4', 'Q2', 'Q3', 'Q4', 'Q1', 'Q1', 'Q3', 'Q4'] |
This trace is conceptual: pandas assigns all bins in one vectorized call, and the ages list itself is never modified.
Key Moments - 2 Insights
Why do some ages fall into the same bin in cut() but different bins in qcut()?
cut() uses fixed ranges, so ages 22 and 25 both go to '20-29'. qcut() places its edges at the quartiles of the data (23.5, 36, and 46.75 here) so that each group holds roughly the same number of values; 22 falls below 23.5 and is Q1, while 25 falls above it and is Q2. See execution table rows 1-2.
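The quantile edges that qcut() derives from the data can be inspected directly. This is a small sketch using the `retbins=True` option of `pd.qcut`, which returns the computed edges alongside the bin assignments; the variable names `quartiles` and `edges` are illustrative.

```python
import pandas as pd

ages = [22, 25, 47, 35, 46, 55, 21, 23, 37, 52]

# retbins=True returns the bin assignments plus the edges computed from the data.
quartiles, edges = pd.qcut(
    ages, q=4, labels=['Q1', 'Q2', 'Q3', 'Q4'], retbins=True
)

print(edges)             # quartile boundaries: min, 25%, 50%, 75%, max
print(list(quartiles))   # one label per age, in the original order
```

Because the edges depend on the whole dataset, adding or removing values can move an age from one quantile group to another, which never happens with the fixed edges passed to cut().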
What happens if a data point is exactly on a bin edge in cut()?
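By default (right=True), cut() uses (a, b] intervals, so a value equal to b goes into the bin that ends at b, not the one that starts there. With the bins above, an age of exactly 30 would be labeled '20-29', so labels written this way match the intervals only with right=False, which gives [a, b). A value equal to the lowest edge (20) gets no bin unless include_lowest=True. No exact edges occur in our example.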
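To see the edge behavior concretely, the sketch below bins a few values that sit exactly on the edges. The list `edge_ages` is made up for illustration and does not appear in the original example; labels are omitted so that pandas shows the actual intervals.

```python
import pandas as pd

edge_ages = [20, 30, 40]  # hypothetical values placed exactly on bin edges
bins = [20, 30, 40, 50, 60]

# Default right=True: intervals are (20, 30], (30, 40], (40, 50], (50, 60]
right_closed = pd.cut(edge_ages, bins=bins)

# right=False: intervals are [20, 30), [30, 40), [40, 50), [50, 60)
left_closed = pd.cut(edge_ages, bins=bins, right=False)

print(list(right_closed))  # 20 is outside every interval, so it becomes NaN
print(list(left_closed))   # each value lands in the interval that starts at it
```

With the default setting, passing `include_lowest=True` would close the first interval on the left so that 20 is also assigned a bin.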
Visual Quiz - 3 Questions
Test your understanding
According to the execution table, which bin does age 37 fall into with cut()?
A. 20-29
B. 30-39
C. 40-49
D. 50-59
💡 Hint
Check the row where Data Point is 37 in the cut() Bin column.
At which step does qcut() assign the first data point to Q3?
A. Step 4
B. Step 2
C. Step 3
D. Step 5
💡 Hint
Look at qcut() Quantile Bin column in execution_table for the earliest Q3.
If we change the bins in cut() to [20, 35, 50, 60], set right=False, and adjust the labels to match, which bin would age 35 fall into?
A. '35-50'
B. '30-39'
C. '20-29'
D. '50-59'
💡 Hint
With right=False the intervals are [20, 35), [35, 50), and [50, 60), so 35 belongs to the interval that starts at 35. With the default right=True it would fall into (20, 35] instead.
Concept Snapshot
pandas cut() divides data into fixed bins by ranges.
pandas qcut() divides data into bins with approximately equal counts (quantiles).
Use cut() when you want specific ranges.
Use qcut() when you want balanced groups.
Both return categorical bins for analysis.
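As an illustration of the final "analyze" step, the sketch below counts how many ages land in each bin under both methods and computes a per-quartile mean. The DataFrame and its column names (`age`, `range`, `quartile`) are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({'age': [22, 25, 47, 35, 46, 55, 21, 23, 37, 52]})

# Fixed ranges with cut(); quantile groups with qcut().
df['range'] = pd.cut(df['age'], bins=[20, 30, 40, 50, 60],
                     labels=['20-29', '30-39', '40-49', '50-59'])
df['quartile'] = pd.qcut(df['age'], q=4, labels=['Q1', 'Q2', 'Q3', 'Q4'])

# Fixed ranges: group sizes follow the data and can be uneven.
range_counts = df['range'].value_counts().sort_index()

# Quantile groups: sizes are as even as the data allows.
quartile_counts = df['quartile'].value_counts().sort_index()

# Any aggregate can then be computed per bin, for example the mean age.
mean_age = df.groupby('quartile', observed=True)['age'].mean()

print(range_counts)
print(quartile_counts)
print(mean_age)
```

Comparing the two count series shows the trade-off in the snapshot: cut() keeps the ranges meaningful but lets group sizes vary, while qcut() balances group sizes but lets the ranges vary.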
Full Transcript
We start with a list of ages. Using pandas cut(), we define fixed bins such as 20-29 and 30-39 and assign each age to one of them. Using qcut(), we split the data into four quantile groups of roughly equal size, labeled Q1 to Q4. Each age is assigned to a bin or quantile group based on its value. The execution table shows each age's bin assignments step by step. The variable tracker shows how the bin lists fill up as each age is processed. Key moments clarify why cut() and qcut() differ and how bin edges are handled. The quiz tests understanding of bin assignments and the effects of changing bins. The snapshot summarizes when to use cut() or qcut() for binning numeric data.