We use cut() and qcut() to group continuous numbers into categories or bins. This helps us understand data better by simplifying it.
0
0
cut() and qcut() for binning in Data Analysis Python
Introduction
You want to group ages into ranges like 0-18, 19-35, 36-60, 60+.
You want to divide sales amounts into equal-sized groups to compare performance.
You want to label exam scores as low, medium, or high based on score ranges.
You want to split data into equal-sized groups based on percentiles for analysis.
Syntax
Data Analysis Python
pd.cut(x, bins, labels=None, right=True, include_lowest=False) pd.qcut(x, q, labels=None, duplicates='raise')
cut() divides data into bins you define by numbers.
qcut() divides data into bins with equal number of points (quantiles).
Examples
This groups ages into 4 bins: 0-18, 19-35, 36-60, 61-100.
Data Analysis Python
import pandas as pd ages = [5, 12, 17, 18, 24, 32, 45, 52, 70] bins = [0, 18, 35, 60, 100] categories = pd.cut(ages, bins) print(categories)
This divides scores into 3 groups with equal number of scores each.
Data Analysis Python
import pandas as pd scores = [55, 67, 89, 45, 90, 78, 88, 92, 70] quantiles = pd.qcut(scores, 3, labels=['Low', 'Medium', 'High']) print(quantiles)
Sample Program
This program shows how to group exam scores into categories using fixed ranges with cut() and into equal-sized groups with qcut().
Data Analysis Python
import pandas as pd # Sample data: exam scores scores = [55, 67, 89, 45, 90, 78, 88, 92, 70, 60] # Using cut() to create fixed bins bins = [0, 60, 75, 90, 100] labels = ['Fail', 'Pass', 'Good', 'Excellent'] categorized_scores = pd.cut(scores, bins=bins, labels=labels, include_lowest=True) print('Using cut():') print(categorized_scores) # Using qcut() to create quantile bins quantile_labels = ['Low', 'Medium', 'High'] quantile_scores = pd.qcut(scores, q=3, labels=quantile_labels) print('\nUsing qcut():') print(quantile_scores)
OutputSuccess
Important Notes
With cut(), bins can overlap if not careful; always check bin edges.
qcut() may raise errors if there are duplicate values at bin edges; use duplicates='drop' to handle this.
Summary
cut() groups data by fixed numeric ranges you choose.
qcut() groups data into equal-sized groups based on data distribution.
Both help turn numbers into categories for easier analysis and visualization.