0
0
Data Analysis Pythondata~5 mins

cut() and qcut() for binning in Data Analysis Python

Choose your learning style9 modes available
Introduction

We use cut() and qcut() to group continuous numbers into categories or bins. This helps us understand data better by simplifying it.

You want to group ages into ranges like 0-18, 19-35, 36-60, 60+.
You want to divide sales amounts into equal-sized groups to compare performance.
You want to label exam scores as low, medium, or high based on score ranges.
You want to split data into equal-sized groups based on percentiles for analysis.
Syntax
Data Analysis Python
pd.cut(x, bins, labels=None, right=True, include_lowest=False)
pd.qcut(x, q, labels=None, duplicates='raise')

cut() divides data into bins you define by numbers.

qcut() divides data into bins with equal number of points (quantiles).

Examples
This groups ages into 4 bins: 0-18, 19-35, 36-60, 61-100.
Data Analysis Python
import pandas as pd
ages = [5, 12, 17, 18, 24, 32, 45, 52, 70]
bins = [0, 18, 35, 60, 100]
categories = pd.cut(ages, bins)
print(categories)
This divides scores into 3 groups with equal number of scores each.
Data Analysis Python
import pandas as pd
scores = [55, 67, 89, 45, 90, 78, 88, 92, 70]
quantiles = pd.qcut(scores, 3, labels=['Low', 'Medium', 'High'])
print(quantiles)
Sample Program

This program shows how to group exam scores into categories using fixed ranges with cut() and into equal-sized groups with qcut().

Data Analysis Python
import pandas as pd

# Sample data: exam scores
scores = [55, 67, 89, 45, 90, 78, 88, 92, 70, 60]

# Using cut() to create fixed bins
bins = [0, 60, 75, 90, 100]
labels = ['Fail', 'Pass', 'Good', 'Excellent']
categorized_scores = pd.cut(scores, bins=bins, labels=labels, include_lowest=True)
print('Using cut():')
print(categorized_scores)

# Using qcut() to create quantile bins
quantile_labels = ['Low', 'Medium', 'High']
quantile_scores = pd.qcut(scores, q=3, labels=quantile_labels)
print('\nUsing qcut():')
print(quantile_scores)
OutputSuccess
Important Notes

With cut(), bins can overlap if not careful; always check bin edges.

qcut() may raise errors if there are duplicate values at bin edges; use duplicates='drop' to handle this.

Summary

cut() groups data by fixed numeric ranges you choose.

qcut() groups data into equal-sized groups based on data distribution.

Both help turn numbers into categories for easier analysis and visualization.