
How to Use cut in pandas for Data Binning

Use pandas.cut() to divide continuous data into discrete bins (intervals). It groups numeric values into categories based on either a list of bin edges or a number of equal-width bins.

Syntax

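The most commonly used parameters of pandas.cut() appear in this signature:

python
pandas.cut(x, bins, right=True, labels=None, include_lowest=False)

  • x: The one-dimensional array or Series of numeric data to bin.
  • bins: Either an integer number of equal-width bins or a list of bin edges in increasing order.
  • right: Whether each bin includes its right edge (default is True), so edges [0, 30, 60] produce the bins (0, 30] and (30, 60].
  • labels: Optional labels for the bins, one per bin.
  • include_lowest: Whether the first bin also includes its left edge (default is False).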
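The two forms of the bins argument can be compared side by side. The following is a minimal sketch; the scores Series and its values are made up for illustration.

```python
import pandas as pd

scores = pd.Series([1, 3, 5, 8, 10])  # made-up sample data

# An integer asks for that many equal-width bins spanning the data's min..max.
by_count = pd.cut(scores, bins=3)

# A list supplies explicit edges; each interval is (left, right] by default.
by_edges = pd.cut(scores, bins=[0, 5, 10])

print(by_count.cat.categories)  # edges computed by pandas
print(by_edges.cat.categories)  # edges exactly as given
```

With an integer, pandas derives the edges from the data, so the same value can land in a different bin when the data's range changes. Explicit edges keep the bins stable across datasets.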

Example

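This example uses pandas.cut() to split a numeric Series of ages into three bins defined by explicit edges (0, 30, 60, 100) and assigns a label to each bin. Note that the bins are not equal-width: the last one spans 40 years.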

python
import pandas as pd

# Sample data
ages = pd.Series([22, 25, 47, 35, 46, 55, 67, 70, 18, 30])

# Cut into 3 bins with labels
bins = [0, 30, 60, 100]
labels = ['Young', 'Middle-aged', 'Senior']

age_groups = pd.cut(ages, bins=bins, labels=labels, include_lowest=True)

print(age_groups)
Output
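0          Young
1          Young
2    Middle-aged
3    Middle-aged
4    Middle-aged
5    Middle-aged
6         Senior
7         Senior
8          Young
9          Young
dtype: category
Categories (3, object): ['Young' < 'Middle-aged' < 'Senior']

The value 30 lands in Young because each bin includes its right edge by default.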
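Values that sit exactly on a bin edge are where the right parameter matters. The sketch below reuses the same edges and labels on three illustrative ages (18, 30, 60, chosen here to hit the edges) and compares right=True with right=False.

```python
import pandas as pd

ages = pd.Series([18, 30, 60])  # illustrative values on or near the bin edges
bins = [0, 30, 60, 100]
labels = ['Young', 'Middle-aged', 'Senior']

# right=True (default): bins are (0, 30], (30, 60], (60, 100]
right_closed = pd.cut(ages, bins=bins, labels=labels)

# right=False: bins are [0, 30), [30, 60), [60, 100)
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False)

print(pd.DataFrame({'age': ages,
                    'right=True': right_closed,
                    'right=False': left_closed}))
```

With right=True, 30 is Young and 60 is Middle-aged; with right=False, each moves up one bin.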

Common Pitfalls

Common mistakes when using pandas.cut() include:

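  • Leaving include_lowest at its default of False when the smallest value equals the first bin edge. Bins are open on the left by default, so that value falls outside every bin and becomes NaN.
  • Passing bin edges that are not in increasing order, which raises a ValueError.
  • Omitting labels when you want meaningful category names; the result then uses interval categories such as (0, 30] instead.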
python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5])

# Wrong: bins not ordered
# pd.cut(values, bins=[3, 1, 5])  # Raises ValueError: bin edges must increase monotonically

# Right: bins ordered
cut_result = pd.cut(values, bins=[1, 3, 5], include_lowest=True)
print(cut_result)
Output
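0    (0.999, 3.0]
1    (0.999, 3.0]
2    (0.999, 3.0]
3      (3.0, 5.0]
4      (3.0, 5.0]
dtype: category
Categories (2, interval[float64, right]): [(0.999, 3.0] < (3.0, 5.0]]

With include_lowest=True, pandas lowers the first edge slightly (here to 0.999) so that the value 1 falls inside the first bin. The value 3 belongs to the first bin because bins include their right edge.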
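The include_lowest pitfall is easiest to see by running the same cut with and without it. This is a small sketch on the same values Series, checking which entries end up as NaN.

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5])

# Default: the first interval is (1, 3], so the value 1 matches no bin -> NaN.
without_lowest = pd.cut(values, bins=[1, 3, 5])

# include_lowest=True makes the first interval cover the lowest edge too.
with_lowest = pd.cut(values, bins=[1, 3, 5], include_lowest=True)

print(without_lowest.isna().tolist())
print(with_lowest.isna().tolist())
```

Checking isna() on the result is a quick way to catch values that fell outside every bin, whether at the lowest edge or beyond the outer edges.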

Quick Reference

| Parameter | Description |
| --- | --- |
| x | Array or Series to bin |
| bins | Number of bins or list of bin edges |
| right | Include right edge in bin (default True) |
| labels | Custom labels for bins (optional) |
| include_lowest | Include lowest value in first bin (default False) |
| retbins | Return the bins used (default False) |
| precision | Decimal precision for bin labels (default 3) |
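The retbins and precision parameters are not used in the examples above, so here is a brief sketch of both, along with labels=False, which pandas.cut() also accepts to return integer bin positions. The prices Series is made-up sample data.

```python
import pandas as pd

prices = pd.Series([2.5, 7.25, 12.0, 18.75])  # made-up sample data

# retbins=True returns a (result, edges) tuple, which shows the edges
# pandas computed when bins is an integer. precision=1 only affects how
# many decimals appear in the interval labels.
binned, edges = pd.cut(prices, bins=2, retbins=True, precision=1)

# labels=False returns integer bin positions instead of interval categories.
codes = pd.cut(prices, bins=2, labels=False)

print(binned.cat.categories)
print(edges)
print(codes.tolist())
```

The returned edges can be passed back as bins in a later call, which is one way to apply the same binning to a second dataset.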

Key Takeaways

  • Use pandas.cut() to split continuous data into discrete bins.
  • Always pass bin edges in increasing order; unordered edges raise a ValueError.
  • Use labels to make bin categories more readable.
  • Set include_lowest=True when the smallest value equals the first bin edge, so that it is included in the first bin.
  • pandas.cut() returns a Series with categorical dtype when given a Series, and a Categorical when given a plain array or list.