How to Use cut in pandas for Data Binning
pandas.cut() divides continuous numeric data into discrete bins (intervals). You specify either the number of bins or the exact bin edges, and each value is assigned to the bin it falls into.
Syntax
The basic syntax of pandas.cut(), showing its most commonly used parameters, is:
python
pandas.cut(x, bins, right=True, labels=None, include_lowest=False)
- x: The one-dimensional array or Series of numeric data to bin.
- bins: The number of equal-width bins, or a list of bin edges.
- right: Whether each bin includes its right edge (default is True).
- labels: Optional labels for the bins.
- include_lowest: Whether the first bin also includes its left edge (default is False).
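The two forms of the bins argument, and the effect of right, can be sketched as follows. This is a minimal illustration; the scores Series and its values are made up for the example.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
scores = pd.Series([5, 12, 27, 33, 48, 50])

# An integer asks pandas for that many equal-width bins over the data range.
equal_width = pd.cut(scores, bins=3)

# A list gives explicit edges. With right=False each bin is closed on the
# left and open on the right: [0, 25), [25, 50), [50, 75).
left_closed = pd.cut(scores, bins=[0, 25, 50, 75], right=False)

print(equal_width.cat.categories)
print(left_closed.cat.categories)
```

With right=False the value 50 lands in [50, 75) rather than in the bin that ends at 50, which is often what you want for ranges such as "50 and above".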
Example
This example uses pandas.cut() to split a numeric Series of ages into three bins defined by explicit edges and assigns a label to each bin.
python
import pandas as pd

# Sample data
ages = pd.Series([22, 25, 47, 35, 46, 55, 67, 70, 18, 30])

# Cut into 3 bins with labels
bins = [0, 30, 60, 100]
labels = ['Young', 'Middle-aged', 'Senior']
age_groups = pd.cut(ages, bins=bins, labels=labels, include_lowest=True)
print(age_groups)
Output
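0          Young
1          Young
2    Middle-aged
3    Middle-aged
4    Middle-aged
5    Middle-aged
6         Senior
7         Senior
8          Young
9          Young
dtype: category
Categories (3, object): ['Young' < 'Middle-aged' < 'Senior']
Because bins include their right edge by default, the value 30 falls in the first bin and is labeled Young. The exact formatting of the last two lines can vary between pandas versions.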
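Once values are binned, a common next step is to count how many fall into each group. The sketch below reuses the ages example and also shows labels=False, which returns the integer position of each bin instead of a label.

```python
import pandas as pd

ages = pd.Series([22, 25, 47, 35, 46, 55, 67, 70, 18, 30])
bins = [0, 30, 60, 100]
labels = ['Young', 'Middle-aged', 'Senior']

age_groups = pd.cut(ages, bins=bins, labels=labels, include_lowest=True)

# Count the values in each bin without re-sorting by frequency.
counts = age_groups.value_counts(sort=False)
print(counts)

# labels=False returns the zero-based index of the bin for each value.
codes = pd.cut(ages, bins=bins, labels=False, include_lowest=True)
print(codes.tolist())  # [0, 0, 1, 1, 1, 1, 2, 2, 0, 0]
```

Integer codes are convenient when the bins feed into further numeric processing and readable labels are not needed.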
Common Pitfalls
Common mistakes when using pandas.cut() include:
- Leaving include_lowest at its default of False when the smallest value equals the first bin edge. That value then falls outside every bin and becomes NaN.
- Using overlapping or unordered bins, which raises a ValueError.
- Not providing labels when you want meaningful categories, resulting in interval objects instead.
python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5])

# Wrong: bins not ordered
# pd.cut(values, bins=[3, 1, 5])  # This will raise an error

# Right: bins ordered
cut_result = pd.cut(values, bins=[1, 3, 5], include_lowest=True)
print(cut_result)
Output
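0    (0.999, 3.0]
1    (0.999, 3.0]
2    (0.999, 3.0]
3      (3.0, 5.0]
4      (3.0, 5.0]
dtype: category
Categories (2, interval[float64, right]): [(0.999, 3.0] < (3.0, 5.0]]
With include_lowest=True, pandas shifts the displayed left edge of the first bin slightly below 1 so that the interval contains the value 1. The value 3 belongs to the first bin because bins include their right edge. The exact formatting can vary between pandas versions.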
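The first two pitfalls can be seen directly in a short sketch. It reuses the values Series from the example; the variable names without_lowest, with_lowest, and raised are only illustrative.

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5])

# Default include_lowest=False: the first bin is (1, 3], so the value 1
# matches no bin and becomes NaN.
without_lowest = pd.cut(values, bins=[1, 3, 5])
print(without_lowest.isna().tolist())  # [True, False, False, False, False]

# include_lowest=True closes the first bin on the left, so 1 is kept.
with_lowest = pd.cut(values, bins=[1, 3, 5], include_lowest=True)
print(with_lowest.isna().any())  # False

# Bin edges that are not in increasing order are rejected.
try:
    pd.cut(values, bins=[3, 1, 5])
    raised = False
except ValueError as err:
    raised = True
    print(f"Unordered bins rejected: {err}")
```

Checking isna() on the result is a quick way to catch values that fell outside every bin.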
Quick Reference
| Parameter | Description |
|---|---|
| x | Array or Series to bin |
| bins | Number of bins or list of bin edges |
| right | Include right edge in bin (default True) |
| labels | Custom labels for bins (optional) |
| include_lowest | Include lowest value in first bin (default False) |
| retbins | Return the bins used (default False) |
| precision | Decimal precision for bin labels (default 3) |
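The retbins and precision parameters are not used in the earlier examples, so here is a minimal sketch of both. The prices and new_prices data are hypothetical.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
prices = pd.Series([1.25, 3.5, 7.75, 9.99])

# retbins=True returns a (result, edges) tuple. precision controls how many
# decimals are shown in the interval labels.
binned, edges = pd.cut(prices, bins=2, retbins=True, precision=1)
print(binned)
print(edges)

# The returned edges can be reused to bin new data consistently.
new_prices = pd.Series([2.0, 8.0])
new_binned = pd.cut(new_prices, bins=edges)
print(new_binned)
```

When bins is an integer, pandas extends the range slightly so the minimum value is included, which is why the first returned edge is a little below the smallest price.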
Key Takeaways
Use pandas.cut() to split continuous data into discrete bins easily.
Always ensure bins are ordered and non-overlapping to avoid errors.
Use labels to make bin categories more readable.
Set include_lowest=True when the smallest value equals the first bin edge, so that it is included in the first bin.
pandas.cut() returns a categorical Series when given a Series, or integer bin codes when labels=False.