How to Bin Continuous Data in Pandas: Simple Guide

You can bin continuous data in pandas with pd.cut(), which creates equal-width or custom bins, or with pd.qcut(), which creates quantile-based bins that each hold roughly the same number of data points. Both functions return categorical data that records which bin each value belongs to.

Syntax

pd.cut() divides data into equal-width bins or into custom bins defined by explicit edges. pd.qcut() divides data at quantiles, so each bin holds roughly the same number of values.

  • pd.cut(x, bins, labels=None, right=True):
    - x: data to bin
    - bins: number of bins or bin edges
    - labels: optional labels for bins
    - right: whether each bin includes its right edge (default True)
  • pd.qcut(x, q, labels=None):
    - x: data to bin
    - q: number of quantiles or list of quantiles
    - labels: optional labels for bins
python
pd.cut(x, bins, labels=None, right=True)
pd.qcut(x, q, labels=None)
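
The parameters above are easier to see on a small input. The following is a minimal sketch, assuming a made-up `ages` Series and illustrative label names (neither comes from the article). It passes explicit bin edges to pd.cut() with right=False, and a list of quantiles to pd.qcut().

```python
import pandas as pd

# Hypothetical ages, chosen only to illustrate the parameters
ages = pd.Series([5, 17, 18, 30, 64, 65, 90])

# Explicit bin edges; right=False makes each bin include its LEFT edge,
# so 18 falls in [18, 65) and 65 falls in [65, 120)
age_group = pd.cut(
    ages,
    bins=[0, 18, 65, 120],
    labels=['child', 'adult', 'senior'],
    right=False,
)

# A list of quantiles instead of a count: bottom 25%, middle 50%, top 25%
age_band = pd.qcut(ages, q=[0, 0.25, 0.75, 1], labels=['low', 'mid', 'high'])

print(age_group.tolist())
# ['child', 'child', 'adult', 'adult', 'adult', 'senior', 'senior']
print(age_band.tolist())
# ['low', 'low', 'mid', 'mid', 'mid', 'high', 'high']
```

With the default right=True, the same edges would instead put 18 in the first bin and 65 in the second.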

Example

This example shows how to bin a continuous numeric column into 3 equal-width bins using pd.cut() and into 3 quantile-based bins using pd.qcut().

python
import pandas as pd

# Sample continuous data
data = pd.DataFrame({'value': [1, 7, 5, 4, 6, 3, 8, 9, 2, 10]})

# Bin into 3 equal-width bins
data['cut_bins'] = pd.cut(data['value'], bins=3, labels=['Low', 'Medium', 'High'])

# Bin into 3 quantile-based bins
data['qcut_bins'] = pd.qcut(data['value'], q=3, labels=['Low', 'Medium', 'High'])

print(data)
Output
   value cut_bins qcut_bins
0      1      Low       Low
1      7   Medium    Medium
2      5   Medium    Medium
3      4      Low       Low
4      6   Medium    Medium
5      3      Low       Low
6      8     High      High
7      9     High      High
8      2      Low       Low
9     10     High      High
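
Once custom labels are applied, the numeric intervals behind 'Low', 'Medium', and 'High' are no longer visible. One way to recover them, sketched here on the same ten values, is the retbins=True option that both functions accept. The variable names are illustrative.

```python
import pandas as pd

values = pd.Series([1, 7, 5, 4, 6, 3, 8, 9, 2, 10])
labels = ['Low', 'Medium', 'High']

# retbins=True also returns the array of bin edges that was used
cut_bins, cut_edges = pd.cut(values, bins=3, labels=labels, retbins=True)
qcut_bins, qcut_edges = pd.qcut(values, q=3, labels=labels, retbins=True)

# For pd.cut the edges are about 0.991, 4, 7, 10: pandas lowers the first
# edge by 0.1% of the data range so that the minimum value is included
print('cut edges: ', cut_edges.round(3))
print('qcut edges:', qcut_edges.round(3))

# Number of values per equal-width bin, in bin order
print(cut_bins.value_counts().sort_index().tolist())  # [4, 3, 3]
```

Because each bin is closed on the right by default, a value that sits exactly on an edge (such as 4 here) belongs to the lower bin.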

Common Pitfalls

Common mistakes include:

  • Using pd.cut() when data is skewed, which can create uneven bin counts.
  • Not specifying labels and getting default interval labels that are hard to read.
  • Calling pd.qcut() on data with many repeated values, which can produce duplicate bin edges and raise a ValueError.

Always check your data distribution before choosing the binning method.
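
A quick way to make that check is to count how many values land in each bin under both methods. The sketch below uses a made-up right-skewed Series (`skewed` is a hypothetical name and the numbers are illustrative, not from the article).

```python
import pandas as pd

# Hypothetical right-skewed data: ten small values and two large outliers
skewed = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50, 100])

# Equal-width bins: the outliers stretch the range, so most values
# pile up in the first bin and one bin stays empty
cut_counts = pd.cut(skewed, bins=4).value_counts().sort_index()

# Quantile bins: edges follow the data, so the counts stay balanced
qcut_counts = pd.qcut(skewed, q=4).value_counts().sort_index()

print(cut_counts.tolist())   # [10, 1, 0, 1]
print(qcut_counts.tolist())  # [3, 3, 3, 3]
```

If the equal-width counts look like the first line, pd.qcut() or hand-picked edges are usually the better fit. If the intervals themselves must be comparable in width, pd.cut() remains the right tool despite the uneven counts.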

python
import pandas as pd

# Data with many duplicates: half of the values are 2
data = pd.Series([1, 2, 2, 2, 2, 2, 5, 6, 7, 8])

# The 25% and 50% quantiles are both 2, so the bin edges repeat
# and pd.qcut raises a ValueError
error_message = None
try:
    bins = pd.qcut(data, q=4)
except ValueError as e:
    error_message = str(e)

# Workaround: use pd.cut, lower q, or pass duplicates='drop' to pd.qcut
bins_correct = pd.cut(data, bins=4)

print('Error:', error_message)
print('Correct bins:', bins_correct.astype(str).tolist())
Output (error text as printed by pandas 2.x; the wording differs slightly in older versions)
Error: Bin edges must be unique: Index([1.0, 2.0, 2.0, 5.75, 8.0], dtype='float64').
You can drop duplicate edges by setting the 'duplicates' kwarg
Correct bins: ['(0.993, 2.75]', '(0.993, 2.75]', '(0.993, 2.75]', '(0.993, 2.75]', '(0.993, 2.75]', '(0.993, 2.75]', '(4.5, 6.25]', '(4.5, 6.25]', '(6.25, 8.0]', '(6.25, 8.0]']
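
If quantile-based bins are still wanted for data like this, pd.qcut() also accepts duplicates='drop', which merges repeated edges instead of raising. The sketch below assumes a Series in which half of the values are 2. Note that fewer than q bins come back.

```python
import pandas as pd

# Half of the values are 2, so the 25% and 50% quantiles coincide
data = pd.Series([1, 2, 2, 2, 2, 2, 5, 6, 7, 8])

# duplicates='drop' removes the repeated edge rather than raising
bins = pd.qcut(data, q=4, duplicates='drop')

print(len(bins.cat.categories))                   # 3 bins instead of 4
print(bins.value_counts().sort_index().tolist())  # [6, 1, 3]
```

Because the number of bins can shrink, a fixed labels list may no longer match. Passing four labels here would raise an error, so it is safer to inspect the resulting categories first and assign labels afterwards.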

Quick Reference

Function | Purpose                                                      | Key Parameters         | Returns
pd.cut   | Bins data into equal-width or custom bins                    | x, bins, labels, right | Categorical with bin intervals or labels
pd.qcut  | Bins data into quantile-based bins with roughly equal counts | x, q, labels           | Categorical with quantile bins

Key Takeaways

  • Use pd.cut() to bin data into equal-width intervals.
  • Use pd.qcut() to bin data into quantiles with roughly equal counts.
  • Label bins for easier interpretation.
  • Check for duplicate edges when using pd.qcut() to avoid errors.
  • Choose the binning method based on the data distribution and the needs of the analysis.