How to Bin Continuous Data in Pandas: Simple Guide
You can bin continuous data in pandas using the pd.cut() function to create equal-width bins or pd.qcut() to create bins with roughly equal numbers of data points. Both functions return categorical data representing the bin each value belongs to.
Syntax
pd.cut() divides data into equal-width bins or custom bins. pd.qcut() divides data into quantile-based bins that hold approximately equal numbers of values.
pd.cut(x, bins, labels=None, right=True):
- x: data to bin
- bins: number of bins or bin edges
- labels: optional labels for bins
- right: whether bins include the right edge
pd.qcut(x, q, labels=None):
- x: data to bin
- q: number of quantiles or list of quantiles
- labels: optional labels for bins
```python
pd.cut(x, bins, labels=None, right=True)
pd.qcut(x, q, labels=None)
```
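The bins and right parameters are easiest to understand on a tiny input. The following is a minimal sketch, assuming a made-up three-value series and illustrative names (values, edges, names) that are not part of pandas. It passes explicit bin edges instead of a bin count and shows how right decides which bin a value sitting exactly on an edge falls into.

```python
import pandas as pd

# Hypothetical sample; 5 sits exactly on the middle edge
values = pd.Series([1, 5, 9])
edges = [0, 5, 10]        # explicit bin edges instead of a bin count
names = ['low', 'high']   # one label per bin, i.e. len(edges) - 1 labels

# right=True (default): bins are (0, 5] and (5, 10], so 5 lands in 'low'
right_closed = pd.cut(values, bins=edges, labels=names)

# right=False: bins are [0, 5) and [5, 10), so 5 lands in 'high'
left_closed = pd.cut(values, bins=edges, labels=names, right=False)

print(right_closed.tolist())  # ['low', 'low', 'high']
print(left_closed.tolist())   # ['low', 'high', 'high']
```

Values that fall outside the outermost edges are returned as NaN, so custom edges should cover the full range of the data.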
Example
This example shows how to bin a continuous numeric column into 3 equal-width bins using pd.cut() and into 3 quantile-based bins using pd.qcut().
```python
import pandas as pd

# Sample continuous data
data = pd.DataFrame({'value': [1, 7, 5, 4, 6, 3, 8, 9, 2, 10]})

# Bin into 3 equal-width bins
data['cut_bins'] = pd.cut(data['value'], bins=3, labels=['Low', 'Medium', 'High'])

# Bin into 3 quantile-based bins
data['qcut_bins'] = pd.qcut(data['value'], q=3, labels=['Low', 'Medium', 'High'])

print(data)
```
Output
value cut_bins qcut_bins
0 1 Low Low
1 7 Medium Medium
2 5 Medium Medium
3 4 Low Low
4 6 Medium Medium
5 3 Low Low
6 8 High High
7 9 High High
8 2 Low Low
9 10 High High
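To see where the bin boundaries actually lie, both functions accept retbins=True, which returns the computed edges alongside the binned data. The sketch below reuses the sample values from the example; the variable names are illustrative. For this evenly spaced sample the inner edges are 4 and 7 for both methods, and because right=True is the default, a value sitting exactly on an edge (such as 4) belongs to the lower bin.

```python
import pandas as pd

values = pd.Series([1, 7, 5, 4, 6, 3, 8, 9, 2, 10])

# retbins=True additionally returns the array of bin edges
_, cut_edges = pd.cut(values, bins=3, retbins=True)
_, qcut_edges = pd.qcut(values, q=3, retbins=True)

# pd.cut nudges the lowest edge slightly below the minimum (about 0.991 here)
# so that the smallest value is included in the first bin
print(cut_edges)

# pd.qcut returns the quantile values themselves: 1, 4, 7 and 10
print(qcut_edges)
```

On data that is not evenly spaced, the two sets of edges diverge, which is where the choice between the two functions starts to matter.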
Common Pitfalls
Common mistakes include:
- Using pd.cut() when data is skewed, which can create uneven bin counts.
- Not specifying labels and getting default interval labels that are hard to read.
- Using pd.qcut() on data with many repeated values, which can produce duplicate bin edges and raise an error.
Always check your data distribution before choosing the binning method.
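One quick way to check is to count how many values land in each bin. The following sketch uses a hypothetical right-skewed sample (nine small values and one outlier) to compare the two methods; the data and variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical right-skewed sample: small values plus one large outlier
skewed = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# sort=False keeps the bins in interval order instead of sorting by count
cut_counts = pd.cut(skewed, bins=4).value_counts(sort=False)
qcut_counts = pd.qcut(skewed, q=4).value_counts(sort=False)

print(cut_counts.tolist())   # [9, 0, 0, 1] -> the outlier stretches the equal-width bins
print(qcut_counts.tolist())  # [3, 2, 2, 3] -> roughly equal counts per bin
```

With equal-width bins, almost everything collapses into the first bin and two bins stay empty, while the quantile-based bins remain balanced.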
```python
import pandas as pd

# Data with many duplicates: half of the values are 2
data = pd.Series([1, 2, 2, 2, 2, 2, 4, 6, 7, 8])

# The 25% and 50% quantiles are both 2, so the bin edges are not unique
error_message = None
try:
    bins = pd.qcut(data, q=4)
except ValueError as e:
    error_message = str(e)

# One fix: use pd.cut (alternatively, lower q or pass duplicates='drop')
bins_correct = pd.cut(data, bins=4)

print('Error:', error_message)
print('Correct bins:', bins_correct.astype(str).tolist())
```
Output (the formatting of the edge array in the error message depends on the pandas version)
Error: Bin edges must be unique: array([1. , 2. , 2. , 5.5, 8. ]).
You can drop duplicate edges by setting the 'duplicates' kwarg
Correct bins: ['(0.993, 2.75]', '(0.993, 2.75]', '(0.993, 2.75]', '(0.993, 2.75]', '(0.993, 2.75]', '(0.993, 2.75]', '(2.75, 4.5]', '(4.5, 6.25]', '(6.25, 8.0]', '(6.25, 8.0]']
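The error message points to the duplicates option of pd.qcut(). As a minimal sketch, assuming a sample in which half of the values are 2, passing duplicates='drop' keeps quantile-based binning but merges the repeated edges, so fewer bins than requested come back.

```python
import pandas as pd

# Sample in which the 25% and 50% quantiles are both 2
data = pd.Series([1, 2, 2, 2, 2, 2, 4, 6, 7, 8])

# duplicates='drop' removes repeated edges instead of raising ValueError
binned, edges = pd.qcut(data, q=4, duplicates='drop', retbins=True)

print(edges.tolist())                             # [1.0, 2.0, 5.5, 8.0] -> 3 bins, not 4
print(binned.value_counts(sort=False).tolist())   # [6, 1, 3]
```

Because the number of bins can shrink, a fixed list of labels may no longer match the bin count; in that case it is safer to leave labels unset or build the labels after inspecting the edges.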
Quick Reference
| Function | Purpose | Key Parameters | Returns |
|---|---|---|---|
| pd.cut | Bins data into equal-width or custom bins | x, bins, labels, right | Categorical with bin intervals or labels |
| pd.qcut | Bins data into quantile-based bins with roughly equal counts | x, q, labels | Categorical with quantile bins |
Key Takeaways
Use pd.cut() to bin data into equal-width intervals.
Use pd.qcut() to bin data into quantiles with roughly equal counts.
Always label bins for easier interpretation.
Check for duplicate edges when using pd.qcut(), or pass duplicates='drop', to avoid errors.
Choose binning method based on data distribution and analysis needs.