
Binning continuous variables in ML Python

Introduction

Binning helps turn continuous numbers into groups. This makes data easier to understand and use in models.

When you want to simplify data by grouping ages into ranges like 0-10, 11-20, etc.
When you want to reduce noise in data by grouping similar values together.
When you want to prepare data for models that work better with categories than numbers.
When you want to create easy-to-interpret reports or charts with grouped data.
When you want to handle outliers by putting extreme values into separate bins.
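The outlier case can be sketched with an open-ended top bin. This is a minimal illustration, not a recommended scheme: the income values, bin edges, and label names are all made up, and it assumes that using float('inf') as the last edge is acceptable for your data.

```python
import pandas as pd

# Hypothetical incomes with one extreme value.
incomes = pd.Series([20_000, 35_000, 50_000, 80_000, 2_000_000])

# float('inf') as the last edge gives an open-ended top bin,
# so the extreme value lands in its own group instead of becoming missing.
edges = [0, 30_000, 60_000, 100_000, float('inf')]
labels = ['low', 'middle', 'high', 'very high']

income_group = pd.cut(incomes, bins=edges, labels=labels)
groups = income_group.tolist()
print(groups)
```

Here the value 2,000,000 falls into the 'very high' group, while the other values keep their ordinary ranges.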
Syntax
ML Python
import pandas as pd

# Using pandas cut function
binned_data = pd.cut(data, bins=number_of_bins, labels=optional_labels)

# Using pandas qcut function for equal-sized bins
binned_data = pd.qcut(data, q=number_of_bins, labels=optional_labels)

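pd.cut splits data into equal-width bins when bins is a number. When bins is a list, the values are used as the exact bin edges.

pd.qcut splits data at quantiles, so each bin holds roughly the same number of points.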
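The difference is easiest to see on skewed data. The sketch below uses a made-up series with one large value and counts how many points land in each bin under both functions.

```python
import pandas as pd

# Hypothetical skewed sample: most values are small, one is large.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 100])

# Equal-width: the range 1..100 is split into 3 intervals of the same width.
width_bins = pd.cut(values, bins=3)

# Quantile-based: edges are chosen so each bin holds about the same number of points.
count_bins = pd.qcut(values, q=3)

width_counts = width_bins.value_counts(sort=False).tolist()
count_counts = count_bins.value_counts(sort=False).tolist()

print("pd.cut counts per bin: ", width_counts)
print("pd.qcut counts per bin:", count_counts)
```

With pd.cut, eight of the nine points share one bin and the middle bin is empty. With pd.qcut, each bin holds three points, but the bins have very different widths.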

Examples
This example groups ages into 4 bins: child, young adult, adult, senior.
ML Python
import pandas as pd

ages = [5, 12, 17, 24, 32, 45, 52, 67, 70]
bins = [0, 18, 35, 60, 100]
labels = ['child', 'young adult', 'adult', 'senior']
binned_ages = pd.cut(ages, bins, labels=labels)
print(binned_ages)
This example divides scores into 3 groups with equal number of scores each.
ML Python
import pandas as pd

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95]
binned_scores = pd.qcut(scores, q=3, labels=['Low', 'Medium', 'High'])
print(binned_scores)
Sample Model

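This program groups heights into three categories: Short, Average, and Tall using fixed ranges. Because right=False is set, each range includes its lower edge and excludes its upper edge, so 160 counts as Average and 180 counts as Tall.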

ML Python
import pandas as pd

# Sample continuous data
heights = [150, 160, 165, 170, 175, 180, 185, 190, 195]

# Define bins for height ranges
bins = [140, 160, 180, 200]
labels = ['Short', 'Average', 'Tall']

# Bin the heights
binned_heights = pd.cut(heights, bins=bins, labels=labels, right=False)

# Show original heights and their bins
for height, group in zip(heights, binned_heights):
    print(f'Height: {height} cm -> Group: {group}')
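The right parameter only matters for values that sit exactly on a bin edge. As a small sketch reusing the same bins and labels, the two edge values 160 and 180 are binned once with the default right=True and once with right=False.

```python
import pandas as pd

# Only the values that fall exactly on a bin edge.
heights = [160, 180]
bins = [140, 160, 180, 200]
labels = ['Short', 'Average', 'Tall']

# right=True (the default): intervals are (140, 160], (160, 180], (180, 200]
right_closed = list(pd.cut(heights, bins=bins, labels=labels))

# right=False: intervals are [140, 160), [160, 180), [180, 200)
left_closed = list(pd.cut(heights, bins=bins, labels=labels, right=False))

print(right_closed)
print(left_closed)
```

With the default, 160 is Short and 180 is Average. With right=False, 160 is Average and 180 is Tall.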
Important Notes

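Bins should cover the full range of your data. With pd.cut, any value that falls outside the bin edges becomes NaN (missing). By default the lowest edge itself is also excluded, because intervals are open on the left.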
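A quick way to check coverage is to count missing values after binning. The sketch below uses made-up ages and edges. It also shows include_lowest=True, a pd.cut parameter that closes the first interval on the left.

```python
import pandas as pd

# Hypothetical ages: 0 sits on the lowest edge, 120 is above the highest edge.
ages = pd.Series([0, 5, 18, 40, 120])
bins = [0, 18, 65, 100]

# Default intervals are (0, 18], (18, 65], (65, 100],
# so both 0 and 120 fall outside every bin.
default_result = pd.cut(ages, bins=bins)
missing_default = int(default_result.isna().sum())

# include_lowest=True closes the first interval on the left, so 0 is kept.
# 120 is still outside every bin and stays missing.
with_lowest = pd.cut(ages, bins=bins, include_lowest=True)
missing_with_lowest = int(with_lowest.isna().sum())

print(missing_default, missing_with_lowest)
```

The first call leaves two values unbinned and the second leaves one. Widening the last edge would be needed to capture 120.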

Labels are optional but help make the groups easier to understand.

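pd.qcut can fail if there are many duplicate values, because several quantiles can land on the same number and produce repeated bin edges. In that case it raises a ValueError unless you pass duplicates='drop'. pd.cut with explicit edges does not have this problem.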
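The failure and one possible workaround can be sketched as follows. The data is made up so that most quantiles coincide. duplicates='drop' is an existing pd.qcut option, but note that it can return fewer bins than requested.

```python
import pandas as pd

# Hypothetical data dominated by a single repeated value.
values = pd.Series([0, 0, 0, 0, 0, 0, 1, 2, 3])

# Asking for 4 quantile bins produces repeated edges, which pd.qcut rejects.
try:
    pd.qcut(values, q=4)
    qcut_failed = False
except ValueError as err:
    qcut_failed = True
    print(f"qcut failed: {err}")

# duplicates='drop' removes the repeated edges, so fewer than q bins may come back.
binned = pd.qcut(values, q=4, duplicates='drop')
n_bins = len(binned.cat.categories)
print(f"bins returned: {n_bins}")
```

Because the result has fewer bins than requested, passing a fixed list of labels together with duplicates='drop' can fail when the label count no longer matches the bin count.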

Summary

Binning turns continuous numbers into groups to simplify data.

Use pd.cut for equal-width bins or custom bin edges, and pd.qcut for bins with roughly equal numbers of points.

Labels help make bin groups easy to read and understand.