
Why Target encoding in ML Python? - Purpose & Use Cases

The Big Idea

What if your model could understand categories by their real impact instead of just random numbers?

The Scenario

Imagine you have a big table of customer data with categories like 'City' or 'Product Type'. You want to use this data to predict if a customer will buy something. But your computer only understands numbers, not words.

You try to convert these categories into numbers by hand, maybe by assigning 1 to 'New York', 2 to 'London', and so on.

The Problem

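This manual numbering is slow to maintain and misleading. It implies an order and a size that the categories do not have: nothing makes 'London' (2) twice as much as 'New York' (1). And if a new city appears later, the mapping has no entry for it, so you have to stop and add it by hand or the value goes missing. Both problems can introduce errors and mislead your prediction model.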
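A minimal sketch of the missing-entry problem, assuming pandas and a hypothetical city, 'Tokyo', that was not known when the mapping was written:

```python
import pandas as pd

# Hand-written mapping; the integer codes are arbitrary.
city_map = {"New York": 1, "London": 2, "Paris": 3}

# "Tokyo" is a hypothetical city that appears after the mapping was written.
cities = pd.Series(["New York", "Paris", "Tokyo"])
city_num = cities.map(city_map)

# Cities missing from the mapping come back as NaN.
print(city_num.tolist())  # [1.0, 3.0, nan]
```

The unmapped city silently becomes a missing value, which many models either reject or handle in ways you did not intend.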

The Solution

Target encoding replaces each category with the average outcome (target) observed for that category. For example, if customers from 'New York' buy 70% of the time, 'New York' becomes 0.7. This way, the model gets numbers that relate directly to what you want to predict.
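The 70% example can be reproduced on a small made-up table. This is a sketch assuming pandas; the rows and the column names `city`, `target`, and `city_enc` are illustrative, not real data:

```python
import pandas as pd

# Illustrative data: 10 New York customers (7 buy), 5 London customers (1 buys).
data = pd.DataFrame({
    "city": ["New York"] * 10 + ["London"] * 5,
    "target": [1] * 7 + [0] * 3 + [1] * 1 + [0] * 4,
})

# Average target per city, then replace each city with its average.
mean_target = data.groupby("city")["target"].mean()
data["city_enc"] = data["city"].map(mean_target)

print(mean_target.round(2).to_dict())  # {'London': 0.2, 'New York': 0.7}
```

Every 'New York' row now carries 0.7 and every 'London' row carries 0.2, so the encoded column reflects how often each city's customers bought.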

Before vs After
Before
# Arbitrary integer codes assigned by hand; the numbers carry no meaning
city_map = {'New York': 1, 'London': 2, 'Paris': 3}
data['city_num'] = data['city'].map(city_map)
After
# Average target per city, then replace each city with that average
mean_target = data.groupby('city')['target'].mean()
data['city_enc'] = data['city'].map(mean_target)
What It Enables

Target encoding lets your model learn from categories through their observed relationship with the target, which can improve predictions without a hand-maintained mapping.

Real Life Example

In online shopping, target encoding can turn product categories into numbers that show how likely each product type is to be bought, helping recommenders suggest better items.

Key Takeaways

Manual category numbering is slow and can mislead models.

Target encoding replaces each category with its average target value, so the numbers carry information about the outcome.

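This can improve model accuracy. A category that was not seen when the averages were computed has no value of its own, so it needs a fallback such as the overall average target.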
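With a plain `map`, a category that was absent when the averages were computed comes back as NaN. One common fallback, sketched here under assumptions, is the overall average target from the training rows. The `train` and `new_data` tables and the city 'Tokyo' are hypothetical:

```python
import pandas as pd

# Hypothetical training rows with known outcomes.
train = pd.DataFrame({
    "city": ["New York", "New York", "London", "London", "Paris"],
    "target": [1, 1, 0, 1, 0],
})

# Hypothetical new rows; "Tokyo" never appears in the training rows.
new_data = pd.DataFrame({"city": ["London", "Tokyo"]})

# Compute the averages from the training rows only.
mean_target = train.groupby("city")["target"].mean()
global_mean = train["target"].mean()

# Known cities get their own average; unseen cities get the overall average.
new_data["city_enc"] = new_data["city"].map(mean_target).fillna(global_mean)
print(new_data)
```

Computing the averages from the training rows alone also keeps the outcomes of the rows being predicted out of their own encoding.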