What if your model could understand categories by their real impact instead of just random numbers?
Why Target Encoding in Python ML? - Purpose & Use Cases
Imagine you have a big table of customer data with categories like 'City' or 'Product Type'. You want to use this data to predict if a customer will buy something. But your computer only understands numbers, not words.
You try to convert these categories into numbers by hand, maybe by assigning 1 to 'New York', 2 to 'London', and so on.
This manual numbering is slow and tricky. It treats categories as if they have order or size, which they don't. Also, if a new city appears later, you have to stop and add it manually. This can cause mistakes and confuse your prediction model.
Target encoding smartly replaces each category with the average outcome (target) for that category. For example, if customers from 'New York' buy 70% of the time, 'New York' becomes 0.7. This way, the model gets meaningful numbers that relate directly to what you want to predict.
# Manual numbering: implies a false order between cities
city_map = {'New York': 1, 'London': 2, 'Paris': 3}
data['city_num'] = data['city'].map(city_map)

# Target encoding: replace each city with its average target value
mean_target = data.groupby('city')['target'].mean()
data['city_enc'] = data['city'].map(mean_target)
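Here is a minimal end-to-end sketch with pandas; the column names `city` and `target` and the toy values are illustrative, not from any real dataset:

```python
import pandas as pd

# Toy data: did the customer buy? (1 = yes, 0 = no)
data = pd.DataFrame({
    'city': ['New York', 'New York', 'London', 'London', 'Paris'],
    'target': [1, 1, 0, 1, 0],
})

# Average target per city, then map it back onto each row
mean_target = data.groupby('city')['target'].mean()
data['city_enc'] = data['city'].map(mean_target)

print(data[['city', 'city_enc']])
# 'New York' rows become 1.0, 'London' rows 0.5, 'Paris' rows 0.0
```

Each city is now a single number that directly reflects how often its customers buy, which is exactly the signal the model needs.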
Target encoding lets your model learn from categories in a way that captures their true relationship with the goal, improving predictions without complex manual work.
In online shopping, target encoding can turn product categories into numbers that show how likely each product type is to be bought, helping recommenders suggest better items.
Manual category numbering is slow and can mislead models.
Target encoding uses the average target value per category for smarter numbers.
This improves model accuracy, and categories that appear later can simply fall back to the overall average instead of requiring manual updates.
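One common way to handle a category that never appeared in training (this fallback choice is an assumption, not the only option) is to substitute the global mean of the target:

```python
import pandas as pd

# Training data with only two known cities (illustrative values)
train = pd.DataFrame({
    'city': ['New York', 'London', 'London'],
    'target': [1, 0, 1],
})
mean_target = train.groupby('city')['target'].mean()
global_mean = train['target'].mean()  # fallback for unseen cities

# 'Tokyo' was never seen during training
new_cities = pd.Series(['London', 'Tokyo'])
encoded = new_cities.map(mean_target).fillna(global_mean)
print(encoded.tolist())
```

`map` returns NaN for unknown keys, and `fillna` replaces that gap with the global average, so the pipeline keeps running when new cities show up.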