What is Target encoding in ML Python?

ML Pythonml~5 mins

Target encoding in ML Python

Choose your learning style10 modes available

Learn Why Deep Model Try Challenge Experiment Recall Metrics

Start learning this pattern below

Jump into concepts and practice - no test required

Recommended

Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong

Introduction

Target encoding helps turn categories into numbers by using the average of the target values. This makes it easier for models to understand and use categorical data.

When you have categorical data with many unique values (like zip codes or product IDs).

When you want to keep useful information from categories related to the target.

When one-hot encoding would create too many new columns and slow down the model.

When you want to improve model performance by using target-related information.

When preparing data for models that only accept numbers, like linear regression.

Syntax

ML Python

from category_encoders import TargetEncoder

encoder = TargetEncoder(cols=['category_column'])
encoded_data = encoder.fit_transform(X, y)

You need to install the category_encoders package first using pip install category_encoders.

cols specifies which columns to encode. X is your features, and y is the target variable.

Examples

Encode the 'color' column using target encoding.

ML Python

from category_encoders import TargetEncoder
encoder = TargetEncoder(cols=['color'])
encoded_X = encoder.fit_transform(X, y)

Encode multiple categorical columns 'city' and 'brand' at once.

ML Python

encoder = TargetEncoder(cols=['city', 'brand'])
encoded_X = encoder.fit_transform(X, y)

Use the already fitted encoder to transform new data without changing the encoding.

ML Python

encoded_X = encoder.transform(new_X)

Sample Model

This example shows how to use target encoding on a simple dataset with a 'color' category and a binary target. We encode the 'color' column, train a logistic regression model, and check accuracy.

ML Python

import pandas as pd
from category_encoders import TargetEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Sample data with categorical feature and binary target
data = pd.DataFrame({
    'color': ['red', 'blue', 'green', 'blue', 'red', 'green', 'red', 'blue'],
    'target': [1, 0, 1, 0, 1, 0, 1, 0]
})

X = data[['color']]
y = data['target']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Create and fit target encoder
encoder = TargetEncoder(cols=['color'])
X_train_encoded = encoder.fit_transform(X_train, y_train)
X_test_encoded = encoder.transform(X_test)

# Train a simple model
model = LogisticRegression()
model.fit(X_train_encoded, y_train)

# Predict and evaluate
preds = model.predict(X_test_encoded)
acc = accuracy_score(y_test, preds)

print(f"Encoded training data:\n{X_train_encoded}\n")
print(f"Predictions: {preds}")
print(f"Accuracy: {acc:.2f}")

OutputSuccess

Important Notes

Target encoding can cause overfitting if not done carefully. Use techniques like cross-validation or smoothing.

Always fit the encoder only on training data, then transform test data to avoid data leakage.

Target encoding works best with categorical features that have a meaningful relationship with the target.

Summary

Target encoding converts categories into numbers using the average target value.

It helps models use categorical data without creating many new columns.

Be careful to avoid overfitting by fitting only on training data.

Practice

(1/5)

1. What is the main purpose of target encoding in machine learning?

easy

A. Remove missing values from the dataset

B. Normalize numerical features to a 0-1 scale

C. Create new categorical features by combining existing ones

D. Convert categorical variables into numbers using the average target value

Target encoding in ML Python

Start learning this pattern below

Practice

Solution

Step 1: Understand what target encoding does

Step 2: Compare with other options

Final Answer:

Quick Check:

Solution

Step 1: Identify correct target encoding method

Step 2: Check code correctness

Final Answer:

Quick Check:

Solution

Step 1: Calculate mean target per category from training data

Step 2: Map test categories and fill missing

Final Answer:

Quick Check:

Solution

Step 1: Understand overfitting cause in target encoding

Step 2: Identify mistake in data leakage

Final Answer:

Quick Check:

Solution

Step 1: Understand overfitting from rare categories

Step 2: Apply smoothing to reduce noise

Final Answer:

Quick Check: