Target encoding turns categories into numbers by replacing each category with the average of the target values observed for that category. This gives models a single numeric feature to work with instead of raw category labels.
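To make the idea concrete, here is a minimal sketch of the plain "average per category" calculation done by hand with pandas. The toy data and the column name color_encoded are made up for illustration, and library encoders may additionally smooth these values, so their output will not necessarily match these raw means.

```python
import pandas as pd

# Hypothetical toy data: one categorical feature and a binary target.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "red"],
    "target": [1, 0, 1, 0, 1, 0],
})

# Mean of the target within each category.
category_means = df.groupby("color")["target"].mean()

# Replace every category label with its mean target value.
df["color_encoded"] = df["color"].map(category_means)

print(category_means)
print(df)
```

Each 'red' row receives the mean target of all 'red' rows, and likewise for the other colors.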
Target encoding in ML Python
from category_encoders import TargetEncoder

encoder = TargetEncoder(cols=['category_column'])
encoded_data = encoder.fit_transform(X, y)
You need to install the category_encoders package first using pip install category_encoders.
cols specifies which columns to encode. X is your features, and y is the target variable.
# Encode a single column
from category_encoders import TargetEncoder

encoder = TargetEncoder(cols=['color'])
encoded_X = encoder.fit_transform(X, y)

# Encode several columns at once
encoder = TargetEncoder(cols=['city', 'brand'])
encoded_X = encoder.fit_transform(X, y)

# Apply an already fitted encoder to new data
encoded_X = encoder.transform(new_X)
This example shows how to use target encoding on a simple dataset with a 'color' category and a binary target. We encode the 'color' column, train a logistic regression model, and check accuracy.
import pandas as pd
from category_encoders import TargetEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Sample data with categorical feature and binary target
data = pd.DataFrame({
    'color': ['red', 'blue', 'green', 'blue', 'red', 'green', 'red', 'blue'],
    'target': [1, 0, 1, 0, 1, 0, 1, 0]
})

X = data[['color']]
y = data['target']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Create and fit target encoder
encoder = TargetEncoder(cols=['color'])
X_train_encoded = encoder.fit_transform(X_train, y_train)
X_test_encoded = encoder.transform(X_test)

# Train a simple model
model = LogisticRegression()
model.fit(X_train_encoded, y_train)

# Predict and evaluate
preds = model.predict(X_test_encoded)
acc = accuracy_score(y_test, preds)

print(f"Encoded training data:\n{X_train_encoded}\n")
print(f"Predictions: {preds}")
print(f"Accuracy: {acc:.2f}")
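Target encoding can cause overfitting, especially for categories with only a few rows, because each row's own target value feeds into its encoding. Use techniques like cross-validation (out-of-fold encoding) or smoothing toward the overall target mean.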
Always fit the encoder only on training data, then transform test data to avoid data leakage.
Target encoding works best with categorical features that have a meaningful relationship with the target.
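The smoothing and cross-validation ideas from the notes above can be sketched by hand with pandas and scikit-learn's KFold. This is an illustrative sketch, not how category_encoders implements it: the helper names smoothed_means and out_of_fold_encode, the smoothing weight m, and the toy data are all assumptions made for the example. The smoothing used here is one common form, (sum + m * global_mean) / (count + m).

```python
import pandas as pd
from sklearn.model_selection import KFold


def smoothed_means(cats, target, m=2.0):
    """Per-category target means shrunk toward the global mean.

    m is a hypothetical smoothing weight: the larger it is, the more
    small categories are pulled toward the global mean.
    """
    global_mean = target.mean()
    stats = target.groupby(cats).agg(["sum", "count"])
    return (stats["sum"] + m * global_mean) / (stats["count"] + m)


def out_of_fold_encode(cats, target, n_splits=4, m=2.0, seed=0):
    """Encode each row using statistics computed on the other folds only."""
    encoded = pd.Series(index=cats.index, dtype=float)
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, enc_idx in folds.split(cats):
        means = smoothed_means(cats.iloc[fit_idx], target.iloc[fit_idx], m)
        # Categories missing from the fitting folds fall back to their mean.
        fallback = target.iloc[fit_idx].mean()
        values = cats.iloc[enc_idx].map(means).fillna(fallback)
        encoded.iloc[enc_idx] = values.to_numpy()
    return encoded


# Hypothetical toy data, same shape as the example above.
cats = pd.Series(
    ["red", "blue", "green", "blue", "red", "green", "red", "blue"],
    name="color",
)
target = pd.Series([1, 0, 1, 0, 1, 0, 1, 0], name="target")

means = smoothed_means(cats, target, m=2.0)
encoded = out_of_fold_encode(cats, target)

print(means)
print(encoded)
```

Because no row's own target contributes to its encoded value, the out-of-fold version gives the downstream model a more honest view of how informative the category is. At prediction time, new data would be encoded with means computed on the full training set.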
Target encoding converts categories into numbers using the average target value.
It helps models use categorical data without creating many new columns.
Fit the encoder on training data only to avoid data leakage, and use smoothing or cross-validation to limit overfitting.