How to Label Encode a Column in Pandas Easily

To label encode a column in pandas, use sklearn.preprocessing.LabelEncoder to convert categorical text data into integers. The encoder assigns each distinct label a code from 0 to n_classes - 1, following the sorted order of the labels. Fit the encoder on the column, transform it, then assign the result to a DataFrame column.

Syntax

Use LabelEncoder from sklearn.preprocessing to convert text labels into numbers. The main steps are:

encoder = LabelEncoder(): Create the encoder object.
encoder.fit(column): Learn the unique labels from the column.
encoded = encoder.transform(column): Convert labels to numbers.
Assign the encoded values back to the DataFrame column.

python

from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
encoder.fit(df['column_name'])
encoded = encoder.transform(df['column_name'])
df['column_name'] = encoded
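
The two calls above can also be combined with fit_transform, which fits and transforms the same data in one step. The snippet below is a minimal sketch of that equivalence; the DataFrame contents and the variable names two_step and one_step are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({'column_name': ['low', 'high', 'medium', 'low']})

# Two-step version: fit, then transform
two_step = LabelEncoder()
two_step.fit(df['column_name'])
encoded_two_step = two_step.transform(df['column_name'])

# One-step version: fit_transform does both on the same data
one_step = LabelEncoder()
encoded_one_step = one_step.fit_transform(df['column_name'])

# Both approaches produce the same integer codes
same = np.array_equal(encoded_two_step, encoded_one_step)
print(same)
```

Keeping fit and transform separate is still useful when the encoder is fitted on one dataset and later applied to another.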

Example

This example shows how to label encode a column named color in a pandas DataFrame. Each distinct color is mapped to an integer, and the result is stored in a new color_encoded column so the original text stays available.

python

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Sample data
data = {'color': ['red', 'blue', 'green', 'blue', 'red']}
df = pd.DataFrame(data)

# Create encoder
encoder = LabelEncoder()

# Fit and transform the 'color' column
df['color_encoded'] = encoder.fit_transform(df['color'])

print(df)

Output

   color  color_encoded
0    red              2
1   blue              0
2  green              1
3   blue              0
4    red              2
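
To see which number stands for which color, the fitted encoder can be inspected. The sketch below rebuilds the example, reads the learned labels from encoder.classes_, and uses inverse_transform to turn the codes back into text. The names mapping and color_decoded are introduced here for illustration only.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red']})
encoder = LabelEncoder()
df['color_encoded'] = encoder.fit_transform(df['color'])

# classes_ holds the learned labels in sorted order; each label's position is its code
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)  # {'blue': 0, 'green': 1, 'red': 2}

# inverse_transform converts the integer codes back into the original labels
df['color_decoded'] = encoder.inverse_transform(df['color_encoded'])
print(df)
```

Because the codes follow alphabetical order, they carry no meaning about the colors themselves.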

Common Pitfalls

Common mistakes when label encoding in pandas include:

Using LabelEncoder without importing it from sklearn.preprocessing, which raises a NameError.
Calling transform before fit, which raises a NotFittedError.
Overwriting the original column with the encoded values, which discards the original text labels.
Transforming test data that contains categories the encoder never saw during fit, which raises a ValueError.

Fit the encoder on the training data only, reuse that same fitted encoder on the test data, and decide in advance how to handle categories it has not seen.

python

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Sample data so the snippet runs on its own
df = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red']})

# Wrong way: transform without fit
# encoder = LabelEncoder()
# encoded = encoder.transform(df['color'])  # Raises NotFittedError

# Right way: fit first, then transform into a new column
encoder = LabelEncoder()
encoder.fit(df['color'])
df['color_encoded'] = encoder.transform(df['color'])
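
The unseen-category pitfall can be reproduced with a small train/test split. The sketch below uses made-up data and shows that transform raises a ValueError on a label missing from the training data. The fallback that marks unseen labels with -1 is one possible convention chosen for this example, not something LabelEncoder does itself.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical train/test split for illustration
train = pd.DataFrame({'color': ['red', 'blue', 'green']})
test = pd.DataFrame({'color': ['blue', 'purple']})  # 'purple' was never seen

encoder = LabelEncoder()
encoder.fit(train['color'])  # fit on training data only

# transform raises ValueError when it meets an unseen label
try:
    encoder.transform(test['color'])
    raised = False
except ValueError:
    raised = True
print(raised)

# One possible fallback (an assumption, not LabelEncoder behavior):
# look codes up in a dict and mark unseen labels with -1
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
test['color_encoded'] = test['color'].map(mapping).fillna(-1).astype(int)
print(test)
```

Whether -1 is an acceptable placeholder depends on what consumes the encoded column, so treat it as a starting point rather than a rule.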

Quick Reference

Step	Code	Description
1	from sklearn.preprocessing import LabelEncoder	Import the encoder class
2	encoder = LabelEncoder()	Create encoder object
3	encoder.fit(df['column'])	Learn unique labels from column
4	df['encoded'] = encoder.transform(df['column'])	Convert labels to numbers and save
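
The four steps in the table can be wrapped in a small function when several columns need the same treatment. The helper label_encode_column below is hypothetical, written for this sketch, and the grade data is invented. It returns the fitted encoder so the mapping can be reused or reversed later.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder  # step 1: import


def label_encode_column(df, column, new_column):
    """Hypothetical helper: encode `column` into `new_column` and return the fitted encoder."""
    encoder = LabelEncoder()                        # step 2: create the encoder
    encoder.fit(df[column])                         # step 3: learn the unique labels
    df[new_column] = encoder.transform(df[column])  # step 4: convert and save
    return encoder


# Made-up data for illustration
df = pd.DataFrame({'grade': ['b', 'a', 'c', 'b']})
encoder = label_encode_column(df, 'grade', 'grade_encoded')
print(df)
```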

Key Takeaways

Use sklearn's LabelEncoder to convert categorical text to numbers in pandas.

Always fit the encoder before transforming the data.

Assign encoded values to a new column if you want to keep the original text labels.

A fitted LabelEncoder raises a ValueError on categories it did not see during fit, so refit it or handle unseen values before encoding new data.