
How to Label Encode a Column in Pandas Easily

To label encode a column in pandas, use sklearn.preprocessing.LabelEncoder to convert categorical text data into numbers. Fit the encoder on the column and transform it, then assign the result back to the DataFrame column.
📐

Syntax

Use LabelEncoder from sklearn.preprocessing to convert text labels into numbers. The main steps are:

  • encoder = LabelEncoder(): Create the encoder object.
  • encoder.fit(column): Learn the unique labels from the column.
  • encoded = encoder.transform(column): Convert labels to numbers.
  • Assign the encoded values back to the DataFrame column.
python
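from sklearn.preprocessing import LabelEncoder

# df is assumed to be an existing DataFrame with a text column 'column_name'
encoder = LabelEncoder()                         # create the encoder object
encoder.fit(df['column_name'])                   # learn the unique labels
encoded = encoder.transform(df['column_name'])   # convert labels to integer codes
df['column_name'] = encoded                      # replaces the original text values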
💻

Example

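This example label encodes a column named color in a pandas DataFrame. It uses fit_transform, which fits the encoder and transforms the column in a single call, and stores the result in a new column named color_encoded. LabelEncoder assigns codes in sorted label order, so blue becomes 0, green becomes 1, and red becomes 2.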

python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Sample data
data = {'color': ['red', 'blue', 'green', 'blue', 'red']}
df = pd.DataFrame(data)

# Create encoder
encoder = LabelEncoder()

# Fit and transform the 'color' column
df['color_encoded'] = encoder.fit_transform(df['color'])

print(df)
Output
   color  color_encoded
0    red              2
1   blue              0
2  green              1
3   blue              0
4    red              2
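
To check which number was assigned to which label, the fitted encoder can be inspected. The following is a minimal sketch that reuses the sample colors from the example above. The variable names mapping and decoded are illustrative and not part of the original example. It relies on the classes_ attribute and the inverse_transform method of LabelEncoder.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Same sample data as the example above
df = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red']})

encoder = LabelEncoder()
df['color_encoded'] = encoder.fit_transform(df['color'])

# classes_ holds the learned labels in sorted order;
# the position of each label in classes_ is its integer code
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform converts integer codes back to the original labels
decoded = encoder.inverse_transform(df['color_encoded'])
print(list(decoded))
```

Because the codes can be decoded this way, the original text is recoverable as long as the fitted encoder object is kept around.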
⚠️

Common Pitfalls

Common mistakes when label encoding in pandas include:

  • Using LabelEncoder without importing it, which raises a NameError.
  • Calling transform before fit, which raises a NotFittedError.
  • Overwriting the original column with the encoded values, which discards the original text labels.
  • Transforming test data that contains categories the encoder did not see during fit, which raises a ValueError.

Fit the encoder on the training data only, then reuse the same fitted encoder to transform the test data so that each category maps to the same number in both sets.

python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Sample data
df = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red']})

# Wrong way: transform without fit
# encoder = LabelEncoder()
# encoded = encoder.transform(df['color'])  # Raises NotFittedError

# Right way: fit first, then transform into a new column
encoder = LabelEncoder()
encoder.fit(df['color'])
encoded = encoder.transform(df['color'])
df['color_encoded'] = encoded
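
The unseen-category pitfall listed above can be sketched as follows. This is a minimal illustration with made-up data: train_df, test_df, and the placeholder code -1 are assumptions for this sketch, not part of the original article. It shows transform raising a ValueError on a label that was absent during fit, followed by one possible workaround that looks codes up in a dictionary built from classes_.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical train/test data; 'yellow' appears only in the test set
train_df = pd.DataFrame({'color': ['red', 'blue', 'green']})
test_df = pd.DataFrame({'color': ['blue', 'yellow']})

encoder = LabelEncoder()
encoder.fit(train_df['color'])  # fit on the training data only

try:
    encoder.transform(test_df['color'])
    unseen_error = False
except ValueError:
    # transform raises ValueError for labels not seen during fit
    unseen_error = True
print('ValueError raised:', unseen_error)

# One possible workaround (an assumption for this sketch):
# map known labels to their codes and unknown labels to -1
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
test_df['color_encoded'] = test_df['color'].map(mapping).fillna(-1).astype(int)
print(test_df)
```

Whether -1 is an acceptable placeholder depends on the model that consumes the column, so treat this as a starting point rather than a general fix.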
📊

Quick Reference

Step | Code                                             | Description
1    | from sklearn.preprocessing import LabelEncoder   | Import the encoder class
2    | encoder = LabelEncoder()                         | Create encoder object
3    | encoder.fit(df['column'])                        | Learn unique labels from column
4    | df['encoded'] = encoder.transform(df['column'])  | Convert labels to numbers and save

Key Takeaways

Use sklearn's LabelEncoder to convert categorical text to integer codes in pandas.
Always fit the encoder before transforming the data, or use fit_transform to do both in one call.
Assign the encoded values to a new column if you want to keep the original text labels.
A fitted LabelEncoder raises a ValueError on categories it did not see during fit, so refit it or handle unseen categories explicitly before transforming new data.