How to Label Encode a Column in Pandas Easily
To label encode a column in pandas, use sklearn.preprocessing.LabelEncoder to convert categorical text data into integers. Fit the encoder on the column, transform it, then assign the result back to the DataFrame, either to the same column or to a new one.
Syntax
Use LabelEncoder from sklearn.preprocessing to convert text labels into integers. The main steps are:
- encoder = LabelEncoder(): create the encoder object.
- encoder.fit(column): learn the unique labels from the column.
- encoded = encoder.transform(column): convert the labels to integers.
- Assign the encoded values back to the DataFrame column.
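```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'column_name': ['a', 'b', 'a']})  # example data

encoder = LabelEncoder()                        # create the encoder
encoder.fit(df['column_name'])                  # learn the unique labels
encoded = encoder.transform(df['column_name'])  # map labels to integers
df['column_name'] = encoded                     # assign back (overwrites the text)
```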
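To see what fit actually learns, the sketch below inspects encoder.classes_, the sorted array of unique labels that LabelEncoder stores after fitting; each label's integer code is its position in that array. It also checks that the two-step fit/transform gives the same codes as the one-step fit_transform. The small color DataFrame is illustrative data, not part of the syntax above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red']})

# Two-step version: learn the labels, then map them to integers
encoder = LabelEncoder()
encoder.fit(df['color'])
two_step = encoder.transform(df['color'])

# One-step version: fit_transform learns and maps in a single call
one_step = LabelEncoder().fit_transform(df['color'])

print(encoder.classes_.tolist())         # ['blue', 'green', 'red']
print(bool((two_step == one_step).all()))  # True
```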
Example
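This example label encodes a column named color in a pandas DataFrame and stores the integers in a new column, color_encoded. LabelEncoder assigns codes in sorted (alphabetical) order of the unique labels, so blue becomes 0, green 1, and red 2.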
```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Sample data
data = {'color': ['red', 'blue', 'green', 'blue', 'red']}
df = pd.DataFrame(data)

# Create encoder
encoder = LabelEncoder()

# Fit and transform the 'color' column
df['color_encoded'] = encoder.fit_transform(df['color'])
print(df)
```
Output
```
   color  color_encoded
0    red              2
1   blue              0
2  green              1
3   blue              0
4    red              2
```
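Because each code is just a position in encoder.classes_, you can build a label-to-integer lookup from that array, and encoder.inverse_transform turns the integers back into the original text. A minimal sketch, rebuilding the same DataFrame so it runs on its own:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red']})
encoder = LabelEncoder()
df['color_encoded'] = encoder.fit_transform(df['color'])

# Label -> integer lookup built from the learned (sorted) classes
mapping = {label: code for code, label in enumerate(encoder.classes_.tolist())}
print(mapping)  # {'blue': 0, 'green': 1, 'red': 2}

# Integer -> label: recover the original text from the encoded column
decoded = encoder.inverse_transform(df['color_encoded'])
print(decoded.tolist())  # ['red', 'blue', 'green', 'blue', 'red']
```

This is also why overwriting the original column is recoverable, as long as the fitted encoder is kept.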
Common Pitfalls
Common mistakes when label encoding in pandas include:
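- Using LabelEncoder without importing it from sklearn.preprocessing, which raises a NameError.
- Calling transform before fit, which raises sklearn.exceptions.NotFittedError.
- Overwriting the original column without keeping the text labels elsewhere. The original data is lost unless you still have the fitted encoder, whose inverse_transform method maps the integers back.
- Transforming test data that contains categories not seen during fit, which raises a ValueError.
Fit the encoder on the training data, reuse that same fitted encoder to transform the test data, and decide beforehand how to handle labels that appear only in the test data.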
```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red']})

# Wrong way: transform without fit
# encoder = LabelEncoder()
# encoded = encoder.transform(df['color'])  # raises NotFittedError

# Right way: fit first, then transform
encoder = LabelEncoder()
encoder.fit(df['color'])
encoded = encoder.transform(df['color'])
df['color_encoded'] = encoded
```
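LabelEncoder has no built-in option for unknown labels, so applying the training encoder to test data with a new category fails. The sketch below shows the ValueError and one possible workaround: mapping labels through the learned classes and sending unseen ones to a sentinel value. The -1 sentinel and the UNKNOWN name are assumptions chosen for illustration, not part of scikit-learn's API, and a model trained only on codes 0 to n-1 has never seen -1.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

train = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue']})
test = pd.DataFrame({'color': ['blue', 'purple', 'red']})  # 'purple' is not in train

encoder = LabelEncoder()
train['color_encoded'] = encoder.fit_transform(train['color'])

# Reusing the fitted encoder on a label it never saw raises ValueError
try:
    encoder.transform(test['color'])
except ValueError as err:
    print('ValueError:', err)

# Workaround (an assumption, not a LabelEncoder feature):
# map known labels through the learned classes, send unknown ones to -1
UNKNOWN = -1  # hypothetical sentinel value
mapping = {label: code for code, label in enumerate(encoder.classes_.tolist())}
test['color_encoded'] = test['color'].map(mapping).fillna(UNKNOWN).astype(int)
print(test)
```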
Quick Reference
| Step | Code | Description |
|---|---|---|
| 1 | from sklearn.preprocessing import LabelEncoder | Import the encoder class |
| 2 | encoder = LabelEncoder() | Create encoder object |
| 3 | encoder.fit(df['column']) | Learn unique labels from column |
| 4 | df['encoded'] = encoder.transform(df['column']) | Convert labels to numbers and save |
Key Takeaways
Use sklearn's LabelEncoder to convert categorical text to numbers in pandas.
Always fit the encoder before transforming the data.
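Keep the fitted encoder if you want to recover the text labels later; encoder.inverse_transform maps the integers back.
A fitted LabelEncoder cannot encode categories it did not see during fit; refit it, or handle unknown labels explicitly, before encoding new data that contains new categories.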
Assign encoded values to a new column to avoid losing original data.