How to Use LabelEncoder in sklearn with Python
Use LabelEncoder from sklearn.preprocessing to convert categorical labels into integer codes. Call fit() on your label data to learn the classes, then transform() to convert labels to numbers, or call fit_transform() to do both in one step.
Syntax
The basic syntax to use LabelEncoder is:
- from sklearn.preprocessing import LabelEncoder: Import the class.
- encoder = LabelEncoder(): Create an encoder object.
- encoder.fit(labels): Learn the unique labels.
- encoded_labels = encoder.transform(labels): Convert labels to numbers.
- Or use encoded_labels = encoder.fit_transform(labels) to fit and transform in one step.
```python
from sklearn.preprocessing import LabelEncoder

labels = ['yes', 'no', 'yes']  # placeholder: any 1D list or array of labels

encoder = LabelEncoder()
encoder.fit(labels)                         # Learn unique labels
encoded_labels = encoder.transform(labels)  # Convert labels to numbers

# Or combine:
encoded_labels = encoder.fit_transform(labels)
```
Example
This example shows how to convert a list of fruit names into numeric labels using LabelEncoder. It fits the encoder and transforms the labels, then prints the original and encoded labels.
```python
from sklearn.preprocessing import LabelEncoder

fruits = ['apple', 'banana', 'apple', 'orange', 'banana', 'orange']

encoder = LabelEncoder()
encoded_fruits = encoder.fit_transform(fruits)

print('Original labels:', fruits)
print('Encoded labels:', encoded_fruits)
print('Classes:', encoder.classes_)
```
Output
Original labels: ['apple', 'banana', 'apple', 'orange', 'banana', 'orange']
Encoded labels: [0 1 0 2 1 2]
Classes: ['apple' 'banana' 'orange']
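The encoded integers are positions in encoder.classes_, which holds the classes in sorted order. As a minimal sketch, the snippet below builds an explicit label-to-code mapping and decodes the integers back with inverse_transform(); the variable names mapping and decoded_fruits are illustrative choices, not part of the original example.

```python
from sklearn.preprocessing import LabelEncoder

fruits = ['apple', 'banana', 'apple', 'orange', 'banana', 'orange']

encoder = LabelEncoder()
encoded_fruits = encoder.fit_transform(fruits)

# Each class's integer code is its index in encoder.classes_ (sorted order).
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print('Mapping:', mapping)

# inverse_transform() maps integer codes back to the original labels.
decoded_fruits = encoder.inverse_transform(encoded_fruits).tolist()
print('Decoded labels:', decoded_fruits)
```

Decoding is useful when a model predicts integer classes and you want to report the original label names.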
Common Pitfalls
Common mistakes when using LabelEncoder:
- Encoding features instead of labels. LabelEncoder is meant for target labels (y), not input features (X); for categorical features, use OrdinalEncoder or OneHotEncoder.
- Calling transform() before fit() raises a NotFittedError because the encoder doesn't know the classes yet.
- Encoding training and test labels separately can produce inconsistent mappings. Always fit on the training labels, then transform the test labels with the same encoder.
- LabelEncoder only works on 1D arrays or lists of labels, not on multi-dimensional data.
```python
from sklearn.preprocessing import LabelEncoder

labels_train = ['cat', 'dog', 'cat']
labels_test = ['dog', 'cat']

encoder = LabelEncoder()
encoder.fit(labels_train)  # Correct: fit on training labels

encoded_train = encoder.transform(labels_train)
encoded_test = encoder.transform(labels_test)  # Use transform, not fit_transform

print(encoded_train)
print(encoded_test)
```
Output
[0 1 0]
[1 0]
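For contrast, here is a hedged sketch of what two of these mistakes look like when they fail: calling transform() on an unfitted encoder, and transforming a label that was not present during fit(). The 'bird' label and the message variables are made up for illustration, and the exact error text may differ between scikit-learn versions.

```python
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
not_fitted_message = None
unseen_message = None

# Pitfall: transform() before fit() raises NotFittedError.
try:
    encoder.transform(['cat', 'dog'])
except NotFittedError as err:
    not_fitted_message = str(err)
    print('Not fitted:', not_fitted_message)

encoder.fit(['cat', 'dog', 'cat'])

# Pitfall: a label that was absent during fit() raises ValueError.
try:
    encoder.transform(['dog', 'bird'])  # 'bird' is a hypothetical unseen label
except ValueError as err:
    unseen_message = str(err)
    print('Unseen label:', unseen_message)
```

If unseen labels can appear at prediction time, decide up front how to handle them (for example, by filtering or mapping them before calling transform()).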
Quick Reference
LabelEncoder Quick Tips:
- Use for encoding target labels, not features.
- Fit once on training labels, then transform all data.
- Use fit_transform() to combine fit and transform.
- Access original classes with encoder.classes_.
- Encoded labels are integers starting from 0.
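Since the first tip rules out LabelEncoder for input features, it may help to see one alternative. The sketch below uses OrdinalEncoder from sklearn.preprocessing, which accepts a 2D feature table and encodes each column independently; the colour and size data are hypothetical, and OneHotEncoder is often a better fit when the categories have no natural order.

```python
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical feature table: one row per sample, one column per categorical feature.
X = [['red', 'S'], ['blue', 'M'], ['red', 'L']]

feature_encoder = OrdinalEncoder()
X_encoded = feature_encoder.fit_transform(X)  # 2D float array, one code per cell

print(X_encoded)
print(feature_encoder.categories_)  # Sorted categories learned for each column
```

As with LabelEncoder, fit the feature encoder on the training data only and reuse it to transform the test data.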
Key Takeaways
LabelEncoder converts categorical labels into numeric form for ML models.
Always fit the encoder on training labels before transforming.
Use fit_transform() to fit and encode in one step for convenience.
Do not use LabelEncoder on input features; it is for target labels only.
Access encoder.classes_ to see the original label order.