How to Use Label Encoding with sklearn in Python
Use LabelEncoder from sklearn.preprocessing to convert categorical labels into numeric values. Fit the encoder on your label data with fit() (or fit_transform()), then convert labels to numbers with transform().

Syntax
The main steps to use LabelEncoder are:
- from sklearn.preprocessing import LabelEncoder: import the encoder.
- encoder = LabelEncoder(): create an encoder object.
- encoder.fit(labels): learn the mapping from labels to integers.
- encoded_labels = encoder.transform(labels): convert labels to integers.
- Or use encoded_labels = encoder.fit_transform(labels) to fit and transform in one step.
```python
from sklearn.preprocessing import LabelEncoder

labels = ['cat', 'dog', 'cat']  # Example label data

encoder = LabelEncoder()
encoder.fit(labels)                         # Learn the label-to-integer mapping
encoded_labels = encoder.transform(labels)  # Convert labels to integers

# Or combine both steps:
encoded_labels = encoder.fit_transform(labels)
```
Example
This example shows how to encode a list of fruit names into numbers using LabelEncoder. It fits the encoder and transforms the labels, then prints the original and encoded labels.
```python
from sklearn.preprocessing import LabelEncoder

labels = ['apple', 'banana', 'apple', 'orange', 'banana', 'orange']

encoder = LabelEncoder()
encoded_labels = encoder.fit_transform(labels)

print('Original labels:', labels)
print('Encoded labels:', encoded_labels.tolist())
```
Output
```
Original labels: ['apple', 'banana', 'apple', 'orange', 'banana', 'orange']
Encoded labels: [0, 1, 0, 2, 1, 2]
```
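The integers come from the fitted encoder's classes_ attribute. It stores the unique labels in sorted order, and each label is encoded as its index in that array. This is why 'apple' becomes 0 and 'orange' becomes 2. The sketch below prints that mapping for the fruit example. The name mapping is just an illustrative variable.

```python
from sklearn.preprocessing import LabelEncoder

labels = ['apple', 'banana', 'apple', 'orange', 'banana', 'orange']
encoder = LabelEncoder()
encoder.fit(labels)

# classes_ holds the unique labels in sorted order
print('Classes:', encoder.classes_.tolist())  # ['apple', 'banana', 'orange']

# A label's position in classes_ is the integer it is encoded as
mapping = {label: index for index, label in enumerate(encoder.classes_.tolist())}
print('Mapping:', mapping)  # {'apple': 0, 'banana': 1, 'orange': 2}
```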
Common Pitfalls
Common mistakes when using LabelEncoder include:
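- Passing a multi-column feature table instead of a single 1D array of labels.
- Calling fit_transform() on the test labels as well as the training labels. Refitting builds a new mapping from the test data alone, so the same class can receive a different number than it did in training.
- Assuming the encoded numbers carry order or magnitude. Classes are numbered in sorted (alphabetical) order, which usually has no meaning for the data.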
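Two of these pitfalls are easy to see directly. The size labels below are hypothetical data with a real-world order (small < medium < large). LabelEncoder still numbers them alphabetically, so comparing the codes says nothing about size. A 2D array of features is rejected, because LabelEncoder only accepts a 1D array of labels.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical size labels whose natural order is small < medium < large
sizes = ['small', 'medium', 'large', 'medium']
encoder = LabelEncoder()
encoded = encoder.fit_transform(sizes)

# Codes follow alphabetical order, not the meaning of the labels
mapping = dict(zip(encoder.classes_.tolist(), range(len(encoder.classes_))))
print(mapping)           # {'large': 0, 'medium': 1, 'small': 2}
print(encoded.tolist())  # [2, 1, 0, 1]

# LabelEncoder expects a 1D array of labels, so a 2D feature table is rejected
features = np.array([['red', 'small'], ['blue', 'large']])
try:
    LabelEncoder().fit_transform(features)
    rejected = False
except ValueError as err:
    rejected = True
    print('Rejected 2D input:', err)
```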
Fit the encoder once on the training labels, then call only transform() on the test labels. Keep in mind that transform() raises a ValueError for any label it did not see during fitting.
```python
from sklearn.preprocessing import LabelEncoder

train_labels = ['cat', 'dog', 'cat']
test_labels = ['dog', 'cat']

# Wrong: refitting on the test labels builds a new mapping from the test data
# alone, so a class may get a different number than it had in training
# encoder = LabelEncoder()
# train_encoded = encoder.fit_transform(train_labels)
# test_encoded = encoder.fit_transform(test_labels)  # Wrong

# Right: fit once on the training labels, then only transform the test labels
encoder = LabelEncoder()
train_encoded = encoder.fit_transform(train_labels)
test_encoded = encoder.transform(test_labels)
```
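transform() only knows the classes seen during fitting, so a label that never appeared in the training data raises a ValueError. The sketch below triggers that error. It then shows one possible workaround, which is an assumption and not a LabelEncoder feature: drop unknown labels before transforming. Whether dropping them is acceptable depends on your data.

```python
from sklearn.preprocessing import LabelEncoder

train_labels = ['cat', 'dog', 'cat']
encoder = LabelEncoder()
encoder.fit(train_labels)  # classes_: ['cat', 'dog']

# 'bird' was never seen during fitting, so transform() cannot map it
new_labels = ['dog', 'bird']
try:
    encoder.transform(new_labels)
    unseen_rejected = False
except ValueError as err:
    unseen_rejected = True
    print('ValueError:', err)

# Assumed workaround (not built into LabelEncoder): keep only known labels
known_classes = set(encoder.classes_.tolist())
known = [label for label in new_labels if label in known_classes]
print(encoder.transform(known).tolist())  # [1]
```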
Quick Reference
| Method | Description |
|---|---|
| fit(labels) | Learn label to number mapping from labels |
| transform(labels) | Convert labels to numbers using learned mapping |
| fit_transform(labels) | Fit and transform labels in one step |
| inverse_transform(numbers) | Convert numbers back to original labels |
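inverse_transform() is useful when a model outputs integer class codes and you need the original labels back. A minimal round trip, reusing fruit labels like those in the example above:

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
encoded = encoder.fit_transform(['apple', 'banana', 'apple', 'orange'])
print(encoded.tolist())  # [0, 1, 0, 2]

# Map integer codes (for example, model predictions) back to the original labels
decoded = encoder.inverse_transform([2, 0, 1])
print(decoded.tolist())  # ['orange', 'apple', 'banana']
```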
Key Takeaways
Use sklearn's LabelEncoder to convert categorical labels to numeric values easily.
Always fit the encoder on training labels before transforming test labels.
LabelEncoder numbers classes in sorted order, but the integers imply no meaningful order or magnitude.
Use fit_transform() for quick encoding when training and transform() for new data.
Do not use LabelEncoder on multi-column features. It is designed for a single 1D array of target labels; for input features, scikit-learn provides OrdinalEncoder and OneHotEncoder.