MLOps · How-To · Beginner · 3 min read

How to Use LabelEncoder in sklearn with Python

Use LabelEncoder from sklearn.preprocessing to convert categorical labels into numeric values. Fit the encoder with fit() or fit_transform() on your label data, then transform labels to numbers with transform().
📐

Syntax

The basic syntax to use LabelEncoder is:

  • from sklearn.preprocessing import LabelEncoder: Import the class.
  • encoder = LabelEncoder(): Create an encoder object.
  • encoder.fit(labels): Learn the unique labels.
  • encoded_labels = encoder.transform(labels): Convert labels to numbers.
  • Or use encoded_labels = encoder.fit_transform(labels) to fit and transform in one step.
```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
encoder.fit(labels)  # Learn the unique labels
encoded_labels = encoder.transform(labels)  # Convert labels to numbers

# Or combine both steps:
encoded_labels = encoder.fit_transform(labels)
```
💻

Example

This example shows how to convert a list of fruit names into numeric labels using LabelEncoder. It fits the encoder and transforms the labels, then prints the original and encoded labels.

```python
from sklearn.preprocessing import LabelEncoder

fruits = ['apple', 'banana', 'apple', 'orange', 'banana', 'orange']

encoder = LabelEncoder()
encoded_fruits = encoder.fit_transform(fruits)

print('Original labels:', fruits)
print('Encoded labels:', encoded_fruits)
print('Classes:', encoder.classes_)
```
Output
Original labels: ['apple', 'banana', 'apple', 'orange', 'banana', 'orange']
Encoded labels: [0 1 0 2 1 2]
Classes: ['apple' 'banana' 'orange']
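The mapping also works in reverse: the same fitted encoder's inverse_transform() recovers the original strings from the encoded integers. A short sketch continuing the fruit example:

```python
from sklearn.preprocessing import LabelEncoder

fruits = ['apple', 'banana', 'apple', 'orange', 'banana', 'orange']

encoder = LabelEncoder()
encoded = encoder.fit_transform(fruits)

# inverse_transform maps the integers back to the original string labels
decoded = encoder.inverse_transform(encoded)
print(list(decoded))  # ['apple', 'banana', 'apple', 'orange', 'banana', 'orange']
```

This is handy for turning a model's integer predictions back into readable class names.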
⚠️

Common Pitfalls

Common mistakes when using LabelEncoder:

  • Trying to encode features instead of labels. LabelEncoder is meant for target labels; for input features, use OrdinalEncoder or OneHotEncoder instead.
  • Using transform() before fit() causes errors because the encoder doesn't know the classes yet.
  • Encoding training and test labels separately can cause inconsistent mappings. Always fit on training labels and transform test labels.
  • LabelEncoder only works on 1D arrays or lists of labels, not on multi-dimensional data.
```python
from sklearn.preprocessing import LabelEncoder

labels_train = ['cat', 'dog', 'cat']
labels_test = ['dog', 'cat']

encoder = LabelEncoder()
encoder.fit(labels_train)  # Correct: fit on training labels
encoded_train = encoder.transform(labels_train)
encoded_test = encoder.transform(labels_test)  # Use transform, not fit_transform

print(encoded_train)
print(encoded_test)
```
Output
[0 1 0]
[1 0]
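A related consequence of fitting once on training labels: calling transform() with a label the encoder never saw during fit raises a ValueError. A minimal sketch:

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
encoder.fit(['cat', 'dog'])

try:
    encoder.transform(['bird'])  # 'bird' was never seen during fit
except ValueError as err:
    print('Unseen label:', err)
```

If your test data may contain classes absent from training, you need to handle this case explicitly; LabelEncoder has no built-in option for unknown labels.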
📊

Quick Reference

LabelEncoder Quick Tips:

  • Use for encoding target labels, not features.
  • Fit once on training labels, then transform all data.
  • Use fit_transform() to combine fit and transform.
  • Access original classes with encoder.classes_.
  • Encoded labels are integers starting from 0.
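One tip worth seeing in code: classes_ is stored in sorted order, so each label's integer is its position in that sorted array, not the order it first appears in your data. A short sketch:

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
encoder.fit(['medium', 'low', 'high'])

# classes_ is sorted, so 'high' -> 0, 'low' -> 1, 'medium' -> 2,
# regardless of the order the labels appeared in the data
print(encoder.classes_)                    # ['high' 'low' 'medium']
print(encoder.transform(['low', 'high']))  # [1 0]
```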

Key Takeaways

  • LabelEncoder converts categorical labels into numeric form for ML models.
  • Always fit the encoder on training labels before transforming.
  • Use fit_transform() to fit and encode in one step for convenience.
  • Do not use LabelEncoder on input features; it is for target labels only.
  • Access encoder.classes_ to see the original label order.
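Putting the takeaways together, here is a minimal end-to-end sketch (the toy dataset and the DecisionTreeClassifier are illustrative choices, not part of the article): encode the training labels once, train on the integers, then decode the predictions back to class names.

```python
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Toy data: one numeric feature, string class labels
X_train = [[0.0], [0.1], [1.0], [1.1]]
y_train = ['cat', 'cat', 'dog', 'dog']

encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y_train)  # Fit once on training labels

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_encoded)

# Predictions come back as integers; decode them for readability
preds = model.predict([[0.05], [1.05]])
print(encoder.inverse_transform(preds))  # ['cat' 'dog']
```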