How to Use OrdinalEncoder in sklearn with Python

Use OrdinalEncoder from sklearn.preprocessing to convert categorical data into ordinal integers. Fit the encoder on your data with fit() or fit_transform(), then transform new data with transform().
📐

Syntax

The basic syntax to use OrdinalEncoder is:

  • OrdinalEncoder(): Creates the encoder object.
  • fit(X): Learns the categories from data X.
  • transform(X): Converts categories in X to ordinal numbers.
  • fit_transform(X): Fits and transforms X in one step.
python
from sklearn.preprocessing import OrdinalEncoder

X = [['red'], ['green'], ['blue']]  # Any 2D array-like of categories

encoder = OrdinalEncoder()
encoder.fit(X)  # Learn categories from X
X_encoded = encoder.transform(X)  # Transform X to ordinal numbers

# Or combine fit and transform
X_encoded = encoder.fit_transform(X)
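
To see what fit() actually learns, you can inspect the fitted encoder's categories_ attribute, which holds one array of categories per input column. The snippet below is a minimal sketch; the sample data and the variable name learned are made up for illustration.

```python
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical single-column sample data (not from the article)
X = [['red'], ['green'], ['blue'], ['green']]

encoder = OrdinalEncoder()
encoder.fit(X)

# categories_ has one entry per column; each entry lists that
# column's unique categories in sorted order
learned = [list(cats) for cats in encoder.categories_]
print(learned)
```

The position of a category in its list is the code that transform() assigns to it.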
💻

Example

This example encodes a small array with two categorical features using OrdinalEncoder, fitting and transforming in one call. Each column is encoded independently, and by default its categories are numbered in sorted (alphabetical) order.

python
from sklearn.preprocessing import OrdinalEncoder
import numpy as np

# Sample categorical data with two features
X = np.array([
    ['red', 'S'],
    ['green', 'M'],
    ['blue', 'L'],
    ['green', 'XL']
])

encoder = OrdinalEncoder()
X_encoded = encoder.fit_transform(X)

print("Original data:\n", X)
print("Encoded data:\n", X_encoded)
Output
Original data:
 [['red' 'S']
 ['green' 'M']
 ['blue' 'L']
 ['green' 'XL']]
Encoded data:
 [[2. 2.]
 [1. 1.]
 [0. 0.]
 [1. 3.]]
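
When the categories have a meaningful order, such as clothing sizes, the sorted default (L, M, S, XL) does not reflect it. A minimal sketch, assuming the orderings below are the ones you want, passes one explicit list per column through the categories parameter. The names color_order, size_order, and ordered_encoder are illustrative only.

```python
from sklearn.preprocessing import OrdinalEncoder
import numpy as np

X = np.array([
    ['red', 'S'],
    ['green', 'M'],
    ['blue', 'L'],
    ['green', 'XL']
])

# Assumed orderings for this illustration: one list per column,
# in the same order as the columns of X
color_order = ['red', 'green', 'blue']
size_order = ['S', 'M', 'L', 'XL']

ordered_encoder = OrdinalEncoder(categories=[color_order, size_order])
X_ordered = ordered_encoder.fit_transform(X)

# Codes now follow list position: S=0, M=1, L=2, XL=3
print(X_ordered)
```

With this setup the size codes increase with the actual size, which matters for models that treat the encoded values as ordered numbers.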
⚠️

Common Pitfalls

1. Not fitting before transforming: You must call fit() or fit_transform() before transform(). Calling transform() on an unfitted encoder raises an error (a NotFittedError in recent scikit-learn versions).

2. New categories in transform data: If data passed to transform() contains a category that was not seen during fit(), the encoder raises a ValueError by default.

3. Data shape: Input must be a 2D array-like of shape (n_samples, n_features), even if you have only one feature.

python
from sklearn.preprocessing import OrdinalEncoder
import numpy as np

encoder = OrdinalEncoder()
X_train = np.array([['red'], ['green'], ['blue']])
X_test = np.array([['yellow']])  # New category not seen in training

# Wrong: transform without fit
# encoder.transform(X_train)  # Raises error

# Correct usage
encoder.fit(X_train)

# This will raise error because 'yellow' was not seen during fit
# encoder.transform(X_test)

# To handle unknown categories, use handle_unknown='use_encoded_value' and unknown_value
encoder = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1)
encoder.fit(X_train)
X_test_encoded = encoder.transform(X_test)
print(X_test_encoded)  # Output: [[-1.]]
Output
[[-1.]]
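
The data-shape pitfall is not covered by the code above, so here is a minimal sketch of it. It assumes a single feature stored as a 1D NumPy array and uses reshape(-1, 1) to turn it into one column; the variable names are hypothetical.

```python
from sklearn.preprocessing import OrdinalEncoder
import numpy as np

colors = np.array(['red', 'green', 'blue'])  # 1D array, shape (3,)

# Fitting on 1D input is rejected with a ValueError
try:
    OrdinalEncoder().fit(colors)
    shape_error = None
except ValueError as exc:
    shape_error = exc

# Fix: reshape to a single column, shape (3, 1)
colors_2d = colors.reshape(-1, 1)
encoded = OrdinalEncoder().fit_transform(colors_2d)

print(encoded.shape)  # (3, 1)
```

If the feature lives in a pandas DataFrame, selecting it with double brackets (for example df[['color']], a hypothetical column name) keeps it 2D as well.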
📊

Quick Reference

Parameter | Description | Default
categories | Categories per feature, or 'auto' to infer them from the data | 'auto'
dtype | Data type of the encoded output values | np.float64
handle_unknown | How to handle unknown categories during transform ('error' or 'use_encoded_value') | 'error'
unknown_value | Value to use for unknown categories when handle_unknown='use_encoded_value' | None
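
As a small illustration of the dtype parameter, the sketch below compares the default output with an encoder configured for integer output. The sample data and variable names are assumptions made for this example.

```python
from sklearn.preprocessing import OrdinalEncoder
import numpy as np

# Hypothetical single-column sample data
X = np.array([['red'], ['green'], ['blue']])

# Default: codes are returned as float64
float_codes = OrdinalEncoder().fit_transform(X)

# dtype=np.int64 returns the same codes as integers
int_codes = OrdinalEncoder(dtype=np.int64).fit_transform(X)

print(float_codes.dtype, int_codes.dtype)
```

The codes themselves are identical in both cases; only the array's data type changes.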

Key Takeaways

  • Always fit OrdinalEncoder on training data before transforming.
  • OrdinalEncoder maps each category to an integer code starting at 0, returned as float64 by default.
  • Set handle_unknown='use_encoded_value' together with unknown_value to avoid errors on unseen categories.
  • Input data must be a 2D array-like, even for a single feature column.
  • Use fit_transform() to combine fitting and transforming in one step.