How to Use OrdinalEncoder in sklearn with Python
Use OrdinalEncoder from sklearn.preprocessing to convert categorical data into ordinal integers. Fit the encoder on your data with fit() or fit_transform(), then transform new data with transform().
Syntax
The basic syntax to use OrdinalEncoder is:
OrdinalEncoder(): Creates the encoder object.
fit(X): Learns the categories from data X.
transform(X): Converts categories in X to ordinal numbers.
fit_transform(X): Fits and transforms X in one step.
python
from sklearn.preprocessing import OrdinalEncoder

# X is any 2D array-like of categorical values (placeholder data)
X = [['red'], ['green'], ['blue']]

encoder = OrdinalEncoder()
encoder.fit(X)                      # Learn categories from X
X_encoded = encoder.transform(X)    # Transform X to ordinal numbers

# Or combine fit and transform
X_encoded = encoder.fit_transform(X)
Example
This example shows how to encode a list of categorical features into ordinal numbers using OrdinalEncoder. It fits the encoder on the data and then transforms it.
python
from sklearn.preprocessing import OrdinalEncoder
import numpy as np

# Sample categorical data with two features
X = np.array([
    ['red', 'S'],
    ['green', 'M'],
    ['blue', 'L'],
    ['green', 'XL']
])

encoder = OrdinalEncoder()
X_encoded = encoder.fit_transform(X)

print("Original data:\n", X)
print("Encoded data:\n", X_encoded)
Output
Original data:
[['red' 'S']
['green' 'M']
['blue' 'L']
['green' 'XL']]
Encoded data:
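[[2. 2.]
[1. 1.]
[0. 0.]
[1. 3.]]
With the default settings, the categories in each column are sorted before codes are assigned, so the colors map to blue=0, green=1, red=2 and the sizes map to L=0, M=1, S=2, XL=3.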
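Sorted order rarely matches the real-world order of a feature such as clothing size. The sketch below assumes you want sizes ranked S < M < L < XL and passes explicit lists through the categories parameter; the variable names color_order and size_order are illustrative only.

```python
from sklearn.preprocessing import OrdinalEncoder
import numpy as np

X = np.array([
    ['red', 'S'],
    ['green', 'M'],
    ['blue', 'L'],
    ['green', 'XL'],
])

# One list per feature, in the order the codes should follow.
color_order = ['blue', 'green', 'red']  # colors have no natural rank; keep sorted order
size_order = ['S', 'M', 'L', 'XL']      # smallest to largest

encoder = OrdinalEncoder(categories=[color_order, size_order])
X_encoded = encoder.fit_transform(X)

print(X_encoded)
# [[2. 0.]
#  [1. 1.]
#  [0. 2.]
#  [1. 3.]]
```

The second column now increases with size, which matters for models that treat the codes as ordered numbers.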
Common Pitfalls
1. Not fitting before transforming: You must call fit() or fit_transform() before transform(); otherwise the encoder raises a NotFittedError.
2. New categories in transform data: If data passed to transform() contains categories that were not seen during fit(), the default setting (handle_unknown='error') raises a ValueError.
3. Data shape: Input must be a 2D array-like of shape (n_samples, n_features), even if you have only one feature. A 1D array raises a ValueError.
python
from sklearn.preprocessing import OrdinalEncoder
import numpy as np

encoder = OrdinalEncoder()
X_train = np.array([['red'], ['green'], ['blue']])
X_test = np.array([['yellow']])  # New category not seen in training

# Wrong: transform without fit
# encoder.transform(X_train)  # Raises error

# Correct usage
encoder.fit(X_train)

# This will raise error because 'yellow' was not seen during fit
# encoder.transform(X_test)

# To handle unknown categories, use handle_unknown='use_encoded_value' and unknown_value
encoder = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1)
encoder.fit(X_train)
X_test_encoded = encoder.transform(X_test)
print(X_test_encoded)  # Output: [[-1.]]
Output
[[-1.]]
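The data shape pitfall can be sketched the same way. This minimal example assumes a single feature stored as a 1D NumPy array (the names colors and colors_2d are illustrative) and uses reshape(-1, 1) to turn it into one column before encoding.

```python
from sklearn.preprocessing import OrdinalEncoder
import numpy as np

colors = np.array(['red', 'green', 'blue', 'green'])  # 1D, shape (4,)

encoder = OrdinalEncoder()

# Passing the 1D array directly is rejected
try:
    encoder.fit_transform(colors)
except ValueError as exc:
    print("1D input rejected:", type(exc).__name__)

# Reshape to a single column: shape (4, 1)
colors_2d = colors.reshape(-1, 1)
colors_encoded = encoder.fit_transform(colors_2d)

print(colors_encoded.ravel())  # [2. 1. 0. 1.]
```

If the feature lives in a pandas DataFrame, selecting it with a list of column names (for example df[['color']], a hypothetical frame) should give a 2D input without reshaping.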
Quick Reference
| Parameter | Description | Default |
|---|---|---|
| categories | 'auto' to infer categories from the data, or a list of category lists (one per feature) whose order defines the codes | 'auto' |
| dtype | Data type of output encoded values | np.float64 |
| handle_unknown | How to handle unknown categories during transform ('error' or 'use_encoded_value') | 'error' |
| unknown_value | Code assigned to unknown categories; required (an integer or np.nan) when handle_unknown='use_encoded_value' | None |
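As a rough illustration of how these parameters combine, the sketch below sets all four explicitly on made-up data. It also reads the fitted categories_ attribute, which the table does not list, to show which category received which code.

```python
from sklearn.preprocessing import OrdinalEncoder
import numpy as np

X_train = np.array([['red'], ['green'], ['blue']])
X_new = np.array([['green'], ['yellow']])  # 'yellow' was not seen during fit

encoder = OrdinalEncoder(
    categories='auto',                   # infer categories from X_train
    dtype=np.int64,                      # integer output instead of float64
    handle_unknown='use_encoded_value',  # do not raise on unseen categories
    unknown_value=-1,                    # code used for unseen categories
)
encoder.fit(X_train)

# Position in each array is the code: blue=0, green=1, red=2
print(encoder.categories_)

codes = encoder.transform(X_new)
print(codes.dtype)  # int64
print(codes)
# [[ 1]
#  [-1]]
```

An integer dtype works here because unknown_value is an integer; with unknown_value=np.nan the output dtype has to stay a float type.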
Key Takeaways
Always fit OrdinalEncoder on training data before transforming.
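OrdinalEncoder converts each categorical feature into codes from 0 to n_categories - 1, assigned in sorted category order and returned as float64 by default.
Handle unknown categories with handle_unknown='use_encoded_value' together with unknown_value to avoid errors.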
Input data must be 2D array-like, even for single feature columns.
Use fit_transform() to combine fitting and transforming in one step.