One-hot encoding converts categorical values into numeric 0/1 columns that machine learning models can use directly. It helps models learn from data with labels such as colors or types.
One-hot encoding in ML Python
Introduction
One-hot encoding is useful in situations such as these:
When you have a list of fruits like apple, banana, and orange and want to use them in a model.
When your data has categories like car brands and you need to turn them into numbers.
When preparing survey answers like 'yes', 'no', or 'maybe' for machine learning.
When you want to avoid implying a false order or magnitude between categories, which happens if you replace them with plain integers such as 0, 1, 2.
When feeding categorical data into models that only accept numeric input, like neural networks.
Syntax
ML Python
from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder(sparse_output=False)
encoded_data = encoder.fit_transform(data)
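fit_transform learns the categories and converts the data in one step. The data must be two-dimensional, with one row per sample and one column per feature.
Setting sparse_output=False returns a dense NumPy array instead of a SciPy sparse matrix. This parameter requires scikit-learn 1.2 or later; earlier versions call it sparse.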
Examples
This example converts three colors into one-hot encoded numbers.
ML Python
from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder(sparse_output=False)
data = [['red'], ['green'], ['blue']]
encoded = encoder.fit_transform(data)
print(encoded)
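This example drops the first category of the feature, so two categories produce a single column instead of two. The dropped category is represented by a row of all zeros, which removes the redundant column.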
ML Python
from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder(sparse_output=False, drop='first')
data = [['cat'], ['dog'], ['cat']]
encoded = encoder.fit_transform(data)
print(encoded)
Sample Model
This program shows how one-hot encoding changes fruit names into numbers. It prints the original list, the encoded array, and the categories found.
ML Python
from sklearn.preprocessing import OneHotEncoder

# Sample data with three categories
data = [['apple'], ['banana'], ['apple'], ['orange']]

# Create encoder
encoder = OneHotEncoder(sparse_output=False)

# Fit and transform data
encoded_data = encoder.fit_transform(data)

# Show original and encoded data
print('Original data:', data)
print('Encoded data:')
print(encoded_data)

# Show categories learned
print('Categories:', encoder.categories_)
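To make the encoded array less opaque, here is a rough pure-Python sketch of the same idea for a single feature. The `one_hot` helper is hypothetical and written only for illustration; it is not how scikit-learn implements the encoder, but it follows the same convention of sorting the categories.

```python
def one_hot(values):
    """Return (categories, rows) for a flat list of category labels."""
    categories = sorted(set(values))                   # unique labels, sorted
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)                    # all zeros...
        row[index[value]] = 1                          # ...except one 1
        rows.append(row)
    return categories, rows


categories, rows = one_hot(['apple', 'banana', 'apple', 'orange'])
print(categories)
for row in rows:
    print(row)
```

Each row contains exactly one 1, placed in the column of that row's fruit.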
Important Notes
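One-hot encoding creates a new column for each category. Each row has a 1 in the column of its own category and 0 in all the others.
It works best for categories without a natural order, like colors or names.
A feature with many distinct categories produces one column per category, so the encoded data can become very wide and use a lot of memory. Use it carefully on such features.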
Summary
One-hot encoding turns categories into easy-to-use numbers for models.
It creates a separate column for each category with 1 or 0 values.
Use it when your data has labels that are not numbers.