
One-hot encoding in ML Python

Introduction

One-hot encoding converts each category into a row of 0s and 1s, with a single 1 marking which category is present. This lets models learn from data with labels like colors or types.

Use one-hot encoding in situations like these:

When you have a list of fruits such as apple, banana, and orange and want to use them in a model.
When your data has categories like car brands and you need to turn them into numbers.
When preparing survey answers like 'yes', 'no', or 'maybe' for machine learning.
When you want to avoid implying a false order between categories, which happens if you replace them with plain numbers such as 1, 2, and 3.
When feeding categorical data into models that only understand numbers, like neural networks.
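To make the difference between plain numbers and one-hot rows concrete, here is a minimal sketch in pure Python. The fruit list is made-up illustration data, and the manual encoding only mimics the idea; it is not how scikit-learn implements it.

```python
# Hypothetical toy data, used only for illustration.
fruits = ["apple", "banana", "orange", "banana"]

# Sorted unique categories define the column order.
categories = sorted(set(fruits))

# Plain integer codes: orange (2) looks like "twice" banana (1),
# which has no real meaning for fruit names.
integer_codes = [categories.index(f) for f in fruits]

# One-hot rows: exactly one 1 per row, in the column of the matching category.
one_hot = [[1 if f == c else 0 for c in categories] for f in fruits]

print(integer_codes)
print(one_hot)
```

Each one-hot row treats its category as simply present or absent, so no category looks larger than another.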
Syntax
ML Python
from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder(sparse_output=False)
encoded_data = encoder.fit_transform(data)

fit_transform learns the categories and converts data in one step.

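Setting sparse_output=False returns a regular NumPy array instead of a SciPy sparse matrix. The sparse_output parameter is available from scikit-learn 1.2; earlier versions call it sparse.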

Examples
This example converts three colors into one-hot rows. The columns follow the sorted category order (blue, green, red), so 'red' becomes [0, 0, 1].
ML Python
from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder(sparse_output=False)
data = [['red'], ['green'], ['blue']]
encoded = encoder.fit_transform(data)
print(encoded)
This example drops the first category ('cat') to avoid a redundant column. With two categories, one column is enough: 'dog' is encoded as 1 and 'cat' as 0.
ML Python
from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder(sparse_output=False, drop='first')
data = [['cat'], ['dog'], ['cat']]
encoded = encoder.fit_transform(data)
print(encoded)
Sample Model

This program shows how one-hot encoding changes fruit names into numbers. It prints the original list, the encoded array, and the categories found.

ML Python
from sklearn.preprocessing import OneHotEncoder

# Sample data with three categories
data = [['apple'], ['banana'], ['apple'], ['orange']]

# Create encoder
encoder = OneHotEncoder(sparse_output=False)

# Fit and transform data
encoded_data = encoder.fit_transform(data)

# Show original and encoded data
print('Original data:', data)
print('Encoded data:')
print(encoded_data)

# Show categories learned
print('Categories:', encoder.categories_)
Important Notes

One-hot encoding creates a new column for each category with 1 or 0 to show presence.

It works best for categories without order, like colors or names.

A feature with many distinct categories produces just as many columns, which can make the data very large, so use it carefully.

Summary

One-hot encoding turns categories into easy-to-use numbers for models.

It creates a separate column for each category with 1 or 0 values.

Use it when your data has labels that are not numbers.