One-hot encoding in Data Analysis Python

Introduction

One-hot encoding turns each category in a column into its own column of 1s and 0s. This lets us use text data in calculations and machine learning models, which need numbers as input.

Typical situations where it helps:
You have a list of colors like red, blue, and green and want to use them in a math model.
You want to convert yes/no answers into numbers for analysis.
You have different types of fruits and want to compare them using a computer.
You want to prepare survey answers with categories for a chart or prediction.
You need to change city names into a format a computer can work with.
Syntax
Data Analysis Python
import pandas as pd

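# Using the pandas get_dummies function on a column of an existing DataFrame `data`
# dtype=int gives 1/0 values; pandas 2.0+ returns True/False by default
one_hot = pd.get_dummies(data['column_name'], dtype=int)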
Use pd.get_dummies() to convert categorical columns into one-hot encoded columns.
Each unique category becomes a new column. Since pandas 2.0 these columns hold True/False by default; pass dtype=int to get 1 and 0 values.
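To make the syntax more concrete, here is a minimal sketch that encodes one column inside a larger table. The `Price` column and its values are made up for illustration; `columns=` and `dtype=` are existing parameters of `pd.get_dummies()`.

```python
import pandas as pd

# Hypothetical data: one categorical column and one numeric column
data = pd.DataFrame({
    'Color': ['Red', 'Blue', 'Green', 'Blue'],
    'Price': [10, 12, 9, 11],
})

# Encode only 'Color'; the other columns are passed through unchanged.
# dtype=int requests 1/0 values instead of True/False.
encoded = pd.get_dummies(data, columns=['Color'], dtype=int)

# New column names are built as <original column>_<category>
print(encoded.columns.tolist())
print(encoded)
```

When a whole DataFrame is passed this way, the new columns get the original column name as a prefix, which keeps them easy to tell apart if several columns are encoded.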
Examples
This example converts a 'Color' column with three colors into three columns with 1s and 0s.
Data Analysis Python
import pandas as pd

data = pd.DataFrame({'Color': ['Red', 'Blue', 'Green']})
one_hot = pd.get_dummies(data['Color'], dtype=int)  # dtype=int gives 1/0 rather than True/False
print(one_hot)
Here, 'Apple' and 'Banana' become separate columns showing presence with 1 or absence with 0.
Data Analysis Python
import pandas as pd

data = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Apple']})
one_hot = pd.get_dummies(data['Fruit'], dtype=int)  # dtype=int gives 1/0 rather than True/False
print(one_hot)
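A quick way to check an encoding like the ones above is to confirm that every row contains exactly one 1. The sketch below reuses the fruit data; the variable names `row_sums` and `recovered` are only illustrative.

```python
import pandas as pd

data = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Apple']})
one_hot = pd.get_dummies(data['Fruit'], dtype=int)

# Each row should contain exactly one 1, so every row sum is 1
row_sums = one_hot.sum(axis=1)
print(row_sums.tolist())

# The column that holds the 1 tells us the original category
recovered = one_hot.idxmax(axis=1)
print(recovered.tolist())
```

If the row sums are all 1 and the recovered labels match the original column, the encoding has kept all the information in the data.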
Sample Program

This program builds a small DataFrame with a 'Pet' column and converts that column into one-hot encoded columns. Each pet type gets its own column with 1 or 0.

Data Analysis Python
import pandas as pd

# Create a simple dataset
data = pd.DataFrame({'Pet': ['Dog', 'Cat', 'Bird', 'Dog']})

# Apply one-hot encoding
one_hot_encoded = pd.get_dummies(data['Pet'], dtype=int)  # dtype=int gives 1/0 rather than True/False

# Show the result
print(one_hot_encoded)
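As a possible follow-up to the sample program, the encoded columns can be placed next to the original column for comparison. This is a sketch using `pd.concat`; the name `combined` is just an example.

```python
import pandas as pd

data = pd.DataFrame({'Pet': ['Dog', 'Cat', 'Bird', 'Dog']})
one_hot_encoded = pd.get_dummies(data['Pet'], dtype=int)

# The new columns come out in sorted order of the category names
print(one_hot_encoded.columns.tolist())

# Put the encoded columns side by side with the original column
combined = pd.concat([data, one_hot_encoded], axis=1)
print(combined)
```

Both rows with 'Dog' get a 1 in the Dog column and 0 in the others, which makes it easy to see how each row was encoded.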
Important Notes

One-hot encoding adds one new column per unique category, so a column with many distinct values produces a very wide table. Use it carefully in that case.
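The growth in columns can be seen directly by comparing the number of unique values with the width of the result. The sketch below also shows one possible way to keep the table narrow, by grouping rarely seen values under a single label; the city data, the threshold of 2, and the label 'Other' are all assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical column of city names
cities = pd.Series(
    ['Paris', 'Lima', 'Oslo', 'Paris', 'Cairo', 'Paris', 'Lima'],
    name='City',
)

# One column is created per distinct value
full = pd.get_dummies(cities, dtype=int)
print(cities.nunique(), full.shape[1])

# One possible way to limit the width: keep values seen at least
# twice and replace everything else with the label 'Other'
counts = cities.value_counts()
frequent = counts[counts >= 2].index
reduced = cities.where(cities.isin(frequent), 'Other')

compact = pd.get_dummies(reduced, dtype=int)
print(compact.columns.tolist())
```

Grouping loses some detail, so whether it is appropriate depends on how much the rare categories matter for the analysis.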

It works best for categories without order, like colors or types.
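To illustrate the difference, the sketch below one-hot encodes an unordered column and, for contrast, maps an ordered column to ranked numbers. The 'Size' column and the `size_order` mapping are hypothetical and written by hand for this example.

```python
import pandas as pd

data = pd.DataFrame({
    'Color': ['Red', 'Blue', 'Green'],     # no natural order
    'Size': ['Small', 'Large', 'Medium'],  # has a natural order
})

# Unordered categories: one-hot encode, so no color ranks above another
color_encoded = pd.get_dummies(data['Color'], dtype=int)
print(color_encoded)

# Ordered categories: a hand-written mapping (an assumption for this
# example) keeps the ranking Small < Medium < Large in a single column
size_order = {'Small': 1, 'Medium': 2, 'Large': 3}
size_encoded = data['Size'].map(size_order)
print(size_encoded.tolist())
```

One-hot encoding the 'Size' column would still work, but the information that Large is bigger than Small would no longer be visible in the numbers.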

Summary

One-hot encoding changes categories into numbers computers can use.

Each unique category becomes its own column with 1 or 0 values.

Use pd.get_dummies() from the pandas library to do this in a single call.