Encoding categorical variables in Data Analysis Python

Introduction

We use encoding to turn words or categories into numbers so that models and other numeric tools can analyze them.

Typical situations:
You have a list of colors like red, blue, green and want to use them in a model.
Survey answers are categories like 'Yes', 'No', 'Maybe' and you want to analyze patterns.
You have product types like 'Book', 'Toy', 'Clothing' and want to predict sales.
You want to prepare data for machine learning models that only work with numbers.
Syntax
Data Analysis Python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Label Encoding example
# categorical_column: a 1-D array-like of category values
label_encoder = LabelEncoder()
encoded_labels = label_encoder.fit_transform(categorical_column)

# One Hot Encoding example
# column_index: position of the categorical column in data
column_transformer = ColumnTransformer(
    [('one_hot_encoder', OneHotEncoder(), [column_index])],
    remainder='passthrough'  # keep all other columns unchanged
)
encoded_data = column_transformer.fit_transform(data)

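LabelEncoder maps each distinct category to an integer (0, 1, 2, ...), assigned in sorted order of the category values.

OneHotEncoder creates one new column per category, holding 1 where a row belongs to that category and 0 otherwise.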
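
The ColumnTransformer pattern above uses placeholders (column_index, data). A minimal sketch with a made-up two-column table might look like the following; the product names and prices are illustrative assumptions, and sparse_threshold=0 is set only so the result comes back as a dense array that is easy to print.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: column 0 is a product type, column 1 is a price
data = np.array(
    [['Book', 12.5],
     ['Toy', 8.0],
     ['Book', 15.0]],
    dtype=object,
)

column_transformer = ColumnTransformer(
    [('one_hot_encoder', OneHotEncoder(), [0])],  # encode column 0 only
    remainder='passthrough',                      # keep the price column unchanged
    sparse_threshold=0,                           # always return a dense array
)
encoded_data = column_transformer.fit_transform(data)

# One-hot columns come first (Book, Toy), followed by the passthrough price
print(encoded_data)
```

The encoded columns are placed before the passthrough columns, so the original column positions are not preserved in the output.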

Examples
This turns each color into a number. Categories are numbered in alphabetical order (blue=0, green=1, red=2), so the output is [2 0 1 0].
Data Analysis Python
from sklearn.preprocessing import LabelEncoder

colors = ['red', 'blue', 'green', 'blue']
le = LabelEncoder()
encoded = le.fit_transform(colors)
print(encoded)
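
To check which number was assigned to which color, the fitted encoder can be inspected. The sketch below repeats the colors list from the example above and uses the encoder's classes_ attribute and inverse_transform method.

```python
from sklearn.preprocessing import LabelEncoder

colors = ['red', 'blue', 'green', 'blue']
le = LabelEncoder()
encoded = le.fit_transform(colors)

# classes_ holds the categories in sorted order; a category's position is its code
mapping = {str(category): code for code, category in enumerate(le.classes_)}
print(mapping)

# inverse_transform turns the codes back into the original category names
decoded = [str(c) for c in le.inverse_transform(encoded)]
print(decoded)
```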
This creates one column per color, ordered alphabetically (blue, green, red), with 1 in the column that matches each row's color and 0 elsewhere.
Data Analysis Python
from sklearn.preprocessing import OneHotEncoder
import numpy as np

colors = np.array(['red', 'blue', 'green', 'blue']).reshape(-1, 1)
# scikit-learn >= 1.2 names this parameter sparse_output (older releases used sparse)
ohe = OneHotEncoder(sparse_output=False)
encoded = ohe.fit_transform(colors)
print(encoded)
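
The column order of a one-hot result follows the sorted categories, not the order in which colors first appear. As a sketch, the fitted encoder's categories_ attribute lists the category behind each column. Here .toarray() converts the default sparse result to a dense array, so the snippet does not depend on a version-specific constructor argument.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

colors = np.array(['red', 'blue', 'green', 'blue']).reshape(-1, 1)
ohe = OneHotEncoder()  # default output is a sparse matrix
encoded = ohe.fit_transform(colors).toarray()  # convert to a dense array

# categories_ has one array per input column; entries are in output-column order
column_names = [str(c) for c in ohe.categories_[0]]
print(column_names)

# Pair each row's original color with its one-hot vector
for color, row in zip(colors.ravel(), encoded):
    print(color, row)
```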
Sample Program

This code encodes the same fruit names in two ways: first as integer labels, then as one-hot columns.

Data Analysis Python
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
import numpy as np

# Sample data
fruits = np.array(['apple', 'banana', 'apple', 'orange', 'banana']).reshape(-1, 1)

# Label Encoding
le = LabelEncoder()
labels = le.fit_transform(fruits.ravel())
print('Label Encoded:', labels)

# One Hot Encoding
# scikit-learn >= 1.2 names this parameter sparse_output (older releases used sparse)
ohe = OneHotEncoder(sparse_output=False)
one_hot = ohe.fit_transform(fruits)
print('One Hot Encoded:\n', one_hot)
Important Notes

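Label encoding is simple, but the integers imply an order (0 < 1 < 2) that a model may treat as meaningful even when the categories have none.

One-hot encoding avoids implying an order, but it adds one column per category, so a feature with many distinct values produces a very wide table.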
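
The ordering problem in the first note can be seen directly. In this sketch the size labels are made-up data: LabelEncoder numbers them alphabetically, which does not match their real order, while scikit-learn's OrdinalEncoder accepts an explicit category order for cases where the order is meaningful.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical ordered categories
sizes = np.array(['small', 'large', 'medium', 'small'])

# Alphabetical codes: large=0, medium=1, small=2 (not the natural size order)
alphabetical = LabelEncoder().fit_transform(sizes)
print(alphabetical)

# OrdinalEncoder lets the caller state the intended order explicitly
ordinal_encoder = OrdinalEncoder(categories=[['small', 'medium', 'large']])
ordered = ordinal_encoder.fit_transform(sizes.reshape(-1, 1)).ravel()
print(ordered)
```

When the categories have no natural order at all, one-hot encoding is usually the safer choice of the two.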

Summary

Encoding turns categories into numbers so computers can use them.

Label encoding assigns a unique number to each category.

One hot encoding creates separate columns for each category with 0 or 1.