Encoding categorical variables in Data Analysis Python - Step-by-Step Execution

Concept Flow - Encoding categorical variables

Start with categorical data
↓
Choose encoding method
↓
Label Encoding
↓
Map each category to an integer
↓
Add the encoded column (or replace the original)
↓
Encoded data ready
↓
Use in model

This flow shows how categorical data is prepared for modeling: choose an encoding method (here, label encoding), apply it to map each category to a number, and pass the encoded data to a model.

Execution Sample

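import pandas as pd
from sklearn.preprocessing import LabelEncoder

# A small DataFrame with one categorical (text) column
data = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue']})

# Learn the sorted list of categories, then replace each label with its position
le = LabelEncoder()
data['color_encoded'] = le.fit_transform(data['color'])

print(data)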

This code converts the color names into integers with scikit-learn's LabelEncoder and stores them in a new column, color_encoded.

Execution Table

Step	Action	Input Data	Encoding Result	Output Data
1	Start with data	{'color': ['red', 'blue', 'green', 'blue']}	None	{'color': ['red', 'blue', 'green', 'blue']}
2	Initialize LabelEncoder	None	Ready to encode	None
3	Fit and transform 'color'	['red', 'blue', 'green', 'blue']	Map: {'blue':0, 'green':1, 'red':2}	[2, 0, 1, 0]
4	Add encoded column	Original + encoded	Encoded column added	{'color': ['red', 'blue', 'green', 'blue'], 'color_encoded': [2, 0, 1, 0]}
5	Print final data	DataFrame with encoded	Shows encoded numbers	{'color': ['red', 'blue', 'green', 'blue'], 'color_encoded': [2, 0, 1, 0]}
6	End	Encoding complete	Data ready for model	Same as step 5

💡 Encoding finished after adding the encoded column to the data.
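The mapping in step 3 can be reproduced by hand. The sketch below assumes NumPy and mirrors what LabelEncoder does (sort the unique labels, then use each label's position as its code); it is an illustration, not scikit-learn's own source.

```python
import numpy as np

colors = np.array(['red', 'blue', 'green', 'blue'])

# np.unique returns the sorted distinct values and, with return_inverse=True,
# the position of each original element within that sorted array.
classes, codes = np.unique(colors, return_inverse=True)

# Build the category -> integer mapping shown in step 3.
mapping = {str(c): i for i, c in enumerate(classes)}

print(mapping)         # {'blue': 0, 'green': 1, 'red': 2}
print(codes.tolist())  # [2, 0, 1, 0]
```

Because the codes come from sorted order, 'blue' gets 0 simply because it sorts first, not because of any property of the color.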

Variable Tracker

Variable	Start	After Step 3	After Step 4	Final
data['color']	['red', 'blue', 'green', 'blue']	Same	Same	Same
le	Not yet created	LabelEncoder fitted	Same	Same
data['color_encoded']	Not present	[2, 0, 1, 0]	[2, 0, 1, 0]	[2, 0, 1, 0]
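The tracker marks le as fitted after step 3. A sketch of what that fitted state holds, assuming scikit-learn's LabelEncoder as in the sample: classes_ stores the learned categories, transform() reuses the mapping, and inverse_transform() turns codes back into labels.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

data = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue']})
le = LabelEncoder()
data['color_encoded'] = le.fit_transform(data['color'])

# The fitted encoder stores the learned categories in sorted order.
classes = le.classes_.tolist()

# transform() reuses the learned mapping on labels seen during fit.
codes = le.transform(['green', 'red']).tolist()

# inverse_transform() maps integer codes back to the original labels.
labels = le.inverse_transform([2, 0]).tolist()

print(classes)  # ['blue', 'green', 'red']
print(codes)    # [1, 2]
print(labels)   # ['red', 'blue']
```

Calling transform() on a label that was not present during fit, such as 'yellow', raises a ValueError, so the encoder should be fitted on data that covers every category.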

Key Moments - 3 Insights

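Why does 'blue' get encoded as 0 and not 1 or 2?
LabelEncoder sorts the unique labels before numbering them, so 'blue' comes first (0), then 'green' (1), then 'red' (2), as shown in step 3 of the Execution Table.

Is the original 'color' column changed after encoding?
No. The codes are written to a new column, color_encoded, and the Variable Tracker shows data['color'] unchanged at every step. It would change only if the result were assigned back to data['color'].

Can we use these encoded numbers directly in all models?
Not always. Integer codes imply an order (blue < green < red) and equal spacing that colors do not have. Models such as linear regression or k-nearest neighbors would treat that order as meaningful, so one-hot encoding is usually the safer choice for nominal data.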
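The second question can be checked directly. A minimal sketch comparing the two options on the sample's data; the names kept and replaced are illustrative only.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

data = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue']})
le = LabelEncoder()

# Option 1 (as in the sample): write the codes to a new column; 'color' is kept.
kept = data.copy()
kept['color_encoded'] = le.fit_transform(kept['color'])

# Option 2: overwrite 'color' with the codes; the text labels leave the frame.
replaced = data.copy()
replaced['color'] = le.fit_transform(replaced['color'])

# The fitted encoder can still recover the labels from the codes.
recovered = le.inverse_transform(replaced['color']).tolist()

print(kept.columns.tolist())       # ['color', 'color_encoded']
print(replaced['color'].tolist())  # [2, 0, 1, 0]
print(recovered)                   # ['red', 'blue', 'green', 'blue']
print(data['color'].tolist())      # ['red', 'blue', 'green', 'blue'] (untouched)
```

Keeping the original column costs a little memory but makes it easy to inspect which label produced which code.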

Visual Quiz

Test your understanding

Look at the Execution Table at step 3. What number is assigned to 'green'?

Concept Snapshot

Encoding categorical variables:
- Convert text categories to numbers for models
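- Label Encoding: assigns each category an integer in sorted (alphabetical) order
- One-Hot Encoding: creates one binary (0/1) column per category
- Use integer (ordinal) encoding only when categories have a natural order, and state that order explicitly (e.g. scikit-learn's OrdinalEncoder with categories=...); LabelEncoder is intended for target labels and always uses sorted order
- Use one-hot encoding (OneHotEncoder or pd.get_dummies) for nominal data
- The original column is kept unless you overwrite it with the encoded values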
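The ordinal-data rule matters because LabelEncoder always numbers categories alphabetically. A sketch with a hypothetical size column (small < medium < large), assuming scikit-learn's OrdinalEncoder, shows the difference:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical ordinal feature: sizes have a natural order small < medium < large.
sizes = pd.DataFrame({'size': ['medium', 'small', 'large', 'small']})

# LabelEncoder sorts alphabetically: large=0, medium=1, small=2 (wrong order).
alpha_codes = LabelEncoder().fit_transform(sizes['size']).tolist()

# OrdinalEncoder takes the category order explicitly (one list per column).
enc = OrdinalEncoder(categories=[['small', 'medium', 'large']])
ordered_codes = enc.fit_transform(sizes[['size']]).ravel().tolist()

print(alpha_codes)    # [1, 2, 0, 2]
print(ordered_codes)  # [1.0, 0.0, 2.0, 0.0]
```

OrdinalEncoder expects a 2-D input (hence the double brackets) and returns floats by default.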

Full Transcript

Encoding categorical variables means converting text labels into numbers, because many models, including most scikit-learn estimators, require numeric input. We start with data that has categories, such as colors, and pick a method: label encoding or one-hot encoding. Label encoding replaces each category with an integer based on its sorted (alphabetical) position: 'blue' becomes 0, 'green' 1, and 'red' 2. We add these numbers as a new column next to the original so that a model can use them. One-hot encoding instead creates one column per category, holding 1 where that category is present and 0 elsewhere; it is the better choice when categories have no natural order. The code example shows label encoding step by step, adding a new column of integers. The original data stays the same unless you overwrite it. Together, these steps prepare categorical data for machine learning.