How to One Hot Encode a Column in Pandas Easily

To one hot encode a column in pandas, use pd.get_dummies() on the column or DataFrame. This converts categorical values into separate binary columns representing each category.

Syntax

The basic syntax to one hot encode a column in pandas is:

  • pd.get_dummies(data, columns=['column_name']): Returns a new DataFrame in which each listed column is replaced by one binary column per category, named column_name_category.
  • data: Your pandas DataFrame.
  • columns: List of column names to encode.
python
pd.get_dummies(data, columns=['column_name'])
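pd.get_dummies() also accepts a single column (a Series) instead of a whole DataFrame. The sketch below shows that form on made-up sample data; the variable names are illustrative only, and dtype=int is passed so the result is 0/1 regardless of pandas version.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
data = pd.DataFrame({
    'color': ['red', 'blue', 'green', 'blue'],
    'value': [10, 20, 30, 40]
})

# Passing one column (a Series) returns only the dummy columns,
# named after the category values themselves (no 'color_' prefix).
dummies = pd.get_dummies(data['color'], dtype=int)

# Attach the dummy columns to the rest of the DataFrame.
combined = pd.concat([data.drop(columns='color'), dummies], axis=1)
print(combined.columns.tolist())
```

With this form the original column has to be dropped and the dummies joined back by hand, which is why the columns= form is usually more convenient for DataFrames.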

Example

This example shows how to one hot encode the 'color' column in a DataFrame.

python
import pandas as pd

data = pd.DataFrame({
    'color': ['red', 'blue', 'green', 'blue'],
    'value': [10, 20, 30, 40]
})

encoded_data = pd.get_dummies(data, columns=['color'])
print(encoded_data)
Output
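   value  color_blue  color_green  color_red
0     10           0            0          1
1     20           1            0          0
2     30           0            1          0
3     40           1            0          0

In pandas 2.0 and later the dummy columns are boolean by default, so the same code prints True/False instead of 1/0. Pass dtype=int to pd.get_dummies() to get integers.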

Common Pitfalls

Common mistakes include:

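  • Omitting the columns parameter. pd.get_dummies() then encodes every object, string, or category column in the DataFrame, which may include columns you meant to keep as they are.
  • Forgetting to assign the result. pd.get_dummies() returns a new DataFrame and does not modify the original.
  • Listing numeric columns in columns when their values are quantities rather than categories. Numeric columns are left untouched unless you name them explicitly.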
python
import pandas as pd

# 'size' is an extra text column (illustrative) that should stay as-is
data = pd.DataFrame({
    'color': ['red', 'blue', 'green'],
    'size': ['S', 'M', 'L'],
    'value': [1, 2, 3]
})

# Wrong: without columns, every text column is encoded, including 'size'
wrong = pd.get_dummies(data)
print(wrong.columns.tolist())

# Right: encode only the 'color' column
right = pd.get_dummies(data, columns=['color'])
print(right.columns.tolist())
Output
['value', 'color_blue', 'color_green', 'color_red', 'size_L', 'size_M', 'size_S']
['size', 'value', 'color_blue', 'color_green', 'color_red']
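The numeric-column pitfall also works in the other direction: a numeric column that is really a category label is skipped unless it is named in columns. Here is a minimal sketch, assuming a hypothetical sales table in which store_id is an identifier rather than a quantity (all names and values are made up for illustration).

```python
import pandas as pd

# Hypothetical data: 'store_id' is numeric but acts as a category label,
# while 'amount' is a genuine quantity.
sales = pd.DataFrame({
    'store_id': [101, 102, 101],
    'city': ['Oslo', 'Rome', 'Oslo'],
    'amount': [5.0, 7.5, 3.2]
})

# Default call: only the text column 'city' is encoded.
default_encoded = pd.get_dummies(sales, dtype=int)

# Explicit call: name every categorical column, numeric ones included.
explicit_encoded = pd.get_dummies(sales, columns=['store_id', 'city'], dtype=int)

print(default_encoded.columns.tolist())
print(explicit_encoded.columns.tolist())
```

In the default call store_id passes through unchanged, while amount is never encoded in either call because it is not listed.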

Quick Reference

Tips for one hot encoding in pandas:

  • Use pd.get_dummies() to convert categorical columns.
  • Specify columns to avoid encoding unwanted data.
  • Assign the output to a new variable or overwrite the original DataFrame.
  • Use drop_first=True to avoid the dummy variable trap (one dummy column being fully determined by the others) if needed.
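
The effect of drop_first=True can be sketched on the same kind of sample data used earlier. The data and variable names here are illustrative, and dtype=int is passed only to keep the values as 0/1 across pandas versions.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
data = pd.DataFrame({
    'color': ['red', 'blue', 'green', 'blue'],
    'value': [10, 20, 30, 40]
})

# Full encoding: one column per category.
full = pd.get_dummies(data, columns=['color'], dtype=int)

# drop_first=True omits the first category in sorted order ('blue').
reduced = pd.get_dummies(data, columns=['color'], drop_first=True, dtype=int)

print(full.columns.tolist())
print(reduced.columns.tolist())
```

In the reduced frame a row with 0 in both color_green and color_red stands for the dropped category, so no information is lost.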

Key Takeaways

Use pd.get_dummies() to one hot encode categorical columns in pandas.
Always specify the columns parameter to encode only desired columns.
Assign the result to a variable to keep the encoded DataFrame.
Use drop_first=True to avoid redundant columns if needed.
Avoid encoding numeric columns that do not represent categories.