
How to Use str.get_dummies in pandas for One-Hot Encoding

Use str.get_dummies() on a pandas Series of strings to turn each unique delimiter-separated value into its own indicator column. The method splits every string on a delimiter and returns binary columns that mark which categories each row contains.

Syntax

The basic syntax of str.get_dummies() is:

  • Series.str.get_dummies(sep='delimiter')

Where:

  • Series is a pandas Series with string values.
  • sep is the delimiter string used to split each string into parts (default is |).

This method returns a DataFrame with one column per unique split value, ordered by sorted column name, where 1 marks that the value is present in a row and 0 that it is absent.

python
Series.str.get_dummies(sep='delimiter')
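When the strings already use the pipe character, the default separator applies and sep can be omitted. The following is a minimal sketch with made-up color labels (the data is an illustrative assumption, not taken from the article):

```python
import pandas as pd

# Pipe-separated labels match the default sep='|', so no argument is needed
s = pd.Series(['red|green', 'green', 'blue|red'])
dummies = s.str.get_dummies()

print(dummies.columns.tolist())  # ['blue', 'green', 'red']
print(dummies.shape)             # (3, 3)
```

The columns come back in sorted order, not in the order the values first appear in the data.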

Example

This example shows how to use str.get_dummies() to convert a Series of strings with comma-separated values into one-hot encoded columns.

python
import pandas as pd

# Sample data: Series with comma-separated categories
s = pd.Series(['apple,banana', 'banana', 'apple,orange', 'banana,orange,apple'])

# Use str.get_dummies with comma as separator
one_hot = s.str.get_dummies(sep=',')

print(one_hot)
Output
   apple  banana  orange
0      1       1       0
1      0       1       0
2      1       0       1
3      1       1       1

Common Pitfalls

Common mistakes when using str.get_dummies() include:

  • Not specifying the correct sep delimiter, which leads to incorrect splitting.
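  • Applying str.get_dummies() to non-string data (the .str accessor raises an AttributeError on a numeric Series) or overlooking missing values, which do not raise an error but silently produce rows of all zeros.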
  • Expecting it to work on DataFrames directly instead of Series.

Always ensure your data is a string Series and the separator matches your data format.

python
import pandas as pd

# Wrong: no separator specified for comma-separated data
s = pd.Series(['cat,dog', 'dog', 'cat,bird'])
wrong = s.str.get_dummies()  # Default sep='|', so no split

# Right: specify sep=','
right = s.str.get_dummies(sep=',')

print('Wrong output:', wrong, sep='\n')
print('\nRight output:', right, sep='\n')
Output
Wrong output:
   cat,bird  cat,dog  dog
0         0        1    0
1         0        0    1
2         1        0    0

Right output:
   bird  cat  dog
0     0    1    1
1     0    0    1
2     1    1    0
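The pitfall about non-string data and missing values can be checked the same way. The sketch below uses made-up sample values and assumes that casting with astype(str) is an acceptable cleaning step for your data:

```python
import pandas as pd

# Missing values do not raise; the affected row is simply all zeros
with_nan = pd.Series(['cat,dog', None, 'bird'])
nan_encoded = with_nan.str.get_dummies(sep=',')
print(nan_encoded)

# A numeric Series has no .str accessor, so the call fails
codes = pd.Series([10, 20, 10])
try:
    codes.str.get_dummies()
except AttributeError as exc:
    print('AttributeError:', exc)

# Casting to string first makes the accessor available
codes_encoded = codes.astype(str).str.get_dummies()
print(codes_encoded.columns.tolist())  # ['10', '20']
```

If an all-zero row for missing data is not what you want, fill or drop the missing values before encoding.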

Quick Reference

Summary tips for using str.get_dummies():

  • Use on a pandas Series with string values.
  • Set sep to the delimiter used in your strings (e.g., ',', '|', ' ').
  • Returns a DataFrame with one-hot encoded columns for each unique split value.
  • Useful for converting multi-label text data into numeric format for analysis.
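Because the method works on a Series, a common pattern is to encode one column of a DataFrame and join the result back. This is a minimal sketch; the frame and its column names (title, tags) are hypothetical:

```python
import pandas as pd

# Hypothetical data: each row lists its tags as one comma-separated string
df = pd.DataFrame({
    'title': ['Post A', 'Post B', 'Post C'],
    'tags': ['python,pandas', 'pandas', 'python,numpy'],
})

# Encode the single 'tags' column, then attach the indicator columns
tag_dummies = df['tags'].str.get_dummies(sep=',')
encoded = df.drop(columns='tags').join(tag_dummies)
print(encoded)

# Column sums give the number of rows that carry each tag
print(tag_dummies.sum())
```

The join lines up on the index, which the encoded frame inherits from the original column.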

Key Takeaways

Use str.get_dummies on a pandas Series of strings to create one-hot encoded columns.
Always specify the correct separator with the sep parameter to split strings properly.
The method returns a DataFrame with binary columns for each unique category found.
Ensure your data is clean and string-typed before applying str.get_dummies.
It is ideal for multi-label categorical data stored as delimiter-separated strings.