
Adding and removing categories in Pandas

Introduction

Categories help organize data with fixed possible values. Adding or removing categories lets you update these possible values as your data changes.

Typical situations where this is useful:

Your data gains new values that are not in the original categories.
You want to clean up categories by removing unused or incorrect ones.
You want to prepare data for analysis by defining all possible groups up front.
You need to fix category lists after merging data from different sources.
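
The first situation can be sketched as follows. This is a minimal illustration, assuming a small fruit Series like the ones used later on this page; the variable name assignment_failed is made up for the example, and the exact exception type may differ between pandas versions, so both likely types are caught.

```python
import pandas as pd

s = pd.Series(['apple', 'banana'], dtype='category')

# Assigning a value outside the current categories is rejected.
try:
    s.iloc[0] = 'cherry'
    assignment_failed = False
except (TypeError, ValueError):
    # The exception type may vary by pandas version.
    assignment_failed = True

# Register the new category first; after that the assignment is allowed.
s = s.cat.add_categories(['cherry'])
s.iloc[0] = 'cherry'

print('Assignment failed before adding:', assignment_failed)
print(s.tolist())
```

The point of the sketch is the ordering: the category list has to contain a value before that value can be stored in the Series.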
Syntax
Pandas
Series.cat.add_categories(new_categories)
Series.cat.remove_categories(categories_to_remove)

Use add_categories() to add new category values.

Use remove_categories() to remove unwanted category values.
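
As a small sketch of how these calls behave: both methods return a new Series rather than changing the one they are called on, and each accepts either a single value or a list. The names original, with_cherry, with_more and without_banana are illustrative only.

```python
import pandas as pd

original = pd.Series(['apple', 'banana'], dtype='category')

# A single value works as well as a list of values.
with_cherry = original.cat.add_categories('cherry')
with_more = original.cat.add_categories(['cherry', 'date'])
without_banana = original.cat.remove_categories('banana')

# The original Series keeps its own category list.
print(list(original.cat.categories))
print(list(with_cherry.cat.categories))
print(list(with_more.cat.categories))
print(list(without_banana.cat.categories))
```

Because nothing is modified in place, the result has to be assigned back (as the examples on this page do) for the change to take effect.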

Examples
This adds 'cherry' as a new category to the Series.
Pandas
import pandas as pd
s = pd.Series(['apple', 'banana'], dtype='category')
s = s.cat.add_categories(['cherry'])
print(s.cat.categories)
This removes 'banana' from the categories.
Pandas
import pandas as pd
s = pd.Series(['apple', 'banana', 'cherry'], dtype='category')
s = s.cat.remove_categories(['banana'])
print(s.cat.categories)
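The two examples above work because 'cherry' is new in the first case and 'banana' exists in the second. The sketch below shows what happens otherwise: adding a category that is already present, or removing one that is not, raises a ValueError. The helper raises_value_error is a hypothetical function written only for this illustration.

```python
import pandas as pd

s = pd.Series(['apple', 'banana'], dtype='category')

def raises_value_error(func, *args):
    """Return True if calling func(*args) raises ValueError (illustrative helper)."""
    try:
        func(*args)
    except ValueError:
        return True
    return False

# 'apple' is already a category, so it cannot be added again.
add_existing_fails = raises_value_error(s.cat.add_categories, ['apple'])

# 'cherry' is not a category, so it cannot be removed.
remove_missing_fails = raises_value_error(s.cat.remove_categories, ['cherry'])

print(add_existing_fails, remove_missing_fails)
```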
Sample Program

This program shows how to add and remove categories from a pandas Series. It starts with 'apple' and 'banana' categories, adds 'cherry', then removes 'banana'.

Pandas
import pandas as pd

# Create a Series with categories
fruits = pd.Series(['apple', 'banana', 'apple', 'banana'], dtype='category')
print('Original categories:', fruits.cat.categories)

# Add a new category 'cherry'
fruits = fruits.cat.add_categories(['cherry'])
print('After adding category:', fruits.cat.categories)

# Remove the category 'banana'
fruits = fruits.cat.remove_categories(['banana'])
print('After removing category:', fruits.cat.categories)

# Show the Series values
print('Series values:')
print(fruits)
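
One practical effect of the sample program's final state, sketched here under the assumption that you count values with value_counts() (a method this page does not otherwise use): a category with no rows, such as 'cherry', still appears in the counts with a value of zero, while the rows that were 'banana' are now missing values.

```python
import pandas as pd

# Rebuild the final state of the sample program.
fruits = pd.Series(['apple', 'banana', 'apple', 'banana'], dtype='category')
fruits = fruits.cat.add_categories(['cherry'])
fruits = fruits.cat.remove_categories(['banana'])

# Every category is counted, including ones with no rows.
counts = fruits.value_counts()
missing = int(fruits.isna().sum())

print(counts)
print('Missing values:', missing)
```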
Important Notes

Removing a category replaces every value that belonged to it with NaN in the returned Series.

Adding categories does not change existing data, only the list of allowed categories.
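
Both notes can be checked with a short sketch. The variable names added and removed are illustrative; the data is a small fruit Series like the ones used elsewhere on this page.

```python
import pandas as pd

s = pd.Series(['apple', 'banana', 'apple'], dtype='category')

# Adding a category leaves the stored values untouched.
added = s.cat.add_categories(['cherry'])

# Removing a category turns its values into NaN.
removed = s.cat.remove_categories(['banana'])

print(added.tolist())
print(removed.isna().tolist())
```

If the NaN values are not wanted, they need to be handled separately, for example by dropping or refilling them before further analysis.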

Summary

Categories define fixed possible values in data.

Use add_categories() to add new possible values.

Use remove_categories() to remove unwanted categories; any values in the data that belonged to them become NaN.