Converting to categorical in Pandas

Introduction

Converting a column to the categorical type can save memory and speed up analysis when the column holds a small set of repeated values. It also makes it easier to group rows that share the same value.

When a column has only a few distinct text values that repeat many times, like colors or status labels.
When you want to reduce the size of a large dataset with repeated labels.
When preparing data for machine learning models that work better with categories.
When you want to order categories, like small, medium, large.
When you want to improve performance of filtering or grouping operations.
Syntax
Pandas
df['column'] = df['column'].astype('category')
Use astype('category') to convert a column to categorical type.
You can also specify the order of categories using pd.Categorical.
Examples
Convert the 'color' column to categorical type.
Pandas
df['color'] = df['color'].astype('category')
Create an ordered categorical column 'size' with specific order.
Pandas
df['size'] = pd.Categorical(df['size'], categories=['small', 'medium', 'large'], ordered=True)
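An equivalent way to get an ordered column is to build a pd.CategoricalDtype and pass it to astype. The sketch below is only an illustration on a made-up Series (raw); it also shows a detail worth knowing: any value that is not in the categories list (here the hypothetical 'huge') becomes a missing value.

```python
import pandas as pd

# Hypothetical column; 'huge' is deliberately absent from the category list
raw = pd.Series(['small', 'large', 'huge', 'medium'])

# Describe the categories and their order once, then reuse the dtype
size_dtype = pd.CategoricalDtype(categories=['small', 'medium', 'large'], ordered=True)
sized = raw.astype(size_dtype)

print(sized.cat.ordered)       # True
print(sized.isna().tolist())   # [False, False, True, False]
```

Defining the dtype once can be handy when several columns or DataFrames must share the same category order.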
Convert 'status' to categorical and print its categories.
Pandas
df['status'] = df['status'].astype('category')
print(df['status'].cat.categories)
Sample Program

This code creates a small table with colors and sizes. It changes 'color' to a category type and 'size' to an ordered category. Then it prints info about the data, the table itself, and the categories for each column.

Pandas
import pandas as pd

# Create a sample DataFrame
data = {'color': ['red', 'blue', 'green', 'blue', 'red', 'green', 'red'],
        'size': ['small', 'large', 'medium', 'medium', 'small', 'large', 'small']}
df = pd.DataFrame(data)

# Convert 'color' to categorical
df['color'] = df['color'].astype('category')

# Convert 'size' to ordered categorical
df['size'] = pd.Categorical(df['size'], categories=['small', 'medium', 'large'], ordered=True)

# Show DataFrame info to see memory usage and types
# (info() prints its report itself and returns None, so it is not wrapped in print)
df.info()

# Show the DataFrame
print(df)

# Show categories of 'color'
print('Color categories:', df['color'].cat.categories)

# Show categories of 'size'
print('Size categories:', df['size'].cat.categories)
Important Notes

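Categorical data usually uses less memory than strings when a column has few unique values relative to its number of rows. If almost every value is unique, the saving shrinks or disappears.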
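
A rough way to check the saving on your own data is to compare memory_usage(deep=True) before and after conversion. The sketch below uses a made-up Series (labels, 30,000 rows with only three distinct values); exact byte counts depend on the pandas version, so none are shown here.

```python
import pandas as pd

# Hypothetical data: 30,000 rows but only three distinct labels
labels = pd.Series(['red', 'green', 'blue'] * 10_000)
as_category = labels.astype('category')

# deep=True asks pandas to include the memory of the string values themselves
text_bytes = labels.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print('text column bytes:    ', text_bytes)
print('category column bytes:', category_bytes)
```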

Ordered categories allow comparisons like "small < medium".
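
As a small illustration of such comparisons, the sketch below builds a hypothetical ordered sizes Series and compares it against a label. Sorting and min/max follow the declared category order rather than alphabetical order.

```python
import pandas as pd

sizes = pd.Series(pd.Categorical(
    ['small', 'large', 'medium', 'small'],
    categories=['small', 'medium', 'large'],
    ordered=True,
))

# Element-wise comparison against one of the category labels
is_below_large = sizes < 'large'
print(is_below_large.tolist())        # [True, False, True, True]

# min/max and sorting use the category order: small < medium < large
print(sizes.min(), sizes.max())       # small large
print(sizes.sort_values().tolist())   # ['small', 'small', 'medium', 'large']
```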

Be careful: a categorical column does not turn back into text on its own. To get plain strings again, convert it explicitly with astype(str).
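
A minimal sketch of the round trip, using a made-up status Series. The resulting text dtype is not printed because its name differs between pandas versions.

```python
import pandas as pd

# Hypothetical categorical column
status = pd.Series(['open', 'closed', 'open'], dtype='category')
print(status.dtype)   # category

# Convert back to plain strings
as_text = status.astype(str)

# The result is no longer categorical, but the values are unchanged
print(isinstance(as_text.dtype, pd.CategoricalDtype))   # False
print(as_text.tolist())                                 # ['open', 'closed', 'open']
```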

Summary

Convert repeated text columns to categorical to save memory and speed up analysis.

Use astype('category') for simple conversion.

Use pd.Categorical to set category order.