
Why categorical type matters in Pandas - See It in Action

Why categorical type matters
📖 Scenario: Imagine you work in a store that sells different types of fruits. You have a list of fruits sold each day, but some fruits appear many times. You want to organize this data efficiently and understand how many times each fruit was sold.
🎯 Goal: You will create a pandas Series of fruits sold, convert it to the categorical type, and then compare the memory usage of the two versions and count how often each fruit appears. This shows why categorical data is useful when a column repeats a small set of values.
📋 What You'll Learn
Create a pandas Series with fruit names
Create a categorical version of the Series
Compare memory usage of both Series
Count how many times each fruit appears
💡 Why This Matters
🌍 Real World
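Stores, surveys, and many other datasets contain repeated categories such as product types or survey answers. Storing such columns as a categorical type can reduce memory use and speed up operations like grouping and counting, especially when there are few distinct values relative to the number of rows.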
💼 Career
Data analysts and scientists often work with large datasets. Knowing how to use categorical data types helps optimize performance and resource use.
1
Create a pandas Series with fruit names
Import pandas as pd and create a pandas Series called fruits with these exact values: ["apple", "banana", "apple", "orange", "banana", "apple"].
Hint: Use pd.Series() to create the Series from the list of fruits.
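One way this step might look is sketched below. It uses only the import alias, variable name, and values given in the step; the final print() is an addition so you can inspect the result.

```python
import pandas as pd

# Build the Series from the list of fruit names given in the step.
fruits = pd.Series(["apple", "banana", "apple", "orange", "banana", "apple"])

# Optional: show the Series with its default integer index (0 to 5).
print(fruits)
```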

2
Create a categorical version of the Series
Create a new variable called fruits_cat by converting the fruits Series to the categorical type with its astype('category') method.
Hint: Use astype('category') on the fruits Series.
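A minimal sketch of the conversion follows. The lines that inspect .cat.categories and .cat.codes are not part of the exercise; they are included here only to illustrate how a categorical Series stores each distinct value once and represents every row as a small integer code.

```python
import pandas as pd

fruits = pd.Series(["apple", "banana", "apple", "orange", "banana", "apple"])

# Convert the Series to the categorical dtype.
fruits_cat = fruits.astype("category")

# Extra inspection (not required by the step):
print(fruits_cat.dtype)                     # the dtype name is "category"
print(fruits_cat.cat.categories.tolist())   # the distinct values, stored once
print(fruits_cat.cat.codes.tolist())        # one integer code per row
```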

3
Compare memory usage of both Series
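Create two variables: mem_fruits and mem_fruits_cat. Call memory_usage(deep=True) on fruits and fruits_cat respectively and store the results in them. With deep=True, pandas also counts the memory used by the string objects in an object-dtype Series, not only the references to them.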
Hint: Use memory_usage(deep=True) on both Series and save the results.
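The measurement could be sketched as follows. The second half is an illustration that goes beyond the exercise: the names big and big_cat are hypothetical, and the list is repeated 10,000 times only to show that the saving becomes much clearer on longer data. Exact byte counts depend on the pandas version and string storage, so none are shown.

```python
import pandas as pd

fruits = pd.Series(["apple", "banana", "apple", "orange", "banana", "apple"])
fruits_cat = fruits.astype("category")

# Memory in bytes, including the index and the underlying values.
mem_fruits = fruits.memory_usage(deep=True)
mem_fruits_cat = fruits_cat.memory_usage(deep=True)

# Illustration only (hypothetical names): the same six values repeated
# 10,000 times. The categorical version stores three strings plus one
# small integer code per row.
big = pd.Series(["apple", "banana", "apple", "orange", "banana", "apple"] * 10_000)
big_cat = big.astype("category")
print(big.memory_usage(deep=True), big_cat.memory_usage(deep=True))
```

On a Series of only six rows, fixed overhead such as the index accounts for a large share of both figures, so the difference between mem_fruits and mem_fruits_cat is modest.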

4
Print memory usage and counts of each fruit
Print the memory usage variables mem_fruits and mem_fruits_cat on separate lines. Then print the counts of each fruit in fruits_cat using the value_counts() method.
Hint: Use print() to show memory usage and value_counts() to count fruits.
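Putting the four steps together, a complete version of the exercise might look like the sketch below. The intermediate name counts is an assumption added for readability; the step itself only asks you to print the result of value_counts().

```python
import pandas as pd

fruits = pd.Series(["apple", "banana", "apple", "orange", "banana", "apple"])
fruits_cat = fruits.astype("category")

mem_fruits = fruits.memory_usage(deep=True)
mem_fruits_cat = fruits_cat.memory_usage(deep=True)

# Print the two memory figures (in bytes) on separate lines.
print(mem_fruits)
print(mem_fruits_cat)

# Count how many times each fruit appears; "counts" is a hypothetical
# helper name, not required by the exercise.
counts = fruits_cat.value_counts()
print(counts)
```

In this data apple appears three times, banana twice, and orange once.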