Memory savings with categoricals in Pandas - Mini Project: Build & Apply

Memory savings with categoricals
📖 Scenario: You work at a company that collects survey data about people's favorite fruits. The data is stored in a table in which the same few fruit names are repeated many times. You want to save memory by using the categorical data type, which stores each distinct name once and records only a small integer code for every row.
🎯 Goal: You will create a pandas DataFrame with fruit names, convert the fruit column to a categorical type, and compare memory usage before and after. This helps you understand how categoricals save memory.
📋 What You'll Learn
Create a pandas DataFrame with a column named fruit containing repeated fruit names.
Create a variable called memory_before to store the memory usage of the fruit column before conversion.
Convert the fruit column to categorical type and store it back in the DataFrame.
Create a variable called memory_after to store the memory usage of the fruit column after conversion.
Print the values of memory_before and memory_after.
💡 Why This Matters
🌍 Real World
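In real data projects, large datasets often contain text columns with only a few distinct values repeated many times. Converting such columns to a categorical data type can reduce memory use substantially and can speed up operations such as grouping. The savings shrink as the number of distinct values approaches the number of rows.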
💼 Career
Data scientists and analysts use categoricals to optimize memory and performance when working with big data in pandas.
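To give a feel for the scale involved, here is a minimal sketch that repeats the three fruit names from this project until the column has 300,000 rows. The row count and the variable names (n_rows, as_text, as_category) are assumptions made for illustration, and the exact byte counts depend on your pandas and Python versions.

```python
import pandas as pd

# Hypothetical larger dataset: the same three names repeated many times.
fruits = ["apple", "banana", "orange"]
n_rows = 300_000
df = pd.DataFrame({"fruit": fruits * (n_rows // len(fruits))})

# index=False measures only the column values, not the row index.
as_text = df["fruit"].memory_usage(index=False, deep=True)
as_category = df["fruit"].astype("category").memory_usage(index=False, deep=True)

print("bytes as text:", as_text)
print("bytes as category:", as_category)
```

With only three distinct values, each row of the categorical column needs just a small integer code, so the second number should be much lower than the first.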
1
Create the initial DataFrame
Import pandas as pd and create a DataFrame called df with one column named fruit containing these exact values in order: 'apple', 'banana', 'apple', 'orange', 'banana', 'apple'.
Hint: Use pd.DataFrame and pass a dictionary with the key 'fruit' and the list of fruit names as its value.
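One way to write this step is sketched below. The final print call is not required by the task; it is included only so you can check the result.

```python
import pandas as pd

# The dictionary key becomes the column name; the list becomes the column values.
df = pd.DataFrame(
    {"fruit": ["apple", "banana", "apple", "orange", "banana", "apple"]}
)

print(df)
```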

2
Measure memory usage before conversion
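Create a variable called memory_before and set it to the memory usage of the fruit column in df by calling df['fruit'].memory_usage(deep=True). On a single column this method already returns the total number of bytes, so no .sum() is needed. The .sum() call applies to df.memory_usage(deep=True), which returns one value per column.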
Hint: Use df['fruit'].memory_usage(deep=True) to get the memory usage in bytes.
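The measurement can be sketched as follows. The values_only variable is an extra added for illustration and is not part of the task: by default memory_usage also counts the row index, and index=False leaves it out.

```python
import pandas as pd

df = pd.DataFrame(
    {"fruit": ["apple", "banana", "apple", "orange", "banana", "apple"]}
)

# deep=True asks pandas to inspect the stored values themselves. For
# object-dtype columns this includes the size of each Python string.
memory_before = df["fruit"].memory_usage(deep=True)

# Illustrative extra: the same measurement without the row index.
values_only = df["fruit"].memory_usage(index=False, deep=True)

print(memory_before)
print(values_only)
```

Using the same arguments for every measurement keeps the before and after numbers comparable.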

3
Convert the fruit column to categorical
Convert the fruit column in df to the categorical type using astype('category') and assign the result back to df['fruit']. Then create a variable called memory_after and set it to the memory usage of the converted column by calling df['fruit'].memory_usage(deep=True), the same call you used for memory_before.
Hint: Use astype('category') to convert the column, then measure the memory usage again.
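A minimal sketch of the conversion is shown below. The calls to .cat.categories and .cat.codes are not required by the task; they are included to show what the categorical column stores internally.

```python
import pandas as pd

df = pd.DataFrame(
    {"fruit": ["apple", "banana", "apple", "orange", "banana", "apple"]}
)

# Convert the column and assign it back to the DataFrame.
df["fruit"] = df["fruit"].astype("category")

# Measure the converted column with the same arguments as before.
memory_after = df["fruit"].memory_usage(deep=True)

# Each distinct name is stored once...
print(list(df["fruit"].cat.categories))
# ...and every row holds an integer code pointing at one of those names.
print(list(df["fruit"].cat.codes))
```

The codes are positions in the categories list, which is why a long column with few distinct names needs so little space per row.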

4
Print memory usage before and after
Print the values of memory_before and memory_after on separate lines using two print statements.
Hint: Use two print statements: print(memory_before) and print(memory_after).
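Putting the four steps together, the whole project can be sketched as one short script. The exact byte counts it prints depend on your pandas and Python versions, and on six rows the gap is modest because the fixed cost of storing the category names is spread over so few values.

```python
import pandas as pd

# Step 1: create the DataFrame.
df = pd.DataFrame(
    {"fruit": ["apple", "banana", "apple", "orange", "banana", "apple"]}
)

# Step 2: measure the column before conversion.
memory_before = df["fruit"].memory_usage(deep=True)

# Step 3: convert to categorical and measure again.
df["fruit"] = df["fruit"].astype("category")
memory_after = df["fruit"].memory_usage(deep=True)

# Step 4: print both measurements on separate lines.
print(memory_before)
print(memory_after)
```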