Memory savings with categoricals
📖 Scenario: You work in a company that collects survey data about favorite fruits from many people. The data is stored in a table with many repeated fruit names. You want to save computer memory by using a special data type called categorical.
🎯 Goal: You will create a pandas DataFrame with fruit names, convert the fruit column to a categorical type, and compare memory usage before and after. This helps you understand how categoricals save memory.
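The workflow described in the goal can be sketched roughly as follows. This is a minimal illustration, not the lesson's official solution: the sample fruit list and the row count are made up for the example, and `deep=True` is an assumption chosen so that the size of the string contents is included in the measurement.

```python
import pandas as pd

# Hypothetical survey data: a handful of fruit names repeated many times.
fruits = ["apple", "banana", "cherry", "apple", "banana"] * 2000
df = pd.DataFrame({"fruit": fruits})

# Memory used by the column while it still holds plain strings.
# deep=True asks pandas to count the string contents, not just the references.
memory_before = df["fruit"].memory_usage(deep=True)

# Convert the column to a categorical type and store it back in the DataFrame.
df["fruit"] = df["fruit"].astype("category")

# Memory used by the same column after conversion.
memory_after = df["fruit"].memory_usage(deep=True)

print(memory_before)
print(memory_after)
```

The exact byte counts depend on the pandas version and platform, so only the comparison between the two numbers is meaningful here.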
📋 What You'll Learn
Create a pandas DataFrame with a column named fruit containing repeated fruit names.
Create a variable called memory_before to store the memory usage of the fruit column before conversion.
Convert the fruit column to categorical type and store it back in the DataFrame.
Create a variable called memory_after to store the memory usage of the fruit column after conversion.
Print the values of memory_before and memory_after.
💡 Why This Matters
🌍 Real World
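In real data projects, large datasets often have repeated text values. When a column has few distinct values relative to its number of rows, storing it as a categorical typically reduces memory use and can speed up operations such as grouping.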
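To see where the saving comes from, it can help to look at how a categorical column is stored: each distinct label is kept once, and every row holds only a small integer code pointing at its label. The short series below is a made-up example for illustration.

```python
import pandas as pd

# Hypothetical sample: three distinct fruit names spread across six rows.
s = pd.Series(
    ["apple", "banana", "apple", "cherry", "banana", "apple"],
    dtype="category",
)

# Each distinct label is stored once...
categories = s.cat.categories.tolist()
# ...and each row stores only an integer code referring to a label.
codes = s.cat.codes.tolist()

print(categories)  # ['apple', 'banana', 'cherry']
print(codes)       # [0, 1, 0, 2, 1, 0]
```

If nearly every value in a column is unique, the label table grows almost as large as the original data, so the conversion may save little or nothing.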
💼 Career
Data scientists and analysts use categoricals to optimize memory and performance when working with big data in pandas.