0
0
Data Analysis Pythondata~30 mins

Categorical data type optimization in Data Analysis Python - Mini Project: Build & Apply

Choose your learning style9 modes available
Categorical data type optimization
📖 Scenario: You work in a company that collects customer feedback. The feedback data includes customer names and their satisfaction levels. The satisfaction levels are repeated many times and take only a few unique values.Storing this data efficiently can save memory and speed up analysis.
🎯 Goal: You will create a data structure to hold customer feedback, then convert the satisfaction levels to a categorical data type to optimize memory usage.
📋 What You'll Learn
Create a pandas DataFrame with customer names and satisfaction levels
Create a variable to hold the list of satisfaction levels
Convert the satisfaction levels column to a categorical data type
Print the DataFrame info to show memory usage
💡 Why This Matters
🌍 Real World
Categorical data types are used in data science to save memory and speed up analysis when working with repeated text values, such as survey responses or product categories.
💼 Career
Data analysts and data scientists often optimize data storage using categorical types to handle large datasets efficiently and improve performance.
Progress0 / 4 steps
1
Create the initial DataFrame
Create a pandas DataFrame called feedback with two columns: 'customer' and 'satisfaction'. Use these exact values for 'customer': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank', 'Grace', 'Helen'] and for 'satisfaction': ['Good', 'Bad', 'Good', 'Excellent', 'Bad', 'Good', 'Excellent', 'Bad'].
Data Analysis Python
Hint

Use pd.DataFrame with a dictionary containing the two lists for columns.

2
Create a list of satisfaction levels
Create a variable called satisfaction_levels and assign it the list ['Bad', 'Good', 'Excellent'] representing the unique satisfaction categories.
Data Analysis Python
Hint

Just assign the list of unique satisfaction values to the variable satisfaction_levels.

3
Convert satisfaction to categorical type
Convert the 'satisfaction' column in the feedback DataFrame to a categorical data type using the satisfaction_levels list as categories. Assign the result back to feedback['satisfaction'].
Data Analysis Python
Hint

Use pd.Categorical() with the categories parameter set to satisfaction_levels.

4
Print DataFrame info to show memory usage
Print the information of the feedback DataFrame using feedback.info() to display the memory usage and data types.
Data Analysis Python
Hint

Use feedback.info() to print the DataFrame details.