Categorical data type optimization in Data Analysis Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of memory usage after converting to categorical

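The following Python code converts a DataFrame column to a categorical type and prints the memory usage before and after the conversion. Which option best matches the pattern of the printed values? Exact byte counts depend on the pandas and Python versions.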

Data Analysis Python
import pandas as pd
import numpy as np

# Create a DataFrame with 100000 rows
np.random.seed(0)
data = pd.DataFrame({
    'color': np.random.choice(['red', 'green', 'blue'], size=100000)
})

# Memory usage before conversion
mem_before = data.memory_usage(deep=True).sum()

# Convert 'color' column to categorical
data['color'] = data['color'].astype('category')

# Memory usage after conversion
mem_after = data.memory_usage(deep=True).sum()

print(round(mem_before), round(mem_after))
A. 240000 240000
B. 240000 80000
C. 300000 300000
D. 80000 240000
💡 Hint

Think about how categorical data stores repeated values efficiently compared to strings.
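
To make the comparison concrete, here is a minimal sketch that measures the same kind of column before and after conversion. The variable names (`df`, `as_cat`) and the use of `np.random.default_rng` are choices made for this illustration, and the printed byte counts will differ across pandas and Python versions, so no exact figures are shown.

```python
import numpy as np
import pandas as pd

# Build a column of 100,000 strings drawn from three distinct values.
rng = np.random.default_rng(0)
df = pd.DataFrame({"color": rng.choice(["red", "green", "blue"], size=100_000)})

# deep=True asks pandas to count the string payloads, not just the references.
mem_before = df["color"].memory_usage(deep=True)

# The categorical stores each distinct string once, plus one small integer code per row.
as_cat = df["color"].astype("category")
mem_after = as_cat.memory_usage(deep=True)

print(f"before: {mem_before:,} bytes")
print(f"after:  {mem_after:,} bytes")
print(f"ratio:  {mem_before / mem_after:.1f}x")
print("code dtype:", as_cat.cat.codes.dtype)
```

With only three categories, pandas can hold the codes in the smallest signed integer type, which is why the per-row cost drops to roughly one byte.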

Data Output
intermediate
Number of unique categories after optimization

Given a DataFrame with a 'city' column converted to categorical, what is the number of unique categories stored?

Data Analysis Python
import pandas as pd

cities = ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix', 'New York', 'Chicago']
data = pd.DataFrame({'city': cities})
data['city'] = data['city'].astype('category')

num_categories = len(data['city'].cat.categories)
print(num_categories)
A. 4
B. 7
C. 6
D. 5
💡 Hint

Count unique city names ignoring duplicates.
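
As a rough illustration of how a categorical separates distinct labels from per-row data, the sketch below prints both parts for the same city list. The variable name `city` is introduced here for the example only.

```python
import pandas as pd

cities = ["New York", "Los Angeles", "Chicago", "Houston", "Phoenix", "New York", "Chicago"]
city = pd.Series(cities, dtype="category")

# One entry per distinct label; inferred categories are sorted when the values are sortable.
print(list(city.cat.categories))
# → ['Chicago', 'Houston', 'Los Angeles', 'New York', 'Phoenix']

# One integer per row, each pointing into the categories above.
print(city.cat.codes.tolist())
# → [3, 2, 0, 1, 4, 3, 0]
```

The number of rows (7) and the number of categories are independent: duplicates add codes, not categories.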

🔧 Debug
advanced
Identify the error in categorical conversion code

What error does the following code produce?

Data Analysis Python
import pandas as pd

values = ['apple', 'banana', 'apple', 'orange']
data = pd.DataFrame({'fruit': values})
data['fruit'] = data['fruit'].astype('category', categories=['apple', 'banana'])
A. TypeError: astype() got an unexpected keyword argument 'categories'
B. ValueError: Cannot set categories with astype()
C. KeyError: 'categories'
D. No error, runs successfully
💡 Hint

Check the correct way to specify categories when converting to categorical.
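
One way to restrict the categories during conversion, sketched here under the assumption of a reasonably recent pandas, is to build a `pd.CategoricalDtype` and pass that object to `astype`. The names `fruit_dtype` and `fruit` are illustrative.

```python
import pandas as pd

values = ["apple", "banana", "apple", "orange"]

# Describe the allowed categories in a dtype object instead of keyword arguments.
fruit_dtype = pd.CategoricalDtype(categories=["apple", "banana"])
fruit = pd.Series(values).astype(fruit_dtype)

print(list(fruit.cat.categories))  # → ['apple', 'banana']

# 'orange' is not an allowed category, so it becomes a missing value.
print(fruit.isna().sum())  # → 1
```

Note the side effect: values outside the declared categories are silently converted to NaN rather than raising, so it is worth checking for unexpected missing values after the conversion.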

🚀 Application
advanced
Choosing categorical dtype for large dataset

You have a dataset with a column 'status' containing 3 unique values repeated millions of times. Which approach optimizes memory best?

A. Keep 'status' as object dtype (string)
B. Convert 'status' to integer dtype without mapping
C. Convert 'status' to category dtype
D. Convert 'status' to float dtype
💡 Hint

Think about how categorical dtype stores repeated values efficiently.
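
The scenario can be sketched at a smaller scale as follows. The three status labels ("active", "inactive", "pending") are hypothetical, since the question does not name them, and the hand-built integer mapping is included only as a point of comparison.

```python
import numpy as np
import pandas as pd

n = 1_000_000
labels = np.array(["active", "inactive", "pending"], dtype=object)  # hypothetical values

# One million rows cycling through the three labels, stored as Python strings.
status_obj = pd.Series(labels[np.arange(n) % 3], dtype=object)

# Category dtype: three stored labels plus one small integer code per row.
status_cat = status_obj.astype("category")

# A manual integer mapping is similarly compact, but the labels are no longer in the column.
mapping = {"active": 0, "inactive": 1, "pending": 2}
status_int = status_obj.map(mapping).astype("int8")

for name, col in [("object", status_obj), ("category", status_cat), ("int8 map", status_int)]:
    print(f"{name:>8}: {col.memory_usage(deep=True):,} bytes")
```

The category column lands close to the hand-mapped integers in size while still displaying and comparing as the original labels.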

🧠 Conceptual
expert
Effect of categorical ordering on sorting performance

How does setting the ordered=True parameter in a categorical column affect sorting performance in pandas?

A. Sorting is faster because pandas uses the category order for comparisons
B. Sorting is slower because ordered categories require extra checks
C. Sorting raises an error if categories are ordered
D. Sorting performance is unchanged by ordered parameter
💡 Hint

Consider what pandas compares when it sorts a categorical column: the integer codes, which exist whether or not the categories are ordered.
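
To see what `ordered=True` changes in practice, this sketch builds the same data twice, once unordered and once ordered, with an explicit category list. The size labels and variable names are made up for the illustration, and the sketch demonstrates behavior only, not timing.

```python
import pandas as pd

sizes = ["medium", "small", "large", "small"]
levels = ["small", "medium", "large"]  # explicit category order

unordered = pd.Series(pd.Categorical(sizes, categories=levels, ordered=False))
ordered = pd.Series(pd.Categorical(sizes, categories=levels, ordered=True))

# Sorting follows the category order in both cases, not alphabetical order.
sorted_unordered = unordered.sort_values().tolist()
sorted_ordered = ordered.sort_values().tolist()
print(sorted_unordered)  # → ['small', 'small', 'medium', 'large']
print(sorted_ordered)    # → ['small', 'small', 'medium', 'large']

# What ordered=True adds is support for inequality comparisons.
greater_than_small = (ordered > "small").tolist()
print(greater_than_small)  # → [True, False, True, False]

try:
    unordered > "small"
    comparison_failed = False
except TypeError:
    # Unordered categoricals only support equality checks.
    comparison_failed = True
print(comparison_failed)  # → True
```

Both series sort by the same underlying integer codes, so the practical difference lies in which comparisons are permitted rather than in how sorting is carried out.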