Challenge - 5 Problems
Categorical Memory Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
intermediate · 2:00 remaining
Memory usage difference between object and categorical
Given a pandas DataFrame column with repeated string values, what is the approximate memory usage difference when converting it from object dtype to categorical dtype?
Pandas
import pandas as pd

col = pd.Series(['apple', 'banana', 'apple', 'banana', 'apple'] * 1000)
mem_obj = col.memory_usage(deep=True)
mem_cat = col.astype('category').memory_usage(deep=True)
print(mem_obj, mem_cat)
💡 Hint
Think about how categorical stores repeated strings once, while object stores each string separately.
Categorical dtype stores unique values once and uses integer codes for each entry, saving memory compared to object dtype which stores each string separately.
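As a quick sketch of this effect (illustrative data; exact byte counts vary by pandas version and platform):

```python
import pandas as pd

# Repeated string values: the ideal case for categorical dtype
col = pd.Series(['apple', 'banana', 'apple', 'banana', 'apple'] * 1000)

mem_obj = col.memory_usage(deep=True)                     # every string counted per row
mem_cat = col.astype('category').memory_usage(deep=True)  # 2 unique strings + small integer codes

# With only two unique values, the categorical version should be far smaller
print(mem_obj, mem_cat, round(mem_obj / mem_cat, 1))
```

The ratio grows with the amount of repetition: more rows per unique value means more savings.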
❓ Data Output
intermediate · 2:00 remaining
Memory usage after converting multiple columns to categorical
You have a DataFrame with 3 columns of repeated string categories. After converting all 3 columns to categorical dtype, what is the expected effect on total memory usage?
Pandas
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': ['cat', 'dog', 'cat', 'dog'] * 1000,
    'B': ['red', 'blue', 'red', 'blue'] * 1000,
    'C': ['small', 'large', 'small', 'large'] * 1000
})
mem_before = df.memory_usage(deep=True).sum()
df_cat = df.astype('category')
mem_after = df_cat.memory_usage(deep=True).sum()
print(mem_before, mem_after)
💡 Hint
Each categorical column stores unique values once, reducing memory.
Converting multiple repeated string columns to categorical reduces memory usage significantly because unique values are stored once and data is stored as integer codes.
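The per-column effect can be inspected with `memory_usage(deep=True)`, which returns bytes per column (a small illustrative sketch):

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['cat', 'dog'] * 2000,
    'B': ['red', 'blue'] * 2000,
    'C': ['small', 'large'] * 2000,
})

before = df.memory_usage(deep=True)                    # per-column bytes, object dtype
after = df.astype('category').memory_usage(deep=True)  # per-column bytes, categorical

# Every repeated-string column shrinks once it is stored as codes + a categories array
print(pd.DataFrame({'before': before, 'after': after}))
```

Comparing per column like this is useful when deciding which columns are worth converting.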
🔧 Debug
advanced · 2:00 remaining
Why does this categorical conversion not reduce memory?
Consider this code snippet:
import pandas as pd
col = pd.Series(['unique' + str(i) for i in range(10000)])
col_cat = col.astype('category')
print(col.memory_usage(deep=True), col_cat.memory_usage(deep=True))
Why does converting to categorical not reduce memory usage here?
Pandas
import pandas as pd

col = pd.Series(['unique' + str(i) for i in range(10000)])
col_cat = col.astype('category')
print(col.memory_usage(deep=True), col_cat.memory_usage(deep=True))
💡 Hint
Think about how many unique values there are and what categorical stores.
When all values are unique, categorical must still store every string in its categories array and additionally hold an integer code per row, so memory usage can actually be higher than with object dtype, which stores one pointer per row to each string.
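A common rule of thumb (a heuristic, not a pandas rule) is to convert only when unique values are a small fraction of the rows:

```python
import pandas as pd

col_repeated = pd.Series(['a', 'b', 'c'] * 5000)            # 3 unique values, 15000 rows
col_unique = pd.Series([f'id_{i}' for i in range(15000)])   # every value unique

def worth_categorical(s, threshold=0.5):
    # Heuristic: convert only if unique values are a small fraction of rows.
    # The 0.5 threshold is an illustrative choice, not a pandas default.
    return s.nunique() / len(s) < threshold

print(worth_categorical(col_repeated))  # few unique values -> True
print(worth_categorical(col_unique))    # all unique -> False
```

Columns like IDs or free-text fields fail this check and are best left as-is.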
🧠 Conceptual
advanced · 1:30 remaining
Understanding categorical dtype internals
Which statement best describes how pandas stores data internally when using categorical dtype?
💡 Hint
Think about how categorical reduces memory by avoiding repeated strings.
Categorical dtype stores data as integer codes referencing a separate array of unique categories, avoiding repeated storage of strings.
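The codes-plus-categories layout can be inspected directly through the `.cat` accessor (a small illustrative example):

```python
import pandas as pd

col = pd.Series(['low', 'high', 'low', 'medium', 'high']).astype('category')

# The unique categories are stored once, in a separate index (sorted by default)
print(list(col.cat.categories))       # ['high', 'low', 'medium']

# Each row holds only a small integer code pointing into that index
print(col.cat.codes.tolist())         # [1, 0, 1, 2, 0]
```

Because the codes are small integers (int8 for up to 127 categories), each row costs far less than a pointer to a full Python string object.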
🚀 Application
expert · 3:00 remaining
Optimizing memory for a large DataFrame with mixed data types
You have a large DataFrame with 1 million rows and these columns:
- 'city' (string, 100 unique values)
- 'temperature' (float)
- 'date' (datetime)
- 'status' (string, 3 unique values)
Which memory optimization strategy will save the most memory without losing data?
💡 Hint
Only columns with repeated string values benefit from categorical dtype.
Converting 'city' and 'status' to categorical saves memory because they have few unique values. 'temperature' should stay float for numeric operations. 'date' is best kept as datetime for time operations.
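A sketch of that strategy on made-up data (column names from the question; row count scaled down for a quick demo):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100_000  # scaled down from 1 million rows for illustration
df = pd.DataFrame({
    'city': rng.choice([f'city_{i}' for i in range(100)], size=n),  # 100 unique values
    'temperature': rng.normal(20.0, 5.0, size=n),                   # numeric, keep as float
    'date': pd.date_range('2024-01-01', periods=n, freq='min'),     # keep as datetime
    'status': rng.choice(['ok', 'warn', 'fail'], size=n),           # 3 unique values
})

before = df.memory_usage(deep=True).sum()
# Convert only the low-cardinality string columns; leave numeric/datetime dtypes alone
df = df.astype({'city': 'category', 'status': 'category'})
after = df.memory_usage(deep=True).sum()

print(f'{before:,} -> {after:,} bytes')
```

Selective conversion via a dtype mapping in `astype` targets only the columns that benefit, while `temperature` stays usable for arithmetic and `date` for time-based operations.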