Memory savings with categoricals in Pandas - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Memory usage difference between object and categorical
Given a pandas Series of repeated string values, what is the approximate memory usage difference when converting it from object dtype to categorical dtype?
Pandas
import pandas as pd

col = pd.Series(['apple', 'banana', 'apple', 'banana', 'apple'] * 1000, dtype=object)  # explicit object dtype, as the question assumes
mem_obj = col.memory_usage(deep=True)
mem_cat = col.astype('category').memory_usage(deep=True)
print(mem_obj, mem_cat)
A. mem_cat uses zero memory
B. mem_obj is many times larger than mem_cat
C. mem_cat is roughly 5 times larger than mem_obj
D. mem_obj and mem_cat have about the same memory usage
💡 Hint
Think about how categorical stores repeated strings once, while object stores each string separately.
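To see where the savings come from, the sketch below splits the categorical into its two parts: a small integer code per row and the distinct strings stored once. It reuses the problem's data and passes dtype=object so the starting column is really object dtype; the exact byte totals depend on the Python and pandas versions, so the ratio is printed alongside them.

```python
import pandas as pd

# Same data as the problem; dtype=object pins the classic object storage
col = pd.Series(['apple', 'banana', 'apple', 'banana', 'apple'] * 1000, dtype=object)
cat = col.astype('category')

# One small integer code per row...
codes = cat.cat.codes
print(codes.dtype)   # int8: two categories fit in one byte per row
print(codes.nbytes)  # 5000: one byte for each of the 5000 rows

# ...and each distinct string stored only once
print(list(cat.cat.categories))  # ['apple', 'banana']

# deep=True also counts the Python string objects behind the object column
mem_obj = col.memory_usage(deep=True)
mem_cat = cat.memory_usage(deep=True)
print(mem_obj, mem_cat, round(mem_obj / mem_cat, 1))
```

With object dtype the ratio lands far above 5: every row costs an 8-byte pointer plus the counted size of a Python string, while the categorical pays one byte per row plus two strings in total.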
Data Output
intermediate
Memory usage after converting multiple columns to categorical
You have a DataFrame with 3 columns of repeated string categories. After converting all 3 columns to categorical dtype, what is the expected effect on total memory usage?
Pandas
import pandas as pd

df = pd.DataFrame({
    'A': ['cat', 'dog', 'cat', 'dog'] * 1000,
    'B': ['red', 'blue', 'red', 'blue'] * 1000,
    'C': ['small', 'large', 'small', 'large'] * 1000
}, dtype=object)  # explicit object dtype before conversion
mem_before = df.memory_usage(deep=True).sum()
df_cat = df.astype('category')
mem_after = df_cat.memory_usage(deep=True).sum()
print(mem_before, mem_after)
A. Total memory usage decreases significantly after conversion
B. Total memory usage increases after conversion
C. Total memory usage stays the same
D. Total memory usage becomes zero
💡 Hint
Each categorical column stores unique values once, reducing memory.
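A per-column breakdown makes the same point for the whole DataFrame: each converted column shrinks to roughly one byte per row plus its two category strings, while the index row stays unchanged. As a sketch, it passes dtype=object (an addition to the problem's code) so the starting columns are object dtype regardless of the pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['cat', 'dog', 'cat', 'dog'] * 1000,
    'B': ['red', 'blue', 'red', 'blue'] * 1000,
    'C': ['small', 'large', 'small', 'large'] * 1000,
}, dtype=object)
df_cat = df.astype('category')

# Bytes per column; the first row, labelled 'Index', is the RangeIndex itself
report = pd.DataFrame({
    'object': df.memory_usage(deep=True),
    'category': df_cat.memory_usage(deep=True),
})
report['ratio'] = (report['object'] / report['category']).round(1)
print(report)
```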
🔧 Debug
advanced
Why does this categorical conversion not reduce memory?
Consider the code snippet below. Why does converting to categorical not reduce memory usage here?
Pandas
import pandas as pd
col = pd.Series(['unique' + str(i) for i in range(10000)], dtype=object)
col_cat = col.astype('category')
print(col.memory_usage(deep=True), col_cat.memory_usage(deep=True))
A. Because pandas does not support categorical for string data
B. Because categorical dtype always uses more memory than object dtype
C. Because all values are unique, categorical stores all strings plus codes, increasing memory
D. Because the code has a syntax error and does not run
💡 Hint
Think about how many unique values there are and what categorical stores.
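Checking cardinality before converting exposes the problem directly. In this sketch (with dtype=object added so the starting column is object dtype), every value becomes its own category, so the categorical still holds all 10,000 strings and adds a codes array on top. The printed memory totals can vary with the pandas version and with how the categories are stored.

```python
import pandas as pd

col = pd.Series(['unique' + str(i) for i in range(10000)], dtype=object)
col_cat = col.astype('category')

# Cardinality: distinct values versus rows
print(col.nunique(), len(col))  # 10000 10000

# Every value is its own category, so nothing is deduplicated
print(len(col_cat.cat.categories))  # 10000

# The codes are extra storage: 10000 categories need int16, i.e. 2 bytes per row
codes = col_cat.cat.codes
print(codes.dtype, codes.nbytes)  # int16 20000

print(col.memory_usage(deep=True), col_cat.memory_usage(deep=True))
```

A practical habit that follows: compare nunique() with len() first and convert only when the ratio is small.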
🧠 Conceptual
advanced
Understanding categorical dtype internals
Which statement best describes how pandas stores data internally when using categorical dtype?
A. It converts all strings to numeric floats
B. It stores all string values repeatedly as in object dtype
C. It compresses strings using zip compression internally
D. It stores an array of integer codes and a separate array of unique categories
💡 Hint
Think about how categorical reduces memory by avoiding repeated strings.
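The codes-plus-categories layout can be inspected directly on a small pd.Categorical. This sketch shows the sorted categories, the per-row integer codes, and how the original values are recovered by looking each code up in the categories.

```python
import pandas as pd

cat = pd.Categorical(['red', 'blue', 'red', 'red', 'blue'])

# Unique values, stored once (sorted, since strings are orderable)
print(list(cat.categories))  # ['blue', 'red']

# One small integer per row, pointing into the categories
print(cat.codes)        # [1 0 1 1 0]
print(cat.codes.dtype)  # int8

# Decoding is an indexed lookup of the categories by code
decoded = cat.categories.take(cat.codes)
print(list(decoded))  # ['red', 'blue', 'red', 'red', 'blue']
```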
🚀 Application
expert
Optimizing memory for a large DataFrame with mixed data types
You have a large DataFrame with 1 million rows and these columns:
- 'city' (string, 100 unique values)
- 'temperature' (float)
- 'date' (datetime)
- 'status' (string, 3 unique values)
Which memory optimization strategy will save the most memory without losing data?
A. Convert 'city' and 'status' to categorical dtype, keep 'temperature' as float, and 'date' as datetime
B. Convert 'temperature' to categorical dtype, keep others as is
C. Convert all columns to categorical dtype
D. Convert 'date' to string dtype, keep others as is
💡 Hint
Categorical dtype pays off only for columns with few unique values relative to their length, such as the repeated strings here.
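Strategy A can be sketched as a small helper that converts only low-cardinality object columns and leaves numeric and datetime columns untouched. The helper name shrink_strings, its max_ratio=0.5 threshold, and the synthetic data (scaled down to 100,000 rows, with made-up city names) are assumptions for illustration, not part of the problem or of the pandas API.

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the problem, scaled down to 100,000 rows
rng = np.random.default_rng(0)
n = 100_000
cities = [f'city_{i}' for i in range(100)]
df = pd.DataFrame({
    'city': pd.Series(rng.choice(cities, size=n), dtype=object),
    'temperature': rng.normal(15.0, 8.0, size=n),
    'date': pd.date_range('2024-01-01', periods=n, freq='min'),
    'status': pd.Series(rng.choice(['ok', 'warn', 'fail'], size=n), dtype=object),
})

def shrink_strings(frame, max_ratio=0.5):
    """Hypothetical helper (not a pandas API): convert object columns whose
    distinct/total ratio is at most max_ratio to category."""
    out = frame.copy()
    for name in out.select_dtypes(include='object').columns:
        if out[name].nunique() / len(out) <= max_ratio:
            out[name] = out[name].astype('category')
    return out

small = shrink_strings(df)
print(small.dtypes)
print(df.memory_usage(deep=True).sum(), small.memory_usage(deep=True).sum())
```

Here 'temperature' and 'date' are skipped because they are not object columns; a float column with mostly distinct values would gain nothing from categorical codes, and no values are changed by the conversion.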