Why categorical type matters in Pandas - Challenge Your Understanding

Challenge - 5 Problems
Predict Output
intermediate
Output of memory usage with categorical vs object type
Consider a pandas DataFrame with a column of repeated string values. What will be the approximate memory usage difference when the column is converted to categorical type compared to keeping it as object type?
Pandas
import pandas as pd
import numpy as np

size = 100000
values = ['apple', 'banana', 'cherry', 'date']
# Build a column of 100,000 strings drawn from just four distinct values
data = pd.DataFrame({'fruit': np.random.choice(values, size)})

# deep=True also counts the memory held by the string objects themselves
mem_before = data.memory_usage(deep=True).sum()
data['fruit'] = data['fruit'].astype('category')
mem_after = data.memory_usage(deep=True).sum()

print(mem_before, mem_after)
A. Code raises a TypeError because categorical conversion is invalid
B. mem_after is about the same as mem_before, no significant difference
C. mem_after is significantly smaller than mem_before, typically by an order of magnitude or more
D. mem_after is larger than mem_before because categorical adds overhead
💡 Hint
Think about how categorical stores repeated values as codes instead of full strings.
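To check a prediction for this problem, the comparison can be sketched as below. It is a minimal sketch rather than a benchmark: the fixed seed, the explicit `astype(object)` (used so the starting dtype is object regardless of pandas version), and the variable names are assumptions made for illustration. Exact byte counts depend on the pandas and Python versions, so none are quoted here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability
values = ["apple", "banana", "cherry", "date"]

# Force object dtype so the starting point matches the question's setup
fruit_obj = pd.Series(rng.choice(values, size=100_000)).astype(object)
fruit_cat = fruit_obj.astype("category")

# deep=True includes the memory of the Python string objects
mem_obj = fruit_obj.memory_usage(deep=True)
mem_cat = fruit_cat.memory_usage(deep=True)

# Each row is stored as a small integer code pointing into the categories
print(fruit_cat.cat.codes.dtype)
print(f"object: {mem_obj:,} bytes, category: {mem_cat:,} bytes")
print(f"ratio: {mem_obj / mem_cat:.1f}x")
```

The saving comes from storing one small integer per row plus a single copy of each distinct string, so it shrinks as the number of distinct values approaches the number of rows.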
Data Output
intermediate
Number of unique categories after conversion
Given a pandas Series with repeated string values, what will be the number of categories after converting it to categorical type?
Pandas
import pandas as pd
s = pd.Series(['red', 'blue', 'red', 'green', 'blue', 'blue'])
cat = s.astype('category')
print(len(cat.cat.categories))
A. 6
B. Raises AttributeError
C. 1
D. 3
💡 Hint
Count unique values in the original Series.
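One way to explore the hint is to look at the three pieces a categorical Series exposes: its length, its categories, and its per-row codes. The sketch below reuses the Series from the question and only calls the `.cat` accessor and `nunique`, so it should run as written.

```python
import pandas as pd

s = pd.Series(["red", "blue", "red", "green", "blue", "blue"])
cat = s.astype("category")

print(len(cat))                  # number of rows, unchanged by the conversion
print(list(cat.cat.categories))  # the distinct labels
print(list(cat.cat.codes))       # one integer code per row, indexing into categories
print(s.nunique() == len(cat.cat.categories))
```

The number of rows and the number of categories are separate quantities, which is the distinction the answer options test.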
🔧 Debug
advanced
Why does this code raise a TypeError?
This code tries to assign a new value to an element of a categorical Series, but the assignment raises a TypeError. Why?
Pandas
import pandas as pd
s = pd.Series(['small', 'medium', 'large'], dtype='category')
s[0] = 'extra large'
A. Because 'extra large' is not in the categories, assignment fails
B. Because categorical columns are immutable and cannot be changed
C. Because 'extra large' is a string and categorical only accepts integers
D. Because pandas Series do not support item assignment
💡 Hint
Think about allowed values in categorical data.
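A possible fix, sketched under the assumption that the new label really should be allowed, is to register it with `Series.cat.add_categories` before assigning. The failing attempt is made on a copy, and the `except` clause also catches `ValueError` because the exception type has differed between pandas versions.

```python
import pandas as pd

s = pd.Series(["small", "medium", "large"], dtype="category")

attempt = s.copy()
try:
    attempt[0] = "extra large"  # label is not among the existing categories
except (TypeError, ValueError) as exc:
    print(type(exc).__name__, exc)

# Register the new label first; after that the assignment is accepted
s = s.cat.add_categories(["extra large"])
s[0] = "extra large"

print(s.tolist())
print(list(s.cat.categories))
```

Existing elements can still be reassigned to any label that is already a category, so the restriction is on the set of allowed values, not on mutation in general.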
🚀 Application
advanced
Choosing categorical type for performance
You have a DataFrame with a column 'city' containing 1 million rows but only 50 unique city names. Which benefit will you get by converting 'city' to categorical type?
A. Slower filtering but faster sorting on 'city'
B. Reduced memory usage and faster groupby operations on 'city'
C. No benefit because 50 unique values is too many for categorical
D. Increased memory usage due to category overhead
💡 Hint
Think about how categorical stores repeated values and how groupby uses categories.
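The scenario can be approximated with a scaled-down sketch. The city names, the `sales` column, the seed, and the 200,000-row size are all hypothetical stand-ins for the 1 million rows in the question. The sketch compares memory and confirms that grouping gives the same totals. It does not measure speed, which would need something like `timeit` on your own data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows = 200_000  # scaled down from the 1 million rows in the question
cities = [f"city_{i:02d}" for i in range(50)]  # hypothetical city names

city = pd.Series(rng.choice(cities, size=n_rows)).astype(object)
df = pd.DataFrame({
    "city": city,
    "city_cat": city.astype("category"),
    "sales": rng.random(n_rows),  # hypothetical numeric column to aggregate
})

# Per-column memory, including the string objects themselves
mem = df[["city", "city_cat"]].memory_usage(deep=True, index=False)
print(mem)

# Both groupings produce the same per-city totals
by_obj = df.groupby("city")["sales"].sum()
by_cat = df.groupby("city_cat", observed=True)["sales"].sum()
print(np.allclose(by_obj.to_numpy(), by_cat.to_numpy()))
```

Passing `observed=True` restricts the result to categories that actually occur, which matters when a categorical column declares categories that are absent from the data.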
🧠 Conceptual
expert
Why does sorting a categorical column with ordered=True differ from unordered?
What is the main difference in sorting behavior between a pandas categorical column with ordered=True and one with ordered=False?
A. With ordered=True, sorting respects the category order; with ordered=False, sorting is by category codes
B. With ordered=True, sorting is faster; with ordered=False, sorting raises an error
C. With ordered=True, sorting is random; with ordered=False, sorting respects category order
D. There is no difference; both sort alphabetically
💡 Hint
Think about what ordered categories mean for comparison.
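To experiment with the hint, the sketch below builds the same data twice, once with `ordered=True` and once with `ordered=False`. The size labels and variable names are hypothetical. Both Series are given the same explicit `categories` list, and in both cases `sort_values` follows the position of each label in that list, because the integer codes are those positions. The clearer practical difference shows up in order-dependent operations such as `min`, `max`, and `<`.

```python
import pandas as pd

sizes = ["medium", "large", "small", "medium"]
levels = ["small", "medium", "large"]  # intended order, smallest first

ordered = pd.Series(pd.Categorical(sizes, categories=levels, ordered=True))
unordered = pd.Series(pd.Categorical(sizes, categories=levels, ordered=False))

# Sorting uses the position of each label in `categories`, not the alphabet
print(ordered.sort_values().tolist())
print(unordered.sort_values().tolist())

# Order-dependent operations are defined when ordered=True
print(ordered.min(), ordered.max())
print((ordered < "large").tolist())

# The same operations on an unordered categorical raise TypeError
try:
    unordered.min()
except TypeError as exc:
    print("unordered:", exc)
```

When no `categories` list is given, pandas infers the categories in sorted order, which is why an unordered string categorical often appears to sort alphabetically.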