Pandas · data · ~20 mins

Converting to categorical in Pandas - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of converting a column to categorical
What is the output of this code snippet that converts a DataFrame column to categorical and then prints the categories?
Pandas
import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red']})
df['color'] = df['color'].astype('category')
print(df['color'].cat.categories.tolist())
A. ['blue', 'red', 'green']
B. ['red', 'blue', 'green']
C. ['blue', 'green', 'red']
D. ['red', 'green', 'blue']
💡 Hint
When no categories are given, pandas infers them from the unique values and sorts them by default.
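To see how inferred categories relate to the stored values, here is a minimal sketch on made-up fruit data (the series `s` and its values are illustrative assumptions, not part of the challenge). It shows the sorted category labels next to the integer codes pandas keeps for each row.

```python
import pandas as pd

# Hypothetical data, unrelated to the quiz snippet above.
s = pd.Series(["pear", "apple", "pear", "fig"]).astype("category")

# Unique labels, inferred from the data and sorted.
categories = s.cat.categories.tolist()

# One integer per row: the position of that row's label in `categories`.
codes = s.cat.codes.tolist()

print(categories)  # ['apple', 'fig', 'pear']
print(codes)       # [2, 0, 2, 1]
```

The order of first appearance ('pear' before 'apple') has no effect on the inferred categories.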
Data Output
intermediate
Number of categories after conversion
After converting the 'grade' column to categorical, how many categories does it have?
Pandas
import pandas as pd

df = pd.DataFrame({'grade': ['A', 'B', 'A', 'C', 'B', 'A']})
df['grade'] = df['grade'].astype('category')
print(len(df['grade'].cat.categories))
A. 3
B. 6
C. 4
D. 2
💡 Hint
Count unique values in the column before conversion.
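The hint above can be checked on a small illustrative series (the letters used here are assumptions, not the quiz data). The sketch also shows a related detail: filtering rows does not shrink the category list until unused categories are removed explicitly.

```python
import pandas as pd

# Hypothetical data for illustration only.
s = pd.Series(["x", "y", "x", "z"]).astype("category")

n_unique = s.nunique()                # distinct values present in the data
n_categories = len(s.cat.categories)  # labels tracked by the dtype

# Dropping every "z" row leaves "z" registered as a category.
filtered = s[s != "z"]
n_after_filter = len(filtered.cat.categories)

# remove_unused_categories() drops labels that no longer occur.
cleaned = filtered.cat.remove_unused_categories()
n_after_cleanup = len(cleaned.cat.categories)

print(n_unique, n_categories, n_after_filter, n_after_cleanup)  # 3 3 3 2
```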
🔧 Debug
advanced
Identify the error in categorical conversion
What happens when this code converts a column to categorical with an explicit list of categories?
Pandas
import pandas as pd

df = pd.DataFrame({'size': ['S', 'M', 'L', 'XL']})
df['size'] = pd.Categorical(df['size'], categories=['S', 'M', 'L'])
A. TypeError: categories must be a list
B. No error; the conversion succeeds and 'XL' is replaced by NaN
C. ValueError: 'XL' not in categories
D. KeyError: 'size'
💡 Hint
Check how pandas handles values not in specified categories.
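One way to check this yourself is to inspect the integer codes and the missing-value mask after construction. The sketch below uses made-up animal labels (an assumption for illustration). Behavior is described for the pandas releases this page appears to target, and newer releases may warn about values outside the category list, so it is worth checking the release notes for your version.

```python
import pandas as pd

# Hypothetical values; "bird" is deliberately left out of the categories.
values = ["cat", "dog", "bird"]
cat = pd.Categorical(values, categories=["cat", "dog"])

# Each value is stored as the index of its category; -1 marks "no category".
codes = cat.codes.tolist()

# Values without a category show up as missing.
n_missing = int(pd.isna(cat).sum())

# The category list itself is unchanged by the data.
kept_categories = list(cat.categories)

print(codes)            # [0, 1, -1]
print(n_missing)        # 1
print(kept_categories)  # ['cat', 'dog']
```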
🚀 Application
advanced
Effect of ordered categorical on sorting
Given this DataFrame with an ordered categorical column, what is the output after sorting by 'priority'?
Pandas
import pandas as pd

df = pd.DataFrame({'task': ['task1', 'task2', 'task3'], 'priority': ['high', 'low', 'medium']})
priority_type = pd.CategoricalDtype(categories=['low', 'medium', 'high'], ordered=True)
df['priority'] = df['priority'].astype(priority_type)
sorted_df = df.sort_values('priority')
print(sorted_df['task'].tolist())
A. ['task1', 'task3', 'task2']
B. ['task2', 'task1', 'task3']
C. ['task3', 'task2', 'task1']
D. ['task2', 'task3', 'task1']
💡 Hint
Ordered categorical sorts according to category order, not alphabetically.
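A small sketch of the same idea on different data (the size labels below are illustrative assumptions): an ordered categorical sorts and compares by the declared category order, while plain Python strings sort alphabetically.

```python
import pandas as pd

# Hypothetical ordered dtype: small < medium < large.
size_type = pd.CategoricalDtype(categories=["small", "medium", "large"], ordered=True)
raw = ["large", "small", "medium", "small"]
s = pd.Series(raw).astype(size_type)

# Sorting follows the declared category order.
sorted_values = s.sort_values().tolist()

# Plain strings sort alphabetically instead.
plain_sorted = sorted(raw)

# Ordered categoricals also support comparisons against a category label.
at_least_medium = (s >= "medium").tolist()

print(sorted_values)    # ['small', 'small', 'medium', 'large']
print(plain_sorted)     # ['large', 'medium', 'small', 'small']
print(at_least_medium)  # [True, False, True, False]
```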
🧠 Conceptual
expert
Memory usage difference with categorical dtype
Which statement best describes the memory usage difference when converting a large text column to categorical dtype in pandas?
A. Memory usage decreases because pandas stores each value as a small integer code that points to a shared list of categories
B. Memory usage increases because categories add overhead for mapping
C. Memory usage stays the same because data is unchanged
D. Memory usage decreases only if categories are sorted alphabetically
💡 Hint
Think about how categorical data is stored compared to strings.
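To compare storage directly, `Series.memory_usage(deep=True)` can be called before and after conversion. The sketch below uses a made-up column with many rows but only three distinct strings (an assumption chosen to show the typical case). It also measures a column where every value is unique, where the saving usually disappears because the category list is as long as the data; exact byte counts depend on the pandas version and string dtype, so none are shown.

```python
import pandas as pd

# Hypothetical low-cardinality column: 30,000 rows, 3 distinct strings.
text = pd.Series(["north", "south", "east"] * 10_000)
text_bytes = text.memory_usage(deep=True)
category_bytes = text.astype("category").memory_usage(deep=True)

print("as text:    ", text_bytes)
print("as category:", category_bytes)

# Hypothetical high-cardinality column: every value is distinct.
unique_text = pd.Series([f"id-{i}" for i in range(30_000)])
unique_text_bytes = unique_text.memory_usage(deep=True)
unique_category_bytes = unique_text.astype("category").memory_usage(deep=True)

print("unique as text:    ", unique_text_bytes)
print("unique as category:", unique_category_bytes)
```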