Challenge - 5 Problems
Categorical Mastery
❓ Predict Output
intermediate
Output of converting a column to categorical
What is the output of this code snippet that converts a DataFrame column to categorical and then prints the categories?
Pandas
import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red']})
df['color'] = df['color'].astype('category')
print(df['color'].cat.categories.tolist())
💡 Hint
Categories are sorted by default in pandas categorical dtype.
Explanation
astype('category') infers the categories from the column's unique values and, when those values are sortable, stores them in sorted order (alphabetical for strings) instead of order of appearance. The output is ['blue', 'green', 'red'].
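The sorted category list is easier to see next to the integer codes that pandas stores for each row. The following is a minimal sketch using the same values as the question; the variable names are illustrative only.

```python
import pandas as pd

colors = pd.Series(['red', 'blue', 'green', 'blue', 'red']).astype('category')

# Categories are inferred from the unique values and kept in sorted order.
categories = colors.cat.categories.tolist()

# Each row holds an integer code that indexes into the category list.
codes = colors.cat.codes.tolist()

print(categories)  # ['blue', 'green', 'red']
print(codes)       # [2, 0, 1, 0, 2]
```

The row order of the data is unchanged; only the category list is sorted.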
❓ Data Output
intermediate
Number of categories after conversion
After converting the 'grade' column to categorical, how many categories does it have?
Pandas
import pandas as pd

df = pd.DataFrame({'grade': ['A', 'B', 'A', 'C', 'B', 'A']})
df['grade'] = df['grade'].astype('category')
print(len(df['grade'].cat.categories))
💡 Hint
Count unique values in the column before conversion.
Explanation
The column contains three unique values, 'A', 'B', and 'C', so the converted column has 3 categories and the code prints 3.
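One related detail is that the category count belongs to the dtype, not to the rows currently present. The sketch below reuses the grades from the question and assumes a simple filter to show that unused categories remain until they are removed explicitly.

```python
import pandas as pd

grades = pd.Series(['A', 'B', 'A', 'C', 'B', 'A']).astype('category')

# Right after conversion, the category count matches the unique values.
n_categories = len(grades.cat.categories)
n_unique = grades.nunique()

# Filtering rows keeps the full category list on the dtype.
only_a = grades[grades == 'A']
n_after_filter = len(only_a.cat.categories)

# remove_unused_categories() drops categories that no row uses.
n_after_cleanup = len(only_a.cat.remove_unused_categories().cat.categories)

print(n_categories, n_unique, n_after_filter, n_after_cleanup)  # 3 3 3 1
```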
🔧 Debug
advanced
Identify the error in categorical conversion
What error does this code raise when trying to convert a column to categorical with specified categories?
Pandas
import pandas as pd

df = pd.DataFrame({'size': ['S', 'M', 'L', 'XL']})
df['size'] = pd.Categorical(df['size'], categories=['S', 'M', 'L'])
💡 Hint
Check how pandas handles values not in specified categories.
Explanation
The code raises no error. Values that are not in the specified categories ('XL' here) are converted to NaN, so the column becomes ['S', 'M', 'L', NaN].
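The silent conversion can be checked by looking at the codes, where pandas uses -1 to mark a missing value. This is a small sketch with the same sizes as the question; counting the missing entries afterwards is one possible way to catch values that fell outside the category list.

```python
import pandas as pd

sizes = pd.Categorical(['S', 'M', 'L', 'XL'], categories=['S', 'M', 'L'])

# 'XL' is not a listed category, so it is stored as missing (code -1).
codes = sizes.codes.tolist()

# Count the entries that were turned into NaN.
n_missing = int(sizes.isna().sum())

print(codes)      # [0, 1, 2, -1]
print(n_missing)  # 1
```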
🚀 Application
advanced
Effect of ordered categorical on sorting
Given this DataFrame with an ordered categorical column, what is the output after sorting by 'priority'?
Pandas
import pandas as pd

df = pd.DataFrame({'task': ['task1', 'task2', 'task3'],
                   'priority': ['high', 'low', 'medium']})
priority_type = pd.CategoricalDtype(categories=['low', 'medium', 'high'], ordered=True)
df['priority'] = df['priority'].astype(priority_type)
sorted_df = df.sort_values('priority')
print(sorted_df['task'].tolist())
💡 Hint
Ordered categorical sorts according to category order, not alphabetically.
Explanation
The categories are ordered low < medium < high, so sort_values places 'low' first (task2), then 'medium' (task3), then 'high' (task1). The output is ['task2', 'task3', 'task1'].
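To make the contrast with alphabetical sorting concrete, the sketch below sorts the same data twice, once as plain strings and once as an ordered categorical. The comparison at the end is an extra illustration of what ordered=True allows; the variable names are not from the original.

```python
import pandas as pd

df = pd.DataFrame({'task': ['task1', 'task2', 'task3'],
                   'priority': ['high', 'low', 'medium']})

# Plain strings sort alphabetically: high, low, medium.
by_alpha = df.sort_values('priority')['task'].tolist()

# An ordered categorical sorts by the declared order: low, medium, high.
priority_type = pd.CategoricalDtype(categories=['low', 'medium', 'high'], ordered=True)
ordered = df.assign(priority=df['priority'].astype(priority_type))
by_order = ordered.sort_values('priority')['task'].tolist()

# Ordered categories also support comparisons against a category value.
above_low = ordered.loc[ordered['priority'] > 'low', 'task'].tolist()

print(by_alpha)   # ['task1', 'task2', 'task3']
print(by_order)   # ['task2', 'task3', 'task1']
print(above_low)  # ['task1', 'task3']
```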
🧠 Conceptual
expert
Memory usage difference with categorical dtype
Which statement best describes the memory usage difference when converting a large text column to categorical dtype in pandas?
💡 Hint
Think about how categorical data is stored compared to strings.
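Explanation
A categorical column stores each row as a small integer code plus a single list of the distinct categories, so it usually needs far less memory than storing the repeated strings. The saving shrinks as the number of unique values approaches the number of rows.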
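The effect can be measured with memory_usage(deep=True). The following is a rough sketch on made-up data (three colors repeated 100,000 times); exact byte counts depend on the pandas version and string storage, so only the comparison is shown.

```python
import pandas as pd

# Hypothetical data: 300,000 rows drawn from only three distinct strings.
text = pd.Series(['red', 'green', 'blue'] * 100_000)
cat = text.astype('category')

# deep=True includes the memory held by the string values themselves.
text_bytes = text.memory_usage(deep=True)
cat_bytes = cat.memory_usage(deep=True)

# With three categories, each row's code fits in a single byte.
codes_dtype = str(cat.cat.codes.dtype)

print(cat_bytes < text_bytes)  # True
print(codes_dtype)             # int8
```

With a column where almost every value is unique, the category list itself grows as large as the data, and the conversion may save little or nothing.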