Challenge - 5 Problems
Categorical Mastery
❓ Predict Output
Intermediate, 2:00 time limit
Output of memory usage with categorical vs object type
Consider a pandas DataFrame with a column of repeated string values. What will be the approximate memory usage difference when the column is converted to categorical type compared to keeping it as object type?
Pandas
import pandas as pd
import numpy as np

size = 100000
values = ['apple', 'banana', 'cherry', 'date']
data = pd.DataFrame({'fruit': np.random.choice(values, size)})

mem_before = data.memory_usage(deep=True).sum()
data['fruit'] = data['fruit'].astype('category')
mem_after = data.memory_usage(deep=True).sum()

print(mem_before, mem_after)
💡 Hint
Think about how categorical stores repeated values as codes instead of full strings.
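A categorical column stores each distinct string once, in a separate categories index, and keeps only a small integer code per row. With 100,000 rows and 4 distinct values, that takes far less memory than storing a full string for every row, so mem_after is much smaller than mem_before.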
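To make the codes-plus-categories layout concrete, here is a minimal sketch that inspects both parts through the .cat accessor. The Series contents and variable names are illustrative and not part of the challenge.

```python
import pandas as pd

# Illustrative data: three distinct labels repeated across six rows.
fruit = pd.Series(["apple", "banana", "apple", "cherry", "banana", "apple"])
fruit_cat = fruit.astype("category")

# The distinct labels are stored once; inferred categories come out sorted.
categories = list(fruit_cat.cat.categories)

# Each row holds only a small integer that indexes into the categories.
codes = fruit_cat.cat.codes.tolist()
codes_dtype = str(fruit_cat.cat.codes.dtype)

print(categories)   # ['apple', 'banana', 'cherry']
print(codes)        # [0, 1, 0, 2, 1, 0]
print(codes_dtype)  # int8
```

With so few categories pandas picks the smallest integer type for the codes, which is where most of the saving comes from.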
❓ Data Output
Intermediate, 1:30 time limit
Number of unique categories after conversion
Given a pandas Series with repeated string values, what will be the number of categories after converting it to categorical type?
Pandas
import pandas as pd

s = pd.Series(['red', 'blue', 'red', 'green', 'blue', 'blue'])
cat = s.astype('category')

print(len(cat.cat.categories))
💡 Hint
Count unique values in the original Series.
When a Series is converted with astype('category'), the categories are its unique non-missing values. Here those are 'blue', 'green', and 'red', so the code prints 3.
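Two edge cases are worth checking alongside this count: missing values do not become a category, and filtering rows does not shrink the category list. The sketch below uses made-up data to show both.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value.
colors = pd.Series(["red", "blue", np.nan, "green", "blue"]).astype("category")

# NaN is not counted as a category.
n_categories = len(colors.cat.categories)

# Filtering rows keeps the full category list on the result.
only_blue = colors[colors == "blue"]
n_after_filter = len(only_blue.cat.categories)

# remove_unused_categories drops labels that no longer appear in the data.
trimmed = only_blue.cat.remove_unused_categories()
n_after_trim = len(trimmed.cat.categories)

print(n_categories, n_after_filter, n_after_trim)  # 3 3 1
```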
🔧 Debug
Advanced, 2:00 time limit
Why does this code raise a TypeError?
This code tries to assign a new value to a categorical column but raises a TypeError. Why?
Pandas
import pandas as pd

s = pd.Series(['small', 'medium', 'large'], dtype='category')
s[0] = 'extra large'
💡 Hint
Think about allowed values in categorical data.
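A categorical Series only accepts values that are already among its categories. 'extra large' is not one of 'small', 'medium', or 'large', so the assignment raises a TypeError. The new category has to be registered first, for example with s.cat.add_categories.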
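One way to make the assignment work is to add the category before assigning, as in this sketch. It catches ValueError as well as TypeError on the assumption that some older pandas releases raised ValueError for the same mistake.

```python
import pandas as pd

sizes = pd.Series(["small", "medium", "large"], dtype="category")

# Assigning a label that is not a known category fails.
try:
    sizes[0] = "extra large"
    error_name = None
except (TypeError, ValueError) as exc:  # ValueError: assumed for older pandas
    error_name = type(exc).__name__

# Register the new label first; after that the assignment succeeds.
sizes = sizes.cat.add_categories(["extra large"])
sizes[0] = "extra large"

print(error_name)
print(sizes.tolist())  # ['extra large', 'medium', 'large']
```

add_categories appends the new label after the existing categories, so the codes of the existing rows are unchanged.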
🚀 Application
Advanced, 2:00 time limit
Choosing categorical type for performance
You have a DataFrame with a column 'city' containing 1 million rows but only 50 unique city names. Which benefit will you get by converting 'city' to categorical type?
💡 Hint
Think about how categorical stores repeated values and how groupby uses categories.
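A categorical column stores each of the 50 city names once and keeps a small integer code per row, which cuts memory use sharply. groupby can then work on those integer codes instead of on the strings, which typically makes grouping faster as well.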
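The memory half of this claim is easy to check directly. The sketch below uses 200,000 rows rather than 1 million to keep it quick, and the city names, the sales column, and the seed are all hypothetical. It also confirms that grouping on the categorical column gives the same totals; timing is left out because it depends on the machine and the pandas version.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
city_names = [f"city_{i:02d}" for i in range(50)]  # hypothetical names
n_rows = 200_000  # smaller than the 1 million rows in the question

df = pd.DataFrame({
    "city": rng.choice(city_names, size=n_rows),
    "sales": rng.integers(1, 100, size=n_rows),  # hypothetical values
})

# Memory for the plain string column versus the categorical version.
mem_strings = df["city"].memory_usage(deep=True)
df["city_cat"] = df["city"].astype("category")
mem_category = df["city_cat"].memory_usage(deep=True)

# Same aggregation on both columns; observed=True limits the result
# to categories that actually occur in the data.
totals_strings = df.groupby("city")["sales"].sum()
totals_category = df.groupby("city_cat", observed=True)["sales"].sum()
same_totals = bool((totals_strings.to_numpy() == totals_category.to_numpy()).all())

print(mem_strings, mem_category)
print(same_totals)  # True
```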
🧠 Conceptual
Expert, 2:30 time limit
Why does sorting a categorical column with ordered=True differ from unordered?
What is the main difference in sorting behavior between a pandas categorical column with ordered=True and one with ordered=False?
💡 Hint
Think about what ordered categories mean for comparison.
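In both cases sort_values follows the order of the categories (that is, the underlying integer codes), not the alphabetical order of the labels. What ordered=True adds is meaning: the category order is treated as a real ranking, so comparisons such as < and >, and min() and max(), are allowed. An unordered categorical whose categories were inferred by astype('category') has them stored in sorted order, so its sort looks alphabetical, and order comparisons on it raise a TypeError.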
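The sketch below compares three versions of the same made-up Series: inferred categories, an explicit but unordered category list, and an explicit ordered one. The size labels and variable names are illustrative.

```python
import pandas as pd

sizes = pd.Series(["medium", "large", "small", "medium"])

# Inferred categories are stored sorted: large, medium, small.
inferred = sizes.astype("category")
inferred_sorted = inferred.sort_values().tolist()

# Explicit category order, not flagged as ordered.
unordered_dtype = pd.CategoricalDtype(["small", "medium", "large"], ordered=False)
unordered = sizes.astype(unordered_dtype)
unordered_sorted = unordered.sort_values().tolist()

# Same category order, flagged as ordered.
ordered_dtype = pd.CategoricalDtype(["small", "medium", "large"], ordered=True)
ordered = sizes.astype(ordered_dtype)
ordered_sorted = ordered.sort_values().tolist()

# Order comparisons and max() are only defined for the ordered version.
largest = ordered.max()
above_small = (ordered > "small").tolist()
try:
    unordered > "small"
    comparison_error = None
except TypeError as exc:
    comparison_error = type(exc).__name__

print(inferred_sorted)   # ['large', 'medium', 'medium', 'small']
print(unordered_sorted)  # ['small', 'medium', 'medium', 'large']
print(ordered_sorted)    # ['small', 'medium', 'medium', 'large']
print(largest, above_small, comparison_error)
```

The second and third results match, which shows that the sort itself depends on the category order rather than on the ordered flag.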