What is the output of this code snippet?
import pandas as pd

cats = pd.Categorical(
    ['low', 'medium', 'high', 'medium', 'low'],
    categories=['low', 'medium', 'high'],
    ordered=True,
)
s = pd.Series(cats)
s_sorted = s.sort_values()
print(s_sorted.tolist())
Remember that ordered categories sort according to the order defined in categories.
The series is sorted by the order defined in the categorical: 'low' < 'medium' < 'high'. So the sorted list places all 'low' first, then 'medium', then 'high'.
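To make the difference from ordinary string sorting concrete, here is a minimal sketch that reuses the snippet's data. The variable names `by_category` and `by_alphabet` are illustrative only and not part of the original question.

```python
import pandas as pd

cats = pd.Categorical(
    ['low', 'medium', 'high', 'medium', 'low'],
    categories=['low', 'medium', 'high'],
    ordered=True,
)
s = pd.Series(cats)

# Sorting follows the declared category order: low < medium < high.
by_category = s.sort_values().tolist()
print(by_category)  # ['low', 'low', 'medium', 'medium', 'high']

# For contrast, the same values as plain Python strings sort alphabetically.
by_alphabet = sorted(s.tolist())
print(by_alphabet)  # ['high', 'low', 'low', 'medium', 'medium']
```

The two results differ because alphabetical order puts 'high' first, while the categorical order puts it last.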
Given this ordered categorical series, how many elements are less than 'medium'?
import pandas as pd

cats = pd.Categorical(
    ['low', 'medium', 'high', 'medium', 'low'],
    categories=['low', 'medium', 'high'],
    ordered=True,
)
s = pd.Series(cats)
count = (s < 'medium').sum()
print(count)
Check which values are strictly less than 'medium' in the order.
Only 'low' is less than 'medium'. There are two 'low' values, so the count is 2.
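It can help to look at the intermediate boolean mask before summing it. The sketch below uses the same data; the name `mask` is an illustrative addition.

```python
import pandas as pd

cats = pd.Categorical(
    ['low', 'medium', 'high', 'medium', 'low'],
    categories=['low', 'medium', 'high'],
    ordered=True,
)
s = pd.Series(cats)

# Element-wise comparison against a value that is one of the categories.
mask = s < 'medium'
print(mask.tolist())  # [True, False, False, False, True]

# Summing a boolean mask counts the True entries.
count = int(mask.sum())
print(count)  # 2
```

Note that the value on the right-hand side has to be one of the declared categories for the comparison to work.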
What happens when this code runs? Does it raise an error?
import pandas as pd

cats = pd.Categorical(
    ['low', 'medium', 'high', 'medium', 'low'],
    categories=['low', 'medium'],
    ordered=True,
)
Check if all values in the data are included in the categories list.
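The data contains 'high', but the categories list does not include it. In pandas 1.x and 2.x the constructor does not raise a ValueError here: as the pandas documentation states, values that are not in categories are replaced with NaN. The resulting categorical therefore has a missing value where 'high' was.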
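The following sketch shows one way to check what the constructor does with a value outside the declared categories. It assumes pandas 1.x or 2.x behavior; other releases may additionally emit a warning.

```python
import pandas as pd

cats = pd.Categorical(
    ['low', 'medium', 'high', 'medium', 'low'],
    categories=['low', 'medium'],  # 'high' is deliberately left out
    ordered=True,
)

# Each value is stored as an integer code; -1 marks a missing value.
codes = cats.codes.tolist()
print(codes)  # [0, 1, -1, 1, 0]

# The position that held 'high' is reported as missing.
missing = cats.isna().tolist()
print(missing)  # [False, False, True, False, False]
```

Because the replacement is silent, checking `isna()` after construction is a cheap way to catch misspelled or unexpected labels.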
Which option correctly creates an ordered categorical series and checks if each value is greater than or equal to 'medium'?
Ordered categories require the categories list to be in the correct order and ordered=True.
Option C correctly defines categories in ascending order with ordered=True, then compares each value to 'medium', producing a boolean result for each element. The other options each have a flaw: one omits the categories argument, so the order is inferred and may not be the intended one; one reverses the category order, so the comparison logic is inverted; and one lacks ordered=True, so the comparison operators raise a TypeError.
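The answer options themselves are not reproduced here, so the sketch below is only an illustration of the four situations the explanation describes. All variable names (`ordered`, `reversed_order`, `inferred`, `unordered`, `raised`) are hypothetical.

```python
import pandas as pd

values = ['low', 'medium', 'high', 'medium', 'low']

# Ascending categories plus ordered=True: low < medium < high.
ordered = pd.Series(
    pd.Categorical(values, categories=['low', 'medium', 'high'], ordered=True)
)
print((ordered >= 'medium').tolist())  # [False, True, True, True, False]

# Reversed categories: 'low' now ranks highest, so low and high swap results.
reversed_order = pd.Series(
    pd.Categorical(values, categories=['high', 'medium', 'low'], ordered=True)
)
print((reversed_order >= 'medium').tolist())  # [True, True, False, True, True]

# No categories argument: pandas infers them in sorted (alphabetical) order.
inferred = pd.Series(pd.Categorical(values, ordered=True))
print(inferred.cat.categories.tolist())  # ['high', 'low', 'medium']

# Without ordered=True, order comparisons are rejected with a TypeError.
unordered = pd.Series(
    pd.Categorical(values, categories=['low', 'medium', 'high'])
)
raised = False
try:
    unordered >= 'medium'
except TypeError as exc:
    raised = True
    print('TypeError:', exc)
```

Only the first variant gives the result most readers would expect from "greater than or equal to medium".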
Which is the best reason to use ordered categories in a dataset?
Think about what ordered categories add beyond normal categories.
Ordered categories allow comparisons like <, >, and sorting based on a logical order defined by the user. This is useful for data like ratings or sizes. While ordered categories do save memory (A), that is true for all categorical types, not just ordered ones. Filling missing values (C) and converting to numerical (B) are separate processes.
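As a small illustration of what the ordering buys you, the sketch below uses made-up clothing sizes. The data and the names `sizes` and `larger_than_small` are assumptions for the example, not part of the quiz.

```python
import pandas as pd

# Hypothetical size labels with a user-defined order S < M < L.
sizes = pd.Series(
    pd.Categorical(['M', 'S', 'L', 'M'], categories=['S', 'M', 'L'], ordered=True)
)

# min() and max() respect the declared order rather than alphabetical order.
smallest, largest = sizes.min(), sizes.max()
print(smallest, largest)  # S L

# Comparisons can be used directly as filters.
larger_than_small = sizes[sizes > 'S'].tolist()
print(larger_than_small)  # ['M', 'L', 'M']
```

Alphabetically, 'L' would sort before 'M' and 'S', so plain strings would give the wrong minimum here.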