Challenge - 5 Problems

Categorical Mastery

❓ Predict Output

intermediate

Output of converting a column to categorical

What is the output of this code snippet that converts a DataFrame column to categorical and then prints the categories?

Pandas

import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red']})
df['color'] = df['color'].astype('category')
print(df['color'].cat.categories.tolist())

A. ['blue', 'red', 'green']

B. ['red', 'blue', 'green']

C. ['blue', 'green', 'red']

D. ['red', 'green', 'blue']

❓ Data Output

intermediate

Number of categories after conversion

After converting the 'grade' column to categorical, how many categories does it have?

Pandas

import pandas as pd

df = pd.DataFrame({'grade': ['A', 'B', 'A', 'C', 'B', 'A']})
df['grade'] = df['grade'].astype('category')
print(len(df['grade'].cat.categories))

🔧 Debug

advanced

Identify the error in categorical conversion

What error, if any, does this code raise when it converts a column to categorical with an explicit list of categories?

Pandas

import pandas as pd

df = pd.DataFrame({'size': ['S', 'M', 'L', 'XL']})
df['size'] = pd.Categorical(df['size'], categories=['S', 'M', 'L'])

A. TypeError: categories must be a list

B. No error; the conversion succeeds and 'XL' becomes NaN

C. ValueError: 'XL' not in categories

D. KeyError: 'size'
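
One way to check the options empirically is to wrap the conversion in a small helper and inspect the result. The sketch below uses different data from the question; `describe_conversion` is a hypothetical helper written for this illustration, not part of pandas.

```python
import pandas as pd


def describe_conversion(values, categories):
    """Attempt the conversion; report the exception name or the count of missing values."""
    try:
        cat = pd.Categorical(values, categories=categories)
    except Exception as exc:  # report whatever pandas raises, if anything
        return type(exc).__name__, None
    # pd.isna works element-wise on a Categorical and returns a boolean array
    return "no error", int(pd.isna(cat).sum())


# 'bird' is deliberately left out of the category list
outcome, n_missing = describe_conversion(['cat', 'dog', 'bird'], ['cat', 'dog'])
print(outcome, n_missing)
```

Running the same helper on the 'size' data from the question shows which option matches.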

🚀 Application

advanced

Effect of ordered categorical on sorting

Given this DataFrame with an ordered categorical column, what is the output after sorting by 'priority'?

Pandas

import pandas as pd

df = pd.DataFrame({'task': ['task1', 'task2', 'task3'], 'priority': ['high', 'low', 'medium']})
priority_type = pd.CategoricalDtype(categories=['low', 'medium', 'high'], ordered=True)
df['priority'] = df['priority'].astype(priority_type)
sorted_df = df.sort_values('priority')
print(sorted_df['task'].tolist())

A. ['task1', 'task3', 'task2']

B. ['task2', 'task1', 'task3']

C. ['task3', 'task2', 'task1']

D. ['task2', 'task3', 'task1']
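
To reason about questions like this one, it can help to look at the integer codes behind an ordered categorical. The sketch below uses a hypothetical 'size' series rather than the data from the question, and compares sorting the categorical with sorting the plain strings.

```python
import pandas as pd

# Hypothetical data, not taken from the question above
sizes = pd.Series(['large', 'small', 'medium'])
size_type = pd.CategoricalDtype(categories=['small', 'medium', 'large'], ordered=True)
as_ordered = sizes.astype(size_type)

# .cat.codes gives each value's position in the declared category list
codes = as_ordered.cat.codes.tolist()
print(codes)

# Sorting the categorical follows the declared category order
sorted_ordered = as_ordered.sort_values().tolist()
print(sorted_ordered)

# Sorting the original strings follows alphabetical order instead
sorted_plain = sizes.sort_values().tolist()
print(sorted_plain)
```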

🧠 Conceptual

expert

Memory usage difference with categorical dtype

Which statement best describes the memory usage difference when converting a large text column to categorical dtype in pandas?

A. Memory usage decreases because pandas stores each value as an integer code that refers to a short list of unique categories

B. Memory usage increases because categories add overhead for mapping

C. Memory usage stays the same because data is unchanged

D. Memory usage decreases only if categories are sorted alphabetically
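
The options can be tested by measuring memory directly with `Series.memory_usage(deep=True)`. The sketch below uses made-up data: one column with only three distinct strings and one where every value is unique. The exact byte counts depend on the pandas version and string storage, so none are quoted here.

```python
import pandas as pd

n_rows = 300_000

# Hypothetical low-cardinality column: three distinct strings repeated many times
low_text = pd.Series(['red', 'green', 'blue'] * (n_rows // 3))
low_text_bytes = low_text.memory_usage(deep=True)
low_cat_bytes = low_text.astype('category').memory_usage(deep=True)

# Hypothetical high-cardinality column: every value is different
high_text = pd.Series([f'id_{i}' for i in range(n_rows)])
high_text_bytes = high_text.memory_usage(deep=True)
high_cat_bytes = high_text.astype('category').memory_usage(deep=True)

print('few unique values :', low_text_bytes, '->', low_cat_bytes)
print('all unique values :', high_text_bytes, '->', high_cat_bytes)
```

Comparing the two printed lines shows how much the result depends on the number of distinct values relative to the number of rows.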
