Challenge - 5 Problems

Categorical Data Master

❓ Predict Output

intermediate

Output of memory usage after converting to categorical

What is the output of the following Python code that optimizes a DataFrame column by converting it to a categorical type?

Data Analysis Python

import pandas as pd
import numpy as np

# Create a DataFrame with 100000 rows
np.random.seed(0)
data = pd.DataFrame({
    'color': np.random.choice(['red', 'green', 'blue'], size=100000)
})

# Memory usage before conversion
mem_before = data.memory_usage(deep=True).sum()

# Convert 'color' column to categorical
data['color'] = data['color'].astype('category')

# Memory usage after conversion
mem_after = data.memory_usage(deep=True).sum()

print(round(mem_before), round(mem_after))

A. 240000 240000

B. 240000 80000

C. 300000 300000

D. 80000 240000
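
To see the effect on a given installation, the sketch below measures the same column before and after conversion and reports the fraction saved. It is only an illustration: the names `frame`, `bytes_as_strings`, `bytes_as_category`, and `saved_fraction` are introduced here, and the exact byte counts depend on the pandas version, the string storage in use, and the platform.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
frame = pd.DataFrame({
    'color': np.random.choice(['red', 'green', 'blue'], size=100_000)
})

# index=False measures the column alone; deep=True asks pandas to include
# the string payload of an object column, not just the per-row pointers.
bytes_as_strings = frame['color'].memory_usage(index=False, deep=True)
bytes_as_category = (
    frame['color'].astype('category').memory_usage(index=False, deep=True)
)

# Fraction of the original footprint that the conversion removes.
saved_fraction = 1 - bytes_as_category / bytes_as_strings

print(bytes_as_strings, bytes_as_category)
print(f"saved: {saved_fraction:.1%}")
```

Measuring the column with `index=False` keeps the index out of the comparison, so the two numbers reflect only how the values themselves are stored.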

❓ Data Output

intermediate

Number of unique categories after optimization

Given a DataFrame with a 'city' column converted to categorical, what is the number of unique categories stored?

Data Analysis Python

import pandas as pd

cities = ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix', 'New York', 'Chicago']
data = pd.DataFrame({'city': cities})
data['city'] = data['city'].astype('category')

num_categories = len(data['city'].cat.categories)
print(num_categories)
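
A related point worth checking by hand: `len(... .cat.categories)` counts the categories the dtype declares, which can differ from the values still present after filtering. The sketch below uses a hypothetical `sizes` series (not part of the challenge data) to show the difference and how `remove_unused_categories()` brings the two back in line.

```python
import pandas as pd

# Hypothetical data, chosen only to illustrate declared vs. observed categories.
sizes = pd.Series(['small', 'large', 'small', 'medium'], dtype='category')

# Filtering rows keeps the full set of declared categories on the dtype.
only_small = sizes[sizes == 'small']
declared = len(only_small.cat.categories)
observed = only_small.nunique()

# Drop categories that no longer occur in the data.
trimmed = only_small.cat.remove_unused_categories()
declared_after_trim = len(trimmed.cat.categories)

print(declared, observed, declared_after_trim)  # 3 1 1
```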

🔧 Debug

advanced

Identify the error in categorical conversion code

What error does the following code produce?

Data Analysis Python

import pandas as pd

values = ['apple', 'banana', 'apple', 'orange']
data = pd.DataFrame({'fruit': values})
data['fruit'] = data['fruit'].astype('category', categories=['apple', 'banana'])

A. TypeError: astype() got an unexpected keyword argument 'categories'

B. ValueError: Cannot set categories with astype()

C. KeyError: 'categories'

D. No error, runs successfully
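
For comparison, one documented way to fix the category set at conversion time is to build a `pd.CategoricalDtype` and pass it to `astype`. The sketch below reuses the fruit values from the snippet; `fruit_dtype` and `missing_count` are names introduced here. Values that are not in the declared categories become missing (`NaN`).

```python
import pandas as pd

values = ['apple', 'banana', 'apple', 'orange']
data = pd.DataFrame({'fruit': values})

# Declare the allowed categories on the dtype, then convert with it.
fruit_dtype = pd.CategoricalDtype(categories=['apple', 'banana'])
data['fruit'] = data['fruit'].astype(fruit_dtype)

# 'orange' is not a declared category, so that row is now missing.
missing_count = int(data['fruit'].isna().sum())

print(list(data['fruit'].cat.categories))  # ['apple', 'banana']
print(missing_count)                       # 1
```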

🚀 Application

advanced

Choosing categorical dtype for large dataset

You have a dataset with a column 'status' containing 3 unique values repeated millions of times. Which approach optimizes memory best?

A. Keep 'status' as object dtype (string)

B. Convert 'status' to integer dtype without mapping

C. Convert 'status' to category dtype

D. Convert 'status' to float dtype
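
The options can be compared empirically. The sketch below is scaled down to 200,000 rows so it runs quickly, and the three status labels are hypothetical stand-ins. It prints the integer dtype pandas picks for the category codes and the footprint of the column with and without conversion; absolute numbers will vary by pandas version and platform.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
labels = ['active', 'inactive', 'pending']  # hypothetical status values
status = pd.Series(rng.choice(labels, size=200_000), name='status')
status_cat = status.astype('category')

# With only a handful of categories, each row is stored as a small integer code.
codes_dtype = status_cat.cat.codes.dtype

bytes_plain = status.memory_usage(index=False, deep=True)
bytes_category = status_cat.memory_usage(index=False, deep=True)

print(codes_dtype)  # int8
print(bytes_plain, bytes_category)
```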

🧠 Conceptual

expert

Effect of categorical ordering on sorting performance

How does setting ordered=True on a categorical column affect sorting performance in pandas?

A. Sorting is faster because pandas uses the category order for comparisons

B. Sorting is slower because ordered categories require extra checks

C. Sorting raises an error if categories are ordered

D. Sorting performance is unchanged by the ordered parameter
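
Rather than reasoning about this in the abstract, it can be timed. The sketch below builds an unordered and an ordered version of the same hypothetical three-level column, checks that both sort to the same sequence, and times `sort_values` on each with `timeit`. The level names and row count are assumptions made for the example, and timings differ between machines and pandas versions, so no expected timing is shown.

```python
import timeit

import numpy as np
import pandas as pd

levels = ['low', 'medium', 'high']  # hypothetical levels, in intended order
rng = np.random.default_rng(0)
raw = rng.choice(levels, size=200_000)

# Same values and category order; only the ordered flag differs.
unordered = pd.Series(pd.Categorical(raw, categories=levels, ordered=False))
ordered = unordered.cat.as_ordered()

# A tiny series makes the resulting sort order easy to read.
small = pd.Series(['high', 'low', 'medium'], dtype=ordered.dtype)
print(small.sort_values().tolist())  # ['low', 'medium', 'high']

# Time five sorts of each variant.
t_unordered = timeit.timeit(unordered.sort_values, number=5)
t_ordered = timeit.timeit(ordered.sort_values, number=5)
print(f"unordered: {t_unordered:.4f}s  ordered: {t_ordered:.4f}s")
```

Running the timing several times before drawing a conclusion helps, since a single measurement can be dominated by noise.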
