Challenge - 5 Problems
Categorical Memory Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
intermediate · 2:00 remaining
Memory usage difference between object and categorical
Given a pandas DataFrame column with repeated string values, what is the approximate memory usage difference when converting it from object dtype to categorical dtype?
Pandas
import pandas as pd

col = pd.Series(['apple', 'banana', 'apple', 'banana', 'apple'] * 1000)
mem_obj = col.memory_usage(deep=True)
mem_cat = col.astype('category').memory_usage(deep=True)
print(mem_obj, mem_cat)
💡 Hint
Think about how categorical stores repeated strings once, while object stores each string separately.
Categorical dtype stores unique values once and uses integer codes for each entry, saving memory compared to object dtype which stores each string separately.
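As a quick sketch of this effect (illustrative data; exact byte counts vary by pandas version and platform):

```python
import pandas as pd

# Repeated string values: the ideal case for categorical dtype
col = pd.Series(['apple', 'banana', 'apple', 'banana', 'apple'] * 1000)

mem_obj = col.memory_usage(deep=True)                     # every string counted per row
mem_cat = col.astype('category').memory_usage(deep=True)  # 2 unique strings + small integer codes

# With only two unique values, the categorical version should be far smaller
print(mem_obj, mem_cat, round(mem_obj / mem_cat, 1))
```

The ratio grows with the amount of repetition: more rows per unique value means more savings.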
❓ Data Output
intermediate · 2:00 remaining
Memory usage after converting multiple columns to categorical
You have a DataFrame with 3 columns of repeated string categories. After converting all 3 columns to categorical dtype, what is the expected effect on total memory usage?
Pandas
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': ['cat', 'dog', 'cat', 'dog'] * 1000,
    'B': ['red', 'blue', 'red', 'blue'] * 1000,
    'C': ['small', 'large', 'small', 'large'] * 1000
})
mem_before = df.memory_usage(deep=True).sum()
df_cat = df.astype('category')
mem_after = df_cat.memory_usage(deep=True).sum()
print(mem_before, mem_after)
💡 Hint
Each categorical column stores unique values once, reducing memory.
Converting multiple repeated string columns to categorical reduces memory usage significantly because unique values are stored once and data is stored as integer codes.
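The per-column effect can be inspected with `memory_usage(deep=True)`, which returns bytes per column (a small illustrative sketch):

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['cat', 'dog'] * 2000,
    'B': ['red', 'blue'] * 2000,
    'C': ['small', 'large'] * 2000,
})

before = df.memory_usage(deep=True)                    # per-column bytes, object dtype
after = df.astype('category').memory_usage(deep=True)  # per-column bytes, categorical

# Every repeated-string column shrinks once it is stored as codes + a categories array
print(pd.DataFrame({'before': before, 'after': after}))
```

Comparing per column like this is useful when deciding which columns are worth converting.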
🔧 Debug
advanced · 2:00 remaining
Why does this categorical conversion not reduce memory?
Consider this code snippet:
import pandas as pd
col = pd.Series(['unique' + str(i) for i in range(10000)])
col_cat = col.astype('category')
print(col.memory_usage(deep=True), col_cat.memory_usage(deep=True))
Why does converting to categorical not reduce memory usage here?
Pandas
import pandas as pd

col = pd.Series(['unique' + str(i) for i in range(10000)])
col_cat = col.astype('category')
print(col.memory_usage(deep=True), col_cat.memory_usage(deep=True))
💡 Hint
Think about how many unique values there are and what categorical stores.
When all values are unique, categorical must still store every string in its categories array and additionally hold an integer code per row, so memory usage can actually be higher than with object dtype, which stores one pointer per row to each string.
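A common rule of thumb (a heuristic, not a pandas rule) is to convert only when unique values are a small fraction of the rows:

```python
import pandas as pd

col_repeated = pd.Series(['a', 'b', 'c'] * 5000)            # 3 unique values, 15000 rows
col_unique = pd.Series([f'id_{i}' for i in range(15000)])   # every value unique

def worth_categorical(s, threshold=0.5):
    # Heuristic: convert only if unique values are a small fraction of rows.
    # The 0.5 threshold is an illustrative choice, not a pandas default.
    return s.nunique() / len(s) < threshold

print(worth_categorical(col_repeated))  # few unique values -> True
print(worth_categorical(col_unique))    # all unique -> False
```

Columns like IDs or free-text fields fail this check and are best left as-is.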
🧠 Conceptual
advanced · 1:30 remaining
Understanding categorical dtype internals
Which statement best describes how pandas stores data internally when using categorical dtype?
💡 Hint
Think about how categorical reduces memory by avoiding repeated strings.
Categorical dtype stores data as integer codes referencing a separate array of unique categories, avoiding repeated storage of strings.
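The codes-plus-categories layout can be inspected directly through the `.cat` accessor (a small illustrative example):

```python
import pandas as pd

col = pd.Series(['low', 'high', 'low', 'medium', 'high']).astype('category')

# The unique categories are stored once, in a separate index (sorted by default)
print(list(col.cat.categories))       # ['high', 'low', 'medium']

# Each row holds only a small integer code pointing into that index
print(col.cat.codes.tolist())         # [1, 0, 1, 2, 0]
```

Because the codes are small integers (int8 for up to 127 categories), each row costs far less than a pointer to a full Python string object.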
🚀 Application
expert · 3:00 remaining
Optimizing memory for a large DataFrame with mixed data types
You have a large DataFrame with 1 million rows and these columns:
- 'city' (string, 100 unique values)
- 'temperature' (float)
- 'date' (datetime)
- 'status' (string, 3 unique values)
Which memory optimization strategy will save the most memory without losing data?
💡 Hint
Only columns with repeated string values benefit from categorical dtype.
Converting 'city' and 'status' to categorical saves memory because they have few unique values. 'temperature' should stay float for numeric operations. 'date' is best kept as datetime for time operations.
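A sketch of that strategy on made-up data (column names from the question; row count scaled down for a quick demo):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100_000  # scaled down from 1 million rows for illustration
df = pd.DataFrame({
    'city': rng.choice([f'city_{i}' for i in range(100)], size=n),  # 100 unique values
    'temperature': rng.normal(20.0, 5.0, size=n),                   # numeric, keep as float
    'date': pd.date_range('2024-01-01', periods=n, freq='min'),     # keep as datetime
    'status': rng.choice(['ok', 'warn', 'fail'], size=n),           # 3 unique values
})

before = df.memory_usage(deep=True).sum()
# Convert only the low-cardinality string columns; leave numeric/datetime dtypes alone
df = df.astype({'city': 'category', 'status': 'category'})
after = df.memory_usage(deep=True).sum()

print(f'{before:,} -> {after:,} bytes')
```

Selective conversion via a dtype mapping in `astype` targets only the columns that benefit, while `temperature` stays usable for arithmetic and `date` for time-based operations.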