Challenge - 5 Problems

Dtype Mastery

❓ Predict Output

intermediate

Output of memory usage with different dtypes

What does the following snippet print? It compares the memory usage of two integer columns before and after converting them from the default integer dtype to int8. Assume the default integer dtype is int64.

Pandas

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': np.random.randint(0, 100, size=1000),
    'B': np.random.randint(0, 100, size=1000)
})

mem_default = df.memory_usage(deep=True, index=False).sum()
df['A'] = df['A'].astype('int8')
df['B'] = df['B'].astype('int8')
mem_optimized = df.memory_usage(deep=True, index=False).sum()
print(mem_default, mem_optimized)

A. 16000 2000

B. 16000 4000

C. 8000 4000

D. 8000 2000
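To check a prediction after answering, the arithmetic can be sketched directly: a fixed-width column occupies roughly the row count times the dtype's item size. The helper name `column_bytes` is hypothetical, and the cross-check pins the dtype to int64 explicitly because the default integer width returned by `np.random.randint` can depend on the platform and NumPy version.

```python
import numpy as np
import pandas as pd


def column_bytes(n_rows, dtype):
    """Hypothetical helper: bytes needed for n_rows values of a fixed-width dtype."""
    return n_rows * np.dtype(dtype).itemsize


n_rows = 1000

# Expected totals for two columns at several integer widths.
for dtype in ("int64", "int32", "int8"):
    print(dtype, 2 * column_bytes(n_rows, dtype))

# Cross-check against pandas with the dtype pinned (an assumption, see above).
df = pd.DataFrame({
    "A": np.zeros(n_rows, dtype="int64"),
    "B": np.zeros(n_rows, dtype="int64"),
})
measured = int(df.memory_usage(deep=True, index=False).sum())
print(measured == 2 * column_bytes(n_rows, "int64"))
```

For numeric columns, `deep=True` makes no difference; it only matters for columns holding Python objects.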

❓ Data Output

intermediate

Resulting dtypes after conversion

Given the DataFrame below, what are the dtypes of its columns after the conversion code runs?

Pandas

import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3],
    'col2': [0.1, 0.2, 0.3],
    'col3': ['a', 'b', 'c']
})

df['col1'] = df['col1'].astype('int16')
df['col2'] = df['col2'].astype('float32')

A. col1: int16, col2: float32, col3: object

B. col1: int64, col2: float64, col3: object

C. col1: int16, col2: float64, col3: string

D. col1: int32, col2: float32, col3: object
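A small sketch for verifying the answer: snapshot the dtypes before and after the casts and report only the columns that changed. The names `before`, `after`, and `changed` are illustrative. The dtype reported for the untouched string column may vary with the pandas version, so the sketch does not assume a particular name for it.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 2, 3],
    "col2": [0.1, 0.2, 0.3],
    "col3": ["a", "b", "c"],
})

# Record dtype names before the conversion.
before = {col: str(dtype) for col, dtype in df.dtypes.items()}

df["col1"] = df["col1"].astype("int16")
df["col2"] = df["col2"].astype("float32")

# Record dtype names after the conversion and keep only the differences.
after = {col: str(dtype) for col, dtype in df.dtypes.items()}
changed = {col: (before[col], after[col]) for col in df.columns if before[col] != after[col]}

print(after)
print(changed)
```

Because `astype` is applied only to `col1` and `col2`, `col3` never appears in `changed`.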

🔧 Debug

advanced

Identify the error in dtype conversion

What happens when this code converts the 'age' column to 'int8'? Pick the error it raises, if any.

Pandas

import pandas as pd

df = pd.DataFrame({'age': [25, 300, 45]})
df['age'] = df['age'].astype('int8')

A. TypeError

B. ValueError

C. OverflowError

D. No error, conversion succeeds
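Whatever the cast itself does at runtime, a defensive pattern is to range-check before narrowing. The sketch below assumes a hypothetical helper `fits_in` built on `np.iinfo`, and uses `pd.to_numeric(..., downcast="integer")` as one way to let pandas pick the narrowest signed integer dtype that holds every value.

```python
import numpy as np
import pandas as pd


def fits_in(series, dtype):
    """Hypothetical helper: True if every value lies within the integer dtype's range."""
    info = np.iinfo(dtype)
    return bool(series.min() >= info.min and series.max() <= info.max)


ages = pd.Series([25, 300, 45])

# int8 covers -128..127, so 300 is out of range.
print(np.iinfo("int8").min, np.iinfo("int8").max)
print(fits_in(ages, "int8"), fits_in(ages, "int16"))

# Let pandas choose the smallest signed integer dtype that fits the data.
safe = pd.to_numeric(ages, downcast="integer")
print(safe.dtype, safe.tolist())
```

Checking the range first keeps the original values intact regardless of how a direct `astype('int8')` would treat the out-of-range entry.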

🧠 Conceptual

advanced

Best dtype for categorical data

Which dtype is the most memory-efficient and appropriate choice for a column of repeated string categories such as 'red', 'blue', and 'green'?

A. object

B. category

C. string

D. int64
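One way to test the intuition is to measure the same data under two representations. This is a minimal sketch with made-up sample data (3000 rows cycling through three colors); exact byte counts depend on the pandas and Python versions, so only the comparison is printed, not specific figures.

```python
import pandas as pd

# Illustrative data: three distinct labels repeated many times.
colors = pd.Series(["red", "blue", "green"] * 1000)

as_object = colors.astype("object")
as_category = colors.astype("category")

# deep=True counts the Python string objects behind an object column.
obj_bytes = int(as_object.memory_usage(deep=True, index=False))
cat_bytes = int(as_category.memory_usage(deep=True, index=False))

print(obj_bytes, cat_bytes, cat_bytes < obj_bytes)

# A categorical stores each distinct label once plus one small integer code per row.
print(list(as_category.cat.categories))
print(as_category.cat.codes.dtype)
```

The saving shrinks as the number of distinct values approaches the number of rows, since every distinct label still has to be stored once.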

🚀 Application

expert

Optimize memory usage for mixed dtype DataFrame

Given a DataFrame with the columns 'id' (integers from 0 to 100000), 'score' (floats between 0 and 1), and 'grade' (strings 'A', 'B', 'C'), which set of dtype conversions minimizes memory usage without losing data?

A. id: int8, score: float32, grade: category

B. id: int16, score: float64, grade: object

C. id: int32, score: float32, grade: category

D. id: int64, score: float16, grade: string
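The integer part of this question comes down to range limits, which can be looked up rather than memorized. The sketch below assumes a hypothetical helper `smallest_int_dtype` that walks the signed integer widths in order, and it prints the approximate decimal precision NumPy reports for each float width as a guide to how much detail a narrower float retains.

```python
import numpy as np


def smallest_int_dtype(lo, hi):
    """Hypothetical helper: narrowest signed integer dtype whose range covers [lo, hi]."""
    for name in ("int8", "int16", "int32", "int64"):
        info = np.iinfo(np.dtype(name))
        if info.min <= lo and hi <= info.max:
            return name
    raise ValueError("range does not fit in int64")


# The 'id' column spans 0..100000.
id_dtype = smallest_int_dtype(0, 100_000)
print(id_dtype)

# Approximate number of reliable decimal digits per float width.
float_digits = {
    name: np.finfo(np.dtype(name)).precision
    for name in ("float16", "float32", "float64")
}
print(float_digits)
```

Whether a narrower float counts as "losing data" depends on how many significant digits the scores actually need, so it is worth comparing that requirement against the reported precision before downcasting.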
