Challenge - 5 Problems
Dtype Mastery
❓ Predict Output
Difficulty: intermediate · Time limit: 2:00
Output of memory usage with different dtypes
What is the output of the following code snippet that compares memory usage of integer columns with default and optimized dtypes?
```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': np.random.randint(0, 100, size=1000),
    'B': np.random.randint(0, 100, size=1000)
})
mem_default = df.memory_usage(deep=True, index=False).sum()

df['A'] = df['A'].astype('int8')
df['B'] = df['B'].astype('int8')
mem_optimized = df.memory_usage(deep=True, index=False).sum()

print(mem_default, mem_optimized)
```
💡 Hint
Think about how changing from default int64 to int8 affects memory size.
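Explanation
np.random.randint returns int64 values on most platforms (on Windows with NumPy versions before 2.0 it returns int32), so each value takes 8 bytes. Converting to int8 reduces each value to 1 byte. Because index=False excludes the index and deep=True only adds bytes for object columns, the totals count only the column data: 8 × 1000 × 2 = 16000 bytes before and 1 × 1000 × 2 = 2000 bytes after. The output is therefore '16000 2000'.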
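The arithmetic above can be checked directly. This is a minimal sketch that pins the source dtype to int64 explicitly (an assumption added so the byte count does not depend on the platform's default integer) and compares memory_usage against rows × columns × itemsize. The seeded generator is illustrative only; any random values give the same byte counts.

```python
import numpy as np
import pandas as pd

n_rows = 1000
rng = np.random.default_rng(0)  # seeded generator; the seed does not affect sizes

# Force int64 so the result is the same on every platform
df = pd.DataFrame({
    'A': rng.integers(0, 100, size=n_rows, dtype=np.int64),
    'B': rng.integers(0, 100, size=n_rows, dtype=np.int64),
})
mem_default = df.memory_usage(deep=True, index=False).sum()
# rows * columns * bytes per value
assert mem_default == n_rows * df.shape[1] * np.dtype(np.int64).itemsize

# A {column: dtype} mapping converts both columns in one call
df = df.astype({'A': 'int8', 'B': 'int8'})
mem_optimized = df.memory_usage(deep=True, index=False).sum()
assert mem_optimized == n_rows * df.shape[1] * np.dtype(np.int8).itemsize

print(mem_default, mem_optimized)  # 16000 2000
```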
❓ Data Output
Difficulty: intermediate · Time limit: 2:00
Resulting dtypes after conversion
Given the DataFrame below, what are the dtypes of the columns after the conversion code runs?
```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3],
    'col2': [0.1, 0.2, 0.3],
    'col3': ['a', 'b', 'c']
})
df['col1'] = df['col1'].astype('int16')
df['col2'] = df['col2'].astype('float32')
```
💡 Hint
Check the astype conversions applied to col1 and col2.
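Explanation
The code explicitly converts col1 to int16 and col2 to float32. col3 is not touched, so it keeps the dtype pandas inferred for its strings: object, the default for string columns in pandas 2.x. The result is col1: int16, col2: float32, col3: object.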
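The same conversion can be written as one astype call that takes a column-to-dtype mapping. This sketch does that and then reads back df.dtypes to confirm the answer. The col3 result shown in the comment assumes pandas 2.x. Newer pandas versions may infer a dedicated string dtype instead of object.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3],
    'col2': [0.1, 0.2, 0.3],
    'col3': ['a', 'b', 'c'],
})

# One astype call with a {column: dtype} dict; columns not listed are left unchanged
df = df.astype({'col1': 'int16', 'col2': 'float32'})

print(df.dtypes)
# col1 -> int16, col2 -> float32, col3 -> object (pandas 2.x)
```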
🔧 Debug
Difficulty: advanced · Time limit: 2:00
Identify the error in dtype conversion
What error, if any, will this code raise when it converts the 'age' column to 'int8'?
```python
import pandas as pd

df = pd.DataFrame({'age': [25, 300, 45]})
df['age'] = df['age'].astype('int8')
```
💡 Hint
Check if values fit in int8 range (-128 to 127).
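Explanation
No error is raised, even though 300 is outside the int8 range. pandas passes the integer-to-integer cast to NumPy. NumPy does not check for overflow and keeps only the lowest 8 bits of each value, so 300 silently wraps to 300 - 256 = 44. The values 25 and 45 fit and are unchanged, giving [25, 44, 45].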
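The wraparound, and one way to guard against it, can be seen in the sketch below. safe_downcast is a hypothetical helper, not a pandas API. It compares the column's min and max against np.iinfo bounds before casting. pd.to_numeric with downcast='integer' is the built-in alternative: it picks the smallest signed integer dtype that holds every value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [25, 300, 45]})

# Plain astype does not range-check: 300 keeps only its low 8 bits -> 44
wrapped = df['age'].astype('int8')
print(wrapped.tolist())  # [25, 44, 45]


def safe_downcast(s: pd.Series, dtype: str) -> pd.Series:
    """Hypothetical helper: cast only if every value fits in the target integer dtype."""
    info = np.iinfo(dtype)
    if s.min() < info.min or s.max() > info.max:
        raise OverflowError(f"values outside {dtype} range [{info.min}, {info.max}]")
    return s.astype(dtype)


try:
    safe_downcast(df['age'], 'int8')
except OverflowError as exc:
    print("refused:", exc)

# Let pandas choose the smallest integer dtype that holds all values
auto = pd.to_numeric(df['age'], downcast='integer')
print(auto.dtype)  # int16
```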
🧠 Conceptual
Difficulty: advanced · Time limit: 2:00
Best dtype for categorical data
Which dtype is most memory efficient and appropriate for a column with repeated string categories like 'red', 'blue', 'green'?
💡 Hint
Think about how pandas stores repeated categories internally.
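Explanation
The 'category' dtype stores each distinct string only once (in .cat.categories) and represents every row as a small integer code (int8 when there are only a few categories). For a column with a handful of repeated values, this uses far less memory than object dtype and speeds up operations such as grouping and comparison.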
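The internal layout described above can be inspected directly. In this sketch the 3000-row color column is made-up sample data. The sketch reads the categories and the per-row codes, then compares memory against the plain string column.

```python
import pandas as pd

colors = pd.Series(['red', 'blue', 'green'] * 1000)  # 3000 rows, 3 distinct values
as_cat = colors.astype('category')

# Each distinct string is stored once; inferred categories come out sorted
print(list(as_cat.cat.categories))  # ['blue', 'green', 'red']
# Every row holds only a small integer code pointing into the categories
print(as_cat.cat.codes.dtype)       # int8

obj_bytes = colors.memory_usage(deep=True, index=False)
cat_bytes = as_cat.memory_usage(deep=True, index=False)
print(obj_bytes, cat_bytes)
```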
🚀 Application
Difficulty: expert · Time limit: 3:00
Optimize memory usage for mixed dtype DataFrame
Given a DataFrame with columns: 'id' (integers 0-100000), 'score' (floats 0-1), 'grade' (strings 'A', 'B', 'C'), which dtype conversions will minimize memory usage without losing data?
💡 Hint
Consider the range of 'id' and precision needed for 'score'.
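Explanation
'id' values up to 100000 exceed the int16 range (max 32767) and the uint16 range (max 65535), but they fit in int32. 'score' values between 0 and 1 can be stored as float32, which keeps about 7 significant decimal digits. That is usually acceptable, although downcasting from float64 does round each value slightly. 'grade' contains only three repeated strings, so it is best stored as category dtype.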
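The three conversions can be applied together as sketched below. The sample data is hypothetical and only mimics the ranges in the question. pd.to_numeric with downcast='integer' is one way to let pandas choose int32 for 'id' rather than hard-coding it. The final check confirms that 'id' is unchanged and that 'score' differs only by float32 rounding.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical sample data: ids 0..100000, scores in [0, 1), grades A/B/C
ids = np.arange(0, 100_001, 10)
n = len(ids)
df = pd.DataFrame({
    'id': ids,
    'score': rng.random(n),
    'grade': rng.choice(['A', 'B', 'C'], size=n),
})
before = df.memory_usage(deep=True, index=False).sum()

opt = df.assign(
    id=pd.to_numeric(df['id'], downcast='integer'),  # smallest signed int that fits: int32
    score=df['score'].astype('float32'),             # rounds to ~7 significant digits
    grade=df['grade'].astype('category'),            # 3 categories plus int8 codes
)
after = opt.memory_usage(deep=True, index=False).sum()

print(opt.dtypes)
print(before, after)

# 'id' round-trips exactly; 'score' changes only by float32 rounding
assert (opt['id'] == df['id']).all()
max_err = (opt['score'].astype('float64') - df['score']).abs().max()
print(max_err)
```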