Given a pandas DataFrame with integer columns, what is the output of the memory usage after converting the columns to int8?
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': np.random.randint(0, 100, 1000),
    'B': np.random.randint(0, 100, 1000)
})
before = df.memory_usage(deep=True).sum()
df['A'] = df['A'].astype('int8')
df['B'] = df['B'].astype('int8')
after = df.memory_usage(deep=True).sum()
print(after < before)
Think about how changing data types to smaller ones affects memory.
Converting integer columns from the default int64 (8 bytes per value) to int8 (1 byte per value) shrinks the column data roughly eightfold, so after < before is True.
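The size difference can be checked directly on the column buffers; a minimal sketch (column names and sizes match the question's setup):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.random.randint(0, 100, 1000)})
# int64 stores 8 bytes per value: 1000 rows -> 8000 bytes of column data
print(df['A'].nbytes)   # 8000

small = df['A'].astype('int8')
# int8 stores 1 byte per value: 1000 rows -> 1000 bytes
print(small.nbytes)     # 1000
```

Note that `df.memory_usage()` also counts the index (64 bytes per row by default for a RangeIndex-free index), which is why the comparison in the question sums everything rather than looking at one column.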
What are the data types of the columns after applying the following optimization?
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3],
    'col2': [1000, 2000, 3000],
    'col3': ['a', 'b', 'c']
})
df['col1'] = df['col1'].astype('int8')
df['col2'] = df['col2'].astype('int16')
result = df.dtypes.to_dict()
Check the explicit astype conversions.
The code explicitly converts col1 to int8 and col2 to int16; col3 is untouched, so it keeps its default object dtype.
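Printing the resulting dict confirms the three dtypes; a runnable sketch of the same steps:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3],
    'col2': [1000, 2000, 3000],
    'col3': ['a', 'b', 'c']
})
df['col1'] = df['col1'].astype('int8')
df['col2'] = df['col2'].astype('int16')
result = df.dtypes.to_dict()

# col1 -> int8, col2 -> int16, col3 -> object (shown as dtype('O'))
print(result)
```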
What happens when this code converts the column to int8?
import pandas as pd

df = pd.DataFrame({'values': [0, 127, 128, 255]})
df['values'] = df['values'].astype('int8')
Check the range of int8 and the values in the column.
Values like 128 and 255 exceed the range of int8 (-128 to 127), but with the NumPy-backed 'int8' dtype no error is raised: the cast silently wraps, turning 128 into -128 and 255 into -1. (The nullable 'Int8' extension dtype, by contrast, raises on out-of-range values.) This silent data corruption is why value ranges must be checked before downcasting.
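The wraparound can be demonstrated directly (exact behavior may vary slightly across NumPy/pandas versions, some of which emit a warning):

```python
import pandas as pd

s = pd.Series([0, 127, 128, 255])   # stored as int64 by default
out = s.astype('int8')              # NumPy casting wraps out-of-range values
print(out.tolist())                 # [0, 127, -128, -1]
```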
You have a DataFrame with columns: age (0-120), income (0-1,000,000), and gender ('M' or 'F'). Which data types optimize memory without losing information?
Consider the value ranges and best types for categorical data.
age fits in int8 (its range of -128 to 127 covers 0-120), income fits in int32 (max ~2.1 billion comfortably covers 1,000,000), and gender is best stored as category, since it has only two unique values.
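Applying those choices looks like the following; the sample rows are illustrative, only the dtypes come from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'age': [25, 60, 120],
    'income': [30_000, 85_000, 1_000_000],
    'gender': ['M', 'F', 'M'],
})
df['age'] = df['age'].astype('int8')         # -128..127 covers 0..120
df['income'] = df['income'].astype('int32')  # max ~2.1e9 covers 1e6
df['gender'] = df['gender'].astype('category')  # 2 unique values -> tiny codes

print(df.dtypes.to_dict())
```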
Which statement best describes the impact of data type optimization on data processing performance?
Think about trade-offs between memory and CPU operations.
While smaller data types reduce memory, some operations may require upcasting or conversions, which can slow down processing.
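One common source of such conversions is mixing widths in arithmetic, where the narrow type is promoted back up; a small sketch:

```python
import pandas as pd

a = pd.Series([1, 2, 3], dtype='int8')
b = pd.Series([10, 20, 30])   # default int64

# Mixing int8 with int64 promotes the result to int64,
# so the memory saved by downcasting 'a' is lost in the output.
print((a + b).dtype)          # int64
```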