What is the output of the following code showing memory usage of each column in a pandas DataFrame?
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': np.arange(1000),
    'B': np.random.rand(1000),
    'C': ['text'] * 1000
})
mem_usage = df.memory_usage(deep=True)
print(mem_usage)
Remember that int64 and float64 values take 8 bytes each. Repeated strings may share memory, but deep=True counts the size of the string object behind every row.
The integer column 'A' has 1000 values * 8 bytes = 8000 bytes. The float column 'B' also uses 8000 bytes. The string column 'C' holds 1000 identical strings, each counted at about 61 bytes (an 8-byte pointer plus the Python string object's own overhead), for a total of about 61000 bytes. The index uses about 128 bytes. The string and index figures can vary slightly with the Python and pandas versions.
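The arithmetic above can be checked with a short script. This is a minimal sketch, not part of the original question: the explicit `int64` and `object` dtypes are assumptions added here so the numbers do not depend on the platform or on the default string dtype of the installed pandas version.

```python
import sys

import numpy as np
import pandas as pd

# Explicit dtypes (an assumption of this sketch) keep the result reproducible.
df = pd.DataFrame({
    'A': np.arange(1000, dtype='int64'),
    'B': np.random.rand(1000),
    'C': pd.Series(['text'] * 1000, dtype=object),
})

mem_usage = df.memory_usage(deep=True)

# Fixed-width columns: number of rows times bytes per value.
expected_a = len(df) * df['A'].dtype.itemsize
expected_b = len(df) * df['B'].dtype.itemsize

# Object column: one pointer per row plus the size of the string object,
# which deep=True counts once for every row.
pointer_size = df['C'].to_numpy().itemsize
expected_c = len(df) * (pointer_size + sys.getsizeof('text'))

print(mem_usage)
print(expected_a, expected_b, expected_c)
```

The per-string size comes from `sys.getsizeof`, so the total for 'C' follows whatever the running Python interpreter reports rather than a hard-coded 61 bytes.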
Given a DataFrame with 1 million integers stored as int64, what is the memory usage after converting the column to int8?
import pandas as pd
import numpy as np

df = pd.DataFrame({'numbers': np.arange(1000000)})
mem_before = df.memory_usage(deep=True).numbers
df['numbers'] = df['numbers'].astype('int8')
mem_after = df.memory_usage(deep=True).numbers
print(mem_before, mem_after)
int64 uses 8 bytes per value, int8 uses 1 byte per value.
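Original memory is 1,000,000 values * 8 bytes = 8,000,000 bytes. After converting to int8, memory is 1,000,000 * 1 byte = 1,000,000 bytes. Note that int8 only holds values from -128 to 127, so most of these numbers wrap around; the conversion is safe only when the data fits the smaller type.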
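The byte counts can be confirmed directly, along with a check on whether the values survive the cast. The sketch below also shows `pd.to_numeric` with `downcast='integer'` as one possible safer alternative; that call is not part of the original question and is included here only as an illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'numbers': np.arange(1_000_000, dtype='int64')})

mem_int64 = df['numbers'].memory_usage(index=False)

# Forcing int8 gives 1 byte per value, but out-of-range values wrap around.
as_int8 = df['numbers'].astype('int8')
mem_int8 = as_int8.memory_usage(index=False)
values_preserved = bool((as_int8 == df['numbers']).all())

# downcast='integer' picks the smallest signed integer type that fits the data.
safe = pd.to_numeric(df['numbers'], downcast='integer')
mem_safe = safe.memory_usage(index=False)

print(mem_int64, mem_int8, values_preserved)
print(safe.dtype, mem_safe)
```

For this data the downcast settles on a 4-byte integer type, so the saving is smaller than with int8 but the values stay intact.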
Why does this DataFrame use more memory than expected?
import pandas as pd

df = pd.DataFrame({
    'col': ['apple', 'banana', 'apple', 'banana'] * 250000
})
print(df.memory_usage(deep=True))
Check how pandas stores repeated strings in object dtype.
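Object dtype keeps a separate reference to a Python string for every row, and deep=True counts that string's size each time, even when values repeat. Converting the column to categorical dtype would reduce memory by storing each unique string only once.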
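The difference can be measured by building the same data both ways. This is a minimal sketch; the variable names are illustrative, and the explicit `dtype=object` is an assumption made so the comparison matches the object-dtype case described above regardless of the pandas version.

```python
import pandas as pd

values = ['apple', 'banana', 'apple', 'banana'] * 250000

# Same data stored as object dtype and as categorical dtype.
as_object = pd.Series(values, dtype=object)
as_category = as_object.astype('category')

mem_object = as_object.memory_usage(index=False, deep=True)
mem_category = as_category.memory_usage(index=False, deep=True)

print(mem_object, mem_category)
print(f"categorical uses {mem_category / mem_object:.1%} of the object memory")
```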
Which code snippet correctly creates a bar plot showing memory usage of each column in a DataFrame df?
Use pandas plotting with kind='bar' for bar charts.
Option A calls plot on the pandas Series with kind='bar', which correctly draws a bar chart of memory usage per column. Of the remaining options, one omits deep=True and calls plt.bar directly, which may not label the axes well; one uses a line plot, which is less clear; and one uses a scatter plot, which is not suitable here.
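The answer options themselves are not reproduced here, so the following is only a sketch of the approach the explanation describes: calling `plot(kind='bar')` on the Series returned by `memory_usage`. The sample DataFrame, the axis labels, the `Agg` backend, and the output filename `memory_usage.png` are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': np.arange(1000),
    'B': np.random.rand(1000),
    'C': ['text'] * 1000,
})

# One value per column; index=False leaves the index out of the chart.
mem_usage = df.memory_usage(deep=True, index=False)

ax = mem_usage.plot(kind='bar')
ax.set_xlabel('Column')
ax.set_ylabel('Memory usage (bytes)')
plt.tight_layout()
plt.savefig('memory_usage.png')  # hypothetical output filename
```

In an interactive session, `plt.show()` could replace the `savefig` call.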
Which statement best explains why converting a string column with many repeated values to categorical dtype reduces memory usage?
Think about how categorical data stores repeated values.
Categorical dtype stores unique values once and replaces each value with a small integer code, which uses less memory than storing full strings repeatedly.
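The two parts of a categorical column can be inspected through the `.cat` accessor. The small Series below is a made-up example, not data from the question.

```python
import pandas as pd

s = pd.Series(['apple', 'banana', 'apple', 'banana'] * 3, dtype='category')

categories = s.cat.categories.tolist()  # unique values, stored once
codes = s.cat.codes.tolist()            # one small integer per row
codes_dtype = s.cat.codes.dtype

print(categories)
print(codes)
print(codes_dtype)
```

With only two categories, pandas can use 1-byte integer codes, so each row costs one byte instead of a pointer plus a full string object.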