What is the output of the following code showing memory usage of each column in bytes?
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': np.arange(1000, dtype='int64'),
    'B': np.random.rand(1000),
    'C': ['text'] * 1000
})
mem_usage = df.memory_usage(deep=True)
print(mem_usage)
Remember that with deep=True, pandas counts the memory of the string objects themselves, not just the pointers to them.
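The 'C' column holds 1000 references to the string 'text'. When the column is stored as object dtype, deep=True adds the size of the referenced string object (roughly 45 to 53 bytes for 'text' on 64-bit CPython, depending on the Python version) to each 8-byte pointer, and it does so once per row even though every row points to the same string. The column therefore reports roughly 53,000 to 61,000 bytes, not 24,000. The numeric columns use a fixed 8 bytes per value, so 'A' and 'B' report 8000 bytes each, and the default RangeIndex adds only about 130 bytes.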
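The deep figure for a string column can be reproduced by hand. The sketch below assumes 64-bit CPython and forces object dtype, since newer pandas versions may infer a dedicated string dtype and report different numbers; the variable names are illustrative only.

```python
import sys
import pandas as pd

# Force object dtype so the column stores pointers to Python str objects.
col = pd.Series(['text'] * 1000, dtype=object)

shallow = col.memory_usage(index=False, deep=False)  # pointer array only
deep = col.memory_usage(index=False, deep=True)      # pointers + string objects

# Manual estimate: pointer array plus the size of the string, counted per row.
per_string = sys.getsizeof('text')
estimate = col.to_numpy().nbytes + per_string * len(col)

print(shallow, deep, estimate)
```

The exact values depend on the Python version, because the size of a small str object has changed between releases.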
What is the total memory usage in bytes of the DataFrame below, including the index and with deep=True?
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'X': np.random.randint(0, 100, 5000),
    'Y': ['a'] * 5000
})
total_mem = df.memory_usage(deep=True).sum()
print(total_mem)
Consider index memory plus each column's memory with deep=True.
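The default index is a RangeIndex, which stores only start, stop, and step, so it reports about 130 bytes rather than 40,000. The 'X' column uses 5000 × 8 = 40,000 bytes if the integers are 64-bit, which is the usual default. The 'Y' column, when stored as object dtype, costs an 8-byte pointer plus the size of the string object 'a' (roughly 40 to 50 bytes on 64-bit CPython) for every row, which comes to about 250,000 to 290,000 bytes. The total is therefore roughly 290,000 to 330,000 bytes, dominated by the string column; the exact figure depends on the Python and pandas versions.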
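The cost of the index is easy to check directly. The following sketch compares the default RangeIndex with the same labels materialized as an int64 array; `int_index` is a name introduced here only for the comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': np.arange(5000, dtype='int64')})

# The default index is a RangeIndex: it keeps start/stop/step, not 5000 values.
range_index_bytes = df.index.memory_usage()

# The same labels stored explicitly as an int64 array, for comparison.
int_index = pd.Index(np.arange(5000, dtype='int64'))
int_index_bytes = int_index.memory_usage()

print(type(df.index).__name__, range_index_bytes)
print(type(int_index).__name__, int_index_bytes)
```

Only an explicitly materialized integer index costs 8 bytes per row.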
Given the code below, why might the memory used by the Python process not decrease after dropping column 'B', even though the value reported by memory_usage does?
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': np.arange(10000),
    'B': ['text'] * 10000
})
print(df.memory_usage(deep=True).sum())
df.drop('B', axis=1, inplace=True)
print(df.memory_usage(deep=True).sum())
Think about how pandas manages memory internally and when garbage collection happens.
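The figure printed by memory_usage does shrink after the drop, because it only sums the columns the DataFrame still holds. What may not shrink right away is the memory footprint of the Python process: the dropped data is released only once nothing else references it (another variable, a view, or a copy of the frame), and even then the allocator may keep the freed memory for reuse rather than returning it to the operating system.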
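The total reported by memory_usage covers only the columns still in the frame, so it can be compared before and after a drop. A minimal sketch, assuming column 'B' is stored as object dtype (forced here so the result does not depend on the pandas version):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': np.arange(10000),
    'B': pd.Series(['text'] * 10000, dtype=object),
})

before = df.memory_usage(deep=True).sum()
df = df.drop(columns='B')   # rebinding the name releases the old frame
after = df.memory_usage(deep=True).sum()

print(before, after)
```

This measures what pandas accounts for, not what the operating system sees; process-level memory would have to be observed with a separate tool.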
Which statement best describes the effect of converting a string column to category dtype on memory usage?
Think about how categories store repeated values efficiently.
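Converting to category stores each unique string once and replaces the column's values with small integer codes that point to those categories. This reduces memory significantly when the column has few unique values relative to its length; for a column of mostly unique strings the saving disappears and usage can even increase.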
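As an illustration, the sketch below converts a low-cardinality object column to category and compares the two footprints. The colour values and variable names are made up for the example.

```python
import pandas as pd

# 30,000 rows but only three distinct values.
s = pd.Series(['red', 'green', 'blue'] * 10000, dtype=object)
cat = s.astype('category')

object_bytes = s.memory_usage(index=False, deep=True)
category_bytes = cat.memory_usage(index=False, deep=True)

print(cat.cat.categories.tolist())  # the unique values, stored once
print(cat.cat.codes.dtype)          # integer codes, one per row
print(object_bytes, category_bytes)
```

Converting back with astype(object) recovers the original strings, so no information is lost.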
You have a DataFrame with 1 million rows and columns of various types. Which approach will most effectively reduce memory usage without losing data?
Consider safe downcasting and categorical conversion for repeated strings.
Downcasting each numeric column to the smallest subtype that still fits its data, and converting low-cardinality object columns to category, reduces memory without losing data. Converting columns blindly (for example, forcing every number to a narrow type) can overflow or lose precision, and dropping columns discards data outright.
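One way this approach might look in practice is sketched below. The helper `shrink` and its `max_unique_ratio` threshold are hypothetical names chosen for this example, not part of pandas, and the frame is smaller than 1 million rows to keep it quick. Float columns are left untouched on purpose, since narrowing float64 to float32 can lose precision.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({
    'count': rng.integers(0, 100, n).astype('int64'),
    'price': rng.random(n),
    'city': pd.Series(rng.choice(['Paris', 'Lima', 'Oslo'], n).tolist(), dtype=object),
})

def shrink(frame, max_unique_ratio=0.5):
    """Hypothetical helper: lossless downcast of ints, category for repeated strings."""
    out = frame.copy()
    for name in out.columns:
        col = out[name]
        if pd.api.types.is_integer_dtype(col):
            # Picks the smallest integer subtype that holds every value.
            out[name] = pd.to_numeric(col, downcast='integer')
        elif pd.api.types.is_object_dtype(col) and col.nunique() < max_unique_ratio * len(col):
            # Only worthwhile when values repeat often.
            out[name] = col.astype('category')
    return out

small = shrink(df)
print(df.memory_usage(deep=True).sum(), small.memory_usage(deep=True).sum())
print(small.dtypes)
```

Comparing the values before and after, as well as the dtypes, is a cheap way to confirm that nothing was altered.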