Data Analysis Python · ~20 mins

Memory usage analysis in Data Analysis Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Memory usage of pandas DataFrame columns

What is the output of the following code showing memory usage of each column in a pandas DataFrame?

import pandas as pd
import numpy as np
df = pd.DataFrame({
    'A': np.arange(1000),
    'B': np.random.rand(1000),
    'C': ['text']*1000
})
mem_usage = df.memory_usage(deep=True)
print(mem_usage)
A
Index    128
A       1000
B       8000
C      40000
dtype: int64
B
Index    128
A       8000
B       8000
C      32000
dtype: int64
C
Index    128
A       8000
B       8000
C      61000
dtype: int64
D
Index    128
A       8000
B       1000
C      40000
dtype: int64
💡 Hint

Remember that int64 and float64 values take 8 bytes each. For an object column, deep=True charges every row its 8-byte pointer plus the full size of the Python string object it references, even when rows share the same string.
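As a quick check outside the challenge, the per-row cost that deep=True charges for a short string can be measured with the standard library (the exact byte counts are CPython implementation details, not pandas guarantees):

```python
import sys

# deep=True charges each object-dtype row an 8-byte pointer plus the
# size of the Python string object it references.
s = 'text'
string_size = sys.getsizeof(s)  # object header + 1 byte per ASCII char
per_row = 8 + string_size       # pointer + string object
print(string_size, per_row)
```

On CPython 3, per_row comes out to 61 bytes, which matches the 61000 bytes reported for the 1000-row column C.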

Data Output
intermediate
Memory usage reduction by changing data types

Given a DataFrame with 1 million integers stored as int64, what is the memory usage after converting the column to int8?

import pandas as pd
import numpy as np
df = pd.DataFrame({'numbers': np.arange(1000000)})
mem_before = df.memory_usage(deep=True).numbers

df['numbers'] = df['numbers'].astype('int8')
mem_after = df.memory_usage(deep=True).numbers
print(mem_before, mem_after)
A
8000000 8000000
B
8000000 1000000
C
1000000 8000000
D
1000000 1000000
💡 Hint

int64 uses 8 bytes per value, int8 uses 1 byte per value.
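To see those widths directly (a side sketch, not part of the timed problem), NumPy's nbytes reports length × itemsize:

```python
import numpy as np

# int64 -> 8 bytes/value, int8 -> 1 byte/value, so the column shrinks 8x.
arr64 = np.arange(1_000_000, dtype=np.int64)
arr8 = arr64.astype(np.int8)  # careful: values above 127 wrap around
print(arr64.nbytes, arr8.nbytes)  # 8000000 1000000
```

Note the caveat in the comment: the challenge code also wraps values above 127 when downcasting, which is why int8 is only safe for columns whose values fit its range.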

🔧 Debug
advanced
Identify the cause of high memory usage in a DataFrame

Why does this DataFrame use more memory than expected?

import pandas as pd
df = pd.DataFrame({
    'col': ['apple', 'banana', 'apple', 'banana'] * 250000
})
print(df.memory_usage(deep=True))
A
Because the DataFrame uses int64 for strings, which is inefficient.
B
Because the DataFrame index is very large and uses most of the memory.
C
Because the DataFrame has many missing values, increasing memory.
D
Because the strings are stored as object dtype without using the categorical type, so each string is counted separately.
💡 Hint

Check how pandas stores repeated strings in object dtype.
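A small sketch of the fix (assuming pandas is available): converting the repeated strings to category dtype collapses them into two shared strings plus small integer codes, which memory_usage(deep=True) makes visible:

```python
import pandas as pd

# Object dtype stores an 8-byte pointer per row, and deep=True also
# counts the referenced string for every row, even when rows repeat.
s_obj = pd.Series(['apple', 'banana'] * 1000)
s_cat = s_obj.astype('category')  # 2 shared strings + one small int code per row
print(s_obj.memory_usage(deep=True), s_cat.memory_usage(deep=True))
```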

🚀 Application
advanced
Visualizing memory usage of DataFrame columns

Which code snippet correctly creates a bar plot showing memory usage of each column in a DataFrame df?

A
import matplotlib.pyplot as plt
mem = df.memory_usage(deep=True)
mem.plot(kind='bar')
plt.show()
B
import matplotlib.pyplot as plt
mem = df.memory_usage(deep=True)
plt.plot(mem)
plt.show()
C
import matplotlib.pyplot as plt
mem = df.memory_usage()
plt.bar(mem.index, mem.values)
plt.show()
D
import matplotlib.pyplot as plt
mem = df.memory_usage(deep=True)
plt.scatter(mem.index, mem.values)
plt.show()
💡 Hint

Use pandas plotting with kind='bar' for bar charts, and remember that deep=True is needed to count the actual string memory of object columns.
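A runnable sketch of the idea (assumes matplotlib is installed, with an example df since the question leaves it unspecified): memory_usage returns a Series indexed by column name, and Series.plot(kind='bar') draws one labelled bar per entry.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# memory_usage(deep=True) returns a Series indexed by column name;
# plotting it with kind='bar' gives one bar per column (plus the index).
df = pd.DataFrame({'A': np.arange(100), 'B': np.random.rand(100)})
mem = df.memory_usage(deep=True)
ax = mem.plot(kind='bar')
ax.set_ylabel('bytes')
plt.show()
```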

🧠 Conceptual
expert
Understanding memory usage impact of categorical data

Which statement best explains why converting a string column with many repeated values to categorical dtype reduces memory usage?

A
Categorical dtype stores the unique strings once and replaces column values with integer codes, reducing memory.
B
Categorical dtype compresses strings using zip compression internally, saving memory.
C
Categorical dtype converts strings to fixed-length byte arrays, which always use less memory.
D
Categorical dtype stores strings as pointers to the original strings, increasing memory usage.
💡 Hint

Think about how categorical data stores repeated values.
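To make that concrete (a sketch, not part of the question), the shared categories and the per-row integer codes are directly inspectable:

```python
import pandas as pd

# Category dtype stores each unique string once; every row holds a small
# integer code pointing into that shared list of categories.
s = pd.Series(['red', 'green', 'red', 'red'], dtype='category')
print(list(s.cat.categories))  # ['green', 'red']  (uniques, sorted)
print(list(s.cat.codes))       # [1, 0, 1, 1]      (one small int per row)
```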