Memory usage analysis in Pandas - Time & Space Complexity
We want to understand how the time needed to measure memory usage grows with data size in pandas: specifically, how long does it take to check memory use as the data gets bigger?
Analyze the time complexity of the following code snippet.
import pandas as pd

n = 1000  # example value for n

data = pd.DataFrame({
    'A': [str(i) for i in range(n)],
    'B': [str(i) for i in range(n, 2 * n)]
})

# deep=True inspects every Python object to report true memory consumption
memory = data.memory_usage(deep=True)
This code creates a DataFrame with n rows and then checks how much memory it uses.
- Primary operation: with deep=True, pandas inspects every element of every column to measure the memory held by the underlying Python objects (here, strings).
- How many times: it visits each of the n rows in each column once, summing the per-object sizes.
As the number of rows (n) grows, the time to check memory grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 checks per column |
| 100 | About 100 checks per column |
| 1000 | About 1000 checks per column |
Pattern observation: The time grows linearly as the data size grows.
Time Complexity: O(n)
This means the time to measure memory grows in a straight line with the number of rows.
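The linear pattern above can be checked empirically. The sketch below (an illustrative benchmark, not part of the original snippet) times `memory_usage(deep=True)` for increasing n; on a typical machine the elapsed time should grow roughly tenfold with each tenfold increase in rows, though exact timings will vary.

```python
import time
import pandas as pd

# Time memory_usage(deep=True) as the row count grows tenfold each step
for n in [10_000, 100_000, 1_000_000]:
    data = pd.DataFrame({
        'A': [str(i) for i in range(n)],
        'B': [str(i) for i in range(n, 2 * n)],
    })
    start = time.perf_counter()
    data.memory_usage(deep=True)  # inspects every string object
    elapsed = time.perf_counter() - start
    print(f"n={n:>9,}: {elapsed:.4f} s")
```

The reported memory also scales with n, which is why the per-element inspection cannot be skipped: each string is a separate Python object whose size must be summed individually.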
[X] Wrong: "Checking memory usage is instant and does not depend on data size."
[OK] Correct: pandas must look at each element to calculate memory, so bigger data takes more time.
Knowing how memory checks scale helps you understand performance when working with big data in pandas.
"What if we set deep=False in memory_usage()? How would the time complexity change?"
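One way to explore this question (a sketch, not a definitive answer): with deep=False, pandas reads only the nbytes of each column's underlying array from its metadata, without touching individual elements. For an object-dtype column this counts just the 8-byte pointers, so the call is effectively independent of element contents and much cheaper than a deep scan.

```python
import pandas as pd

n = 100_000
data = pd.DataFrame({
    'A': [str(i) for i in range(n)],
    'B': [str(i) for i in range(n, 2 * n)],
})

shallow = data.memory_usage(deep=False)  # array metadata only (pointers)
deep = data.memory_usage(deep=True)      # inspects every string object

# deep includes the string objects themselves, so it reports more bytes
print(shallow['A'], deep['A'])
assert deep['A'] > shallow['A']
```

Note that deep=False is the faster option precisely because it sacrifices accuracy for object-dtype columns: the strings' own memory is invisible to it.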