
Memory usage analysis in Pandas - Time & Space Complexity

Time Complexity: Memory usage analysis
O(n)
Understanding Time Complexity

We want to understand how the time needed to check memory usage in pandas grows as the data size grows.

In other words: as a DataFrame gets bigger, how much longer does it take to measure its memory use?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

import pandas as pd

n = 1000  # example value for n

data = pd.DataFrame({
    'A': [str(i) for i in range(n)],
    'B': [str(i) for i in range(n, 2*n)]
})

memory = data.memory_usage(deep=True)

This code creates a DataFrame with n rows and then checks how much memory it uses.

Identify Repeating Operations
  • Primary operation: with deep=True, pandas inspects each element of each column to measure its actual memory footprint.
  • How many times: it passes over all columns and all rows once, summing the sizes.
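To see this per-column accounting directly, the snippet below (a small illustrative sketch, not part of the original example) prints the result of memory_usage(deep=True) for the same kind of string DataFrame. Note that the returned Series has one entry per column plus one for the index, and that the deep figure for a string column must exceed n bytes because every Python string object carries overhead:

```python
import pandas as pd

n = 1000
data = pd.DataFrame({
    'A': [str(i) for i in range(n)],
    'B': [str(i) for i in range(n, 2 * n)],
})

# One entry per column (plus the index). With deep=True, pandas
# sums the size of every individual string object, which is why
# the call must visit all n elements of each column.
mem = data.memory_usage(deep=True)
print(mem.index.tolist())  # ['Index', 'A', 'B']
print(mem['A'] > n)        # each string costs well over 1 byte, so True
```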
How Execution Grows With Input

As the number of rows (n) grows, the time to check memory grows roughly in direct proportion.

Input Size (n) | Approx. Operations
10             | About 10 checks per column
100            | About 100 checks per column
1000           | About 1000 checks per column

Pattern observation: The time grows linearly as the data size grows.

Final Time Complexity

Time Complexity: O(n)

This means the time to measure memory grows in a straight line with the number of rows.
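You can check this scaling empirically. The timing sketch below (my own illustration; exact timings vary by machine and run) measures only the memory_usage(deep=True) call, not the DataFrame construction. With 10x the rows, the deep check should take roughly 10x as long:

```python
import time
import pandas as pd

def time_memory_check(n):
    """Build an n-row string DataFrame and time memory_usage(deep=True)."""
    df = pd.DataFrame({
        'A': [str(i) for i in range(n)],
        'B': [str(i) for i in range(n, 2 * n)],
    })
    start = time.perf_counter()
    df.memory_usage(deep=True)
    return time.perf_counter() - start

# Timings are noisy, but the ratio should hover near 10.
small = time_memory_check(100_000)
large = time_memory_check(1_000_000)
print(f"100k rows: {small:.4f}s, 1M rows: {large:.4f}s")
```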

Common Mistake

[X] Wrong: "Checking memory usage is instant and does not depend on data size."

[OK] Correct: pandas must look at each element to calculate memory, so bigger data takes more time.
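One way to see that the per-element inspection is real work (a small sketch I've added for illustration): compare the shallow and deep figures for the same string column. The shallow number only counts the array of object pointers, while the deep number also counts the string objects those pointers reference:

```python
import pandas as pd

n = 1000
data = pd.DataFrame({'A': [str(i) for i in range(n)]})

shallow = data.memory_usage(deep=False)['A']  # just the pointer array
deep = data.memory_usage(deep=True)['A']      # pointers + the strings themselves

# Deep inspection counts each string object individually, so it
# reports more memory -- and costs O(n) time to compute.
print(shallow, deep)
```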

Interview Connect

Knowing how memory checks scale helps you understand performance when working with big data in pandas.

Self-Check

"What if we set deep=False in memory_usage()? How would the time complexity change?"