Inplace operation considerations in pandas - Time & Space Complexity
We want to understand how inplace operations (methods called with inplace=True) affect the time it takes to run pandas code.
Specifically, does making changes in place save time as the data size grows?
Analyze the time complexity of this pandas code snippet.
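import pandas as pd
n = 1000
# Build a DataFrame with n rows and two integer columns.
df = pd.DataFrame({'A': range(n), 'B': range(n)})
# Drop column 'B' (axis=1 selects columns); inplace=True modifies df and returns None.
df.drop('B', axis=1, inplace=True)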
This code creates a DataFrame and drops one column using inplace=True.
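To make the behavior of inplace=True concrete, the following minimal sketch contrasts the two call styles on the same data. The variable names `dropped` and `result` are illustrative and not part of the original snippet.

```python
import pandas as pd

n = 1000
df = pd.DataFrame({'A': range(n), 'B': range(n)})

# Without inplace: drop returns a new DataFrame and leaves df unchanged.
dropped = df.drop('B', axis=1)
print(list(df.columns))       # df still has both columns
print(list(dropped.columns))  # the returned copy has only 'A'

# With inplace=True: drop returns None and df itself loses the column.
result = df.drop('B', axis=1, inplace=True)
print(result)
print(list(df.columns))
```

Because the inplace call returns None, writing `df = df.drop('B', axis=1, inplace=True)` would replace `df` with None, which is a common mistake.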
Look for loops or repeated work inside the operation.
- Primary operation: Removing a column from the DataFrame.
- How many times: Once per call. There is no explicit loop, but pandas typically touches every row once while rebuilding the remaining data.
As the number of rows grows, the work to drop a column grows too.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 operations |
| 100 | About 100 operations |
| 1000 | About 1000 operations |
Pattern observation: The work grows roughly in direct proportion to the number of rows.
Time Complexity: O(n)
This means the time to drop a column grows linearly with the number of rows.
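One way to check the linear trend empirically is to time the drop for growing values of n. This is a rough sketch: the helper `time_drop` and the chosen sizes are assumptions for illustration, and the measured numbers will vary with the machine, the pandas version, and whether Copy-on-Write is enabled.

```python
import time
import pandas as pd

def time_drop(n, repeats=5):
    """Return the best wall-clock time (seconds) to drop column 'B' from n rows."""
    best = float('inf')
    for _ in range(repeats):
        # Rebuild the DataFrame each time because inplace=True mutates it.
        df = pd.DataFrame({'A': range(n), 'B': range(n)})
        start = time.perf_counter()
        df.drop('B', axis=1, inplace=True)
        best = min(best, time.perf_counter() - start)
    return best

# Time the same operation at three sizes, each 10x larger than the last.
timings = {n: time_drop(n) for n in (1_000, 10_000, 100_000)}
for n, seconds in timings.items():
    print(f"n={n:>7}: {seconds * 1e6:.1f} microseconds")
```

For small n, fixed per-call overhead tends to dominate, so any growth with row count usually becomes visible only at the larger sizes.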
[X] Wrong: "Using inplace=True makes the operation run faster because it avoids copying data."
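[OK] Correct: For drop, pandas typically builds the result the same way with or without inplace=True and then assigns it back to the original object. The work still scales with the number of rows, so time grows the same way as without inplace.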
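The claim can be tested by timing both styles on identical data. In this sketch the helpers `best_time`, `drop_inplace`, and `drop_reassign` are hypothetical names introduced for the comparison. It confirms that both styles produce the same result and prints the timings so you can compare them on your own setup.

```python
import time
import pandas as pd

n = 100_000

def best_time(drop_fn, repeats=5):
    """Run drop_fn on a fresh DataFrame several times; return (best seconds, last result)."""
    best = float('inf')
    result = None
    for _ in range(repeats):
        df = pd.DataFrame({'A': range(n), 'B': range(n)})
        start = time.perf_counter()
        result = drop_fn(df)
        best = min(best, time.perf_counter() - start)
    return best, result

def drop_inplace(df):
    # Mutates df and returns it so the result can be compared.
    df.drop('B', axis=1, inplace=True)
    return df

def drop_reassign(df):
    # Returns a new DataFrame; the caller would normally reassign it.
    return df.drop('B', axis=1)

t_inplace, out_inplace = best_time(drop_inplace)
t_reassign, out_reassign = best_time(drop_reassign)

print("same result:", out_inplace.equals(out_reassign))
print(f"inplace:  {t_inplace * 1e6:.1f} microseconds")
print(f"reassign: {t_reassign * 1e6:.1f} microseconds")
```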
Understanding how inplace affects time helps you explain performance clearly and choose the right method in real projects.
"What if we dropped multiple columns at once instead of one? How would the time complexity change?"
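As a starting point for exploring this question, the sketch below drops several columns in a single call by passing a list of labels. The extra column names 'C' and 'D' are assumptions added for illustration. Timing it for different row and column counts is left as the exercise.

```python
import pandas as pd

n = 1000
# Four integer columns instead of two, so there is more than one to drop.
df = pd.DataFrame({name: range(n) for name in ['A', 'B', 'C', 'D']})

# Drop several columns in one call by passing a list of labels.
df.drop(['B', 'C', 'D'], axis=1, inplace=True)
print(list(df.columns))
print(df.shape)
```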