Dropping columns and rows in pandas - Time & Space Complexity
When we drop columns or rows in pandas, we want to know how the time it takes changes as the data grows.
We ask: How does the work increase when the table gets bigger?
Analyze the time complexity of the following code snippet.
import pandas as pd
data = pd.DataFrame({
    'A': range(1000),
    'B': range(1000, 2000),
    'C': range(2000, 3000)
})
# Drop one column and one row
result = data.drop(columns=['B']).drop(index=[0])
This code creates a table with 3 columns and 1000 rows, then removes one column and one row.
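To make the description above concrete, here is a minimal sketch that rebuilds the same table and inspects the result. It uses only the names from the snippet above; the printed checks are added for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    'A': range(1000),
    'B': range(1000, 2000),
    'C': range(2000, 3000),
})

# Each drop call returns a new DataFrame; the original is left untouched.
result = data.drop(columns=['B']).drop(index=[0])

print(data.shape)            # (1000, 3)
print(result.shape)          # (999, 2)
print(list(result.columns))  # ['A', 'C']
print(result.index[0])       # 1, because the row labelled 0 was removed
```

Because the original table keeps its shape, the result has to live in a separate object, which is where the copying cost discussed below comes from.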
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Copying every value that is kept into a new table.
- How many times: Once per drop call, with each call touching all remaining rows and columns.
As the table grows, dropping a column or row still means copying everything except the dropped parts.
| Input Size (n rows) | Approx. Values Copied per Drop |
|---|---|
| 10 | About 20 (2 columns x 10 rows) |
| 100 | About 200 (2 columns x 100 rows) |
| 1000 | About 2000 (2 columns x 1000 rows) |
Pattern observation: The work grows roughly in direct proportion to the number of rows times the number of columns kept.
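The figures in the table can be reproduced with a small counting model. This is only a sketch of the reasoning, not a measurement: the function name cells_copied and its parameters are made up for illustration, and it assumes that every kept value is copied exactly once.

```python
def cells_copied(n_rows: int, n_cols: int, dropped_cols: int = 1) -> int:
    """Rough model: one copy per value that survives a column drop."""
    return n_rows * (n_cols - dropped_cols)


# Three columns, one dropped, for the three input sizes in the table.
for n in (10, 100, 1000):
    print(n, cells_copied(n, n_cols=3))
# 10 20
# 100 200
# 1000 2000
```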
Time Complexity: O(n x m)
This means the time to drop columns or rows grows roughly with the size of the data kept, where n is the number of rows and m is the number of columns that remain.
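One way to check this claim on your own machine is to time the drops for growing tables. The sketch below is an assumption-laden experiment rather than a benchmark: the helper name time_drops, the table sizes, and the repeat count are all chosen for illustration. Actual numbers depend on hardware, the pandas version, and settings such as Copy-on-Write, under which a column drop may defer or avoid the copy, so no expected output is shown.

```python
import timeit

import numpy as np
import pandas as pd


def time_drops(n_rows: int, repeats: int = 10) -> tuple[float, float]:
    """Return average seconds per (column drop, row drop) on an n_rows x 3 table."""
    df = pd.DataFrame({
        'A': np.arange(n_rows),
        'B': np.arange(n_rows),
        'C': np.arange(n_rows),
    })
    col_total = timeit.timeit(lambda: df.drop(columns=['B']), number=repeats)
    row_total = timeit.timeit(lambda: df.drop(index=[0]), number=repeats)
    return col_total / repeats, row_total / repeats


for n in (10_000, 100_000, 1_000_000):
    col_s, row_s = time_drops(n)
    print(f"{n:>9} rows: column drop {col_s * 1e6:9.1f} us, "
          f"row drop {row_s * 1e6:9.1f} us")
```

If the copying model holds, the reported times should grow roughly tenfold from one line to the next; small tables are dominated by fixed overhead, so the trend is clearer at the larger sizes.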
[X] Wrong: "Dropping a single column or row is always very fast and constant time."
[OK] Correct: By default, drop returns a new table, so even dropping one column or row typically requires copying the rest of the data, and the time grows with the size of what remains.
Understanding how data operations scale helps you explain your choices clearly and shows you know what happens behind the scenes.
What if we dropped multiple columns at once? How would the time complexity change?
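As a starting point for exploring this question, the sketch below compares dropping two columns in a single call with dropping them in two chained calls. The variable names one_call and chained are illustrative. Under the copying model used above, each call costs about n times the number of columns it keeps, so the chained version does extra work on an intermediate table while the overall growth stays proportional to the size of the data.

```python
import pandas as pd

data = pd.DataFrame({
    'A': range(1000),
    'B': range(1000, 2000),
    'C': range(2000, 3000),
})

# One call: only the final table is built.
one_call = data.drop(columns=['A', 'B'])

# Two chained calls: an intermediate table with columns B and C is built first.
chained = data.drop(columns=['A']).drop(columns=['B'])

print(one_call.equals(chained))  # True
print(one_call.shape)            # (1000, 1)
```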