Stacking and Unstacking in Python Data Analysis - Time & Space Complexity
We want to understand how the time needed to stack and unstack data changes as the data size grows.
How does the number of operations grow when we combine or separate data tables?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

df1 = pd.DataFrame({'A': range(1000), 'B': range(1000, 2000)})
df2 = pd.DataFrame({'A': range(1000, 2000), 'B': range(2000, 3000)})

stacked = pd.concat([df1, df2], keys=['df1', 'df2'])  # stacking rows
unstacked = stacked.unstack(level=0)                  # unstacking the DataFrame
```
This code stacks two data tables by rows and then unstacks the combined table.
- Primary operation: combining rows with `concat` and reshaping with `unstack`.
- How many times: each row from both tables is processed once during stacking, and each element is processed once during unstacking.
When the number of rows doubles, the stacking operation processes twice as many rows, and unstacking processes twice as many elements.
| Input Size (n rows per DataFrame) | Approx. Operations |
|---|---|
| 10 | About 20 rows stacked, 40 elements unstacked (20 rows x 2 columns) |
| 100 | About 200 rows stacked, 400 elements unstacked |
| 1000 | About 2000 rows stacked, 4000 elements unstacked |
Pattern observation: Operations grow roughly in direct proportion to the total number of rows combined.
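This proportional growth can be checked directly by comparing output shapes at two input sizes. A minimal sketch (the helper name `stack_unstack` is our own, not part of pandas):

```python
import pandas as pd

def stack_unstack(n):
    # Build two n-row DataFrames mirroring the snippet above, then round-trip them.
    df1 = pd.DataFrame({'A': range(n), 'B': range(n, 2 * n)})
    df2 = pd.DataFrame({'A': range(n, 2 * n), 'B': range(2 * n, 3 * n)})
    stacked = pd.concat([df1, df2], keys=['df1', 'df2'])
    unstacked = stacked.unstack(level=0)
    return stacked, unstacked

small_stacked, small_unstacked = stack_unstack(10)
big_stacked, big_unstacked = stack_unstack(20)

print(small_stacked.shape)    # (20, 2): every input row appears exactly once
print(big_stacked.shape)      # (40, 2): doubling n doubles the stacked rows
print(small_unstacked.shape)  # (10, 4): the 'df1'/'df2' keys become extra columns
```

Doubling `n` doubles both the stacked row count and the unstacked element count, matching the table above.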
Time Complexity: O(n)
This means the time to stack and unstack grows linearly with the number of rows involved. Space usage is also O(n): `concat` returns a new DataFrame holding a copy of every input row rather than modifying the originals.
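A rough timing sketch can make the linear trend visible, though wall-clock numbers vary by machine, and small inputs are dominated by fixed overhead rather than per-row work:

```python
import time
import pandas as pd

def stack_unstack_time(n):
    # Time one stack + unstack round trip for two n-row DataFrames.
    df1 = pd.DataFrame({'A': range(n), 'B': range(n, 2 * n)})
    df2 = pd.DataFrame({'A': range(n, 2 * n), 'B': range(2 * n, 3 * n)})
    start = time.perf_counter()
    stacked = pd.concat([df1, df2], keys=['df1', 'df2'])
    stacked.unstack(level=0)
    return time.perf_counter() - start

# At larger sizes, each 10x increase in n should cost roughly 10x the time.
for n in (10_000, 100_000, 1_000_000):
    print(f"n={n:>9}: {stack_unstack_time(n):.4f}s")
```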
[X] Wrong: "Stacking two tables takes constant time no matter how big they are."
[OK] Correct: Each row must be copied or referenced, so more rows mean more work, making time grow with data size.
Understanding how stacking and unstacking scale helps you handle data efficiently and shows you can reason about data operations in real projects.
"What if we stacked three or more DataFrames instead of two? How would the time complexity change?"
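One way to explore this question: `concat` accepts any number of DataFrames, and the work still tracks the combined row count, so stacking k tables of n rows each is roughly O(k * n), still linear in the total data size. A sketch with three frames (the key names are illustrative):

```python
import pandas as pd

# Stack k DataFrames of n rows each with a single concat call.
# The total row count is k * n, so the work still grows linearly
# with the combined size of the inputs.
k, n = 3, 100
frames = [pd.DataFrame({'A': range(i * n, (i + 1) * n)}) for i in range(k)]

stacked = pd.concat(frames, keys=[f'df{i}' for i in range(k)])
print(stacked.shape)    # (300, 1): k * n rows in total

unstacked = stacked.unstack(level=0)
print(unstacked.shape)  # (100, 3): one column per source frame
```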