stack() and unstack() in Pandas - Time & Space Complexity
We want to understand how the time needed to reshape data with stack() and unstack() grows as the data size increases.
How does the work change when the data has more rows or columns?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

# Create a DataFrame with 1000 rows and 5 columns
df = pd.DataFrame({f'col{i}': range(1000) for i in range(5)})

# Use stack() to reshape columns into rows (produces a Series with a MultiIndex)
stacked = df.stack()

# Use unstack() to reshape back to the original wide format
unstacked = stacked.unstack()
```
This code reshapes a DataFrame by stacking columns into a longer format and then unstacks it back to the original shape.
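To make the reshaping concrete, a quick sketch (using the same construction as the snippet above) checks the shapes before and after each step:

```python
import pandas as pd

# Same construction as above: 1000 rows x 5 columns
df = pd.DataFrame({f'col{i}': range(1000) for i in range(5)})

stacked = df.stack()           # Series of 1000 * 5 = 5000 values with a 2-level MultiIndex
unstacked = stacked.unstack()  # back to the 1000 x 5 DataFrame

print(df.shape)        # (1000, 5)
print(stacked.shape)   # (5000,)
print(unstacked.shape) # (1000, 5)
```

Every one of the 5000 values passes through both reshaping steps, which is exactly the traversal the analysis below counts.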
Identify the loops, recursion, and array traversals that repeat.
- Primary operation: traversing every element in the DataFrame to rearrange it.
- How many times: each element is visited once during stack() and once during unstack().
As the number of rows and columns grows, the total number of elements grows as their product (rows x columns).
| Input Size (rows x columns) | Approx. Operations |
|---|---|
| 10 x 5 = 50 | About 50 operations per stack or unstack |
| 100 x 5 = 500 | About 500 operations per stack or unstack |
| 1000 x 5 = 5000 | About 5000 operations per stack or unstack |
Pattern observation: The work grows roughly in direct proportion to the total number of elements.
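The proportionality in the table can be verified directly: the length of the stacked output equals rows times columns for each input size. A small sketch, reusing the 5-column construction from the snippet above:

```python
import pandas as pd

sizes = []
for rows in (10, 100, 1000):
    df = pd.DataFrame({f'col{i}': range(rows) for i in range(5)})
    # stack() visits every element once, so the output length equals rows * columns
    sizes.append(df.stack().size)

print(sizes)  # [50, 500, 5000]
```

The element counts match the "Approx. Operations" column, confirming that the work tracks the total number of values.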
Time Complexity: O(n x m)
This means the time needed grows linearly with the total number of elements, where n is the number of rows and m is the number of columns.
[X] Wrong: "Stacking or unstacking only touches a few elements, so it's very fast regardless of size."
[OK] Correct: Actually, these operations must visit every element to rearrange them, so the time grows with the total data size.
Knowing how reshaping data scales helps you explain your choices clearly and shows you understand the cost of data transformations.
"What if we stacked a DataFrame with a MultiIndex on rows? How would the time complexity change?"
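As a starting point for that question, here is a small sketch (the group/id labels are made up for illustration). Stacking a frame whose rows already carry a MultiIndex simply appends the column labels as one more index level; every element is still visited exactly once, so the complexity remains O(n x m):

```python
import pandas as pd

# A 6-row DataFrame with a 2-level row MultiIndex (labels chosen for illustration)
index = pd.MultiIndex.from_product([['a', 'b'], [1, 2, 3]], names=['grp', 'id'])
df = pd.DataFrame({'x': range(6), 'y': range(6, 12)}, index=index)

stacked = df.stack()  # the column labels become a third index level

print(stacked.index.nlevels)  # 3
print(stacked.size)           # 6 rows * 2 columns = 12 values, each visited once
```

The extra index level adds bookkeeping per element, but the element count, and thus the asymptotic cost, is unchanged.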