stack() and unstack() in Pandas - Time & Space Complexity

Time Complexity: stack() and unstack()
O(n x m)

Understanding Time Complexity

We want to understand how the time needed to reshape data with stack() and unstack() grows as the data size increases.

How does the work change when the data has more rows or columns?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

import pandas as pd

# Create a DataFrame with 1000 rows and 5 columns
df = pd.DataFrame({f'col{i}': range(1000) for i in range(5)})

# Use stack to reshape columns into rows
stacked = df.stack()

# Use unstack to reshape back
unstacked = stacked.unstack()

This code reshapes a DataFrame by stacking its columns into a longer format (one value per row/column pair) and then unstacking it back to the original shape.
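To make the reshaping concrete, here is a minimal sketch that repeats the snippet above and prints the shape at each step. It assumes a recent pandas version and a DataFrame without missing values, as in the scenario.

```python
import pandas as pd

# Same DataFrame as in the scenario: 1000 rows and 5 columns
df = pd.DataFrame({f'col{i}': range(1000) for i in range(5)})

stacked = df.stack()            # Series indexed by (row label, column name)
unstacked = stacked.unstack()   # back to a 1000 x 5 DataFrame

print(df.shape)                 # (1000, 5)
print(stacked.shape)            # (5000,)
print(stacked.index.nlevels)    # 2
print(unstacked.shape)          # (1000, 5)

# Every value survives the round trip
print((unstacked.values == df.values).all())
```

The stacked result holds one entry per original cell, which is why the amount of work depends on rows x columns.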

Identify Repeating Operations

Identify the loops, recursion, or array traversals that repeat.

  • Primary operation: Traversing all elements in the DataFrame to rearrange them.
  • How many times: Each element is visited once during stack() and once during unstack().

How Execution Grows With Input

As the number of rows and columns grows, the total number of elements grows as their product (rows x columns).

Input Size (rows x columns) | Approx. Operations
10 x 5 = 50                 | About 50 operations per stack or unstack
100 x 5 = 500               | About 500 operations per stack or unstack
1000 x 5 = 5000             | About 5000 operations per stack or unstack

Pattern observation: The work grows roughly in direct proportion to the total number of elements.
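One way to check this pattern yourself is to time both operations at the sizes from the table. The sketch below is only an illustration: the helper name time_stack_roundtrip is made up for this example, and at such small sizes fixed overhead can dominate, so the linear trend tends to show up more clearly with much larger row counts.

```python
import time
import pandas as pd

def time_stack_roundtrip(n_rows, n_cols=5):
    """Return (element count, seconds for stack, seconds for unstack)."""
    df = pd.DataFrame({f'col{i}': range(n_rows) for i in range(n_cols)})

    start = time.perf_counter()
    stacked = df.stack()
    stack_secs = time.perf_counter() - start

    start = time.perf_counter()
    stacked.unstack()
    unstack_secs = time.perf_counter() - start

    return len(stacked), stack_secs, unstack_secs

# Row counts taken from the table; timings vary by machine and pandas version
results = [time_stack_roundtrip(n) for n in (10, 100, 1000)]
for n_elements, stack_secs, unstack_secs in results:
    print(f'{n_elements:>6} elements: '
          f'stack {stack_secs:.6f}s, unstack {unstack_secs:.6f}s')
```

The element counts always match rows x columns; the measured times are approximate and should be read as a trend rather than exact operation counts.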

Final Time Complexity

Time Complexity: O(n x m)

This means the time needed grows linearly with the total number of elements, where n is the number of rows and m is the number of columns.

Common Mistake

[X] Wrong: "Stacking or unstacking only touches a few elements, so it's very fast regardless of size."

[OK] Correct: These operations must visit every element to rearrange it, so the time grows with the total data size.

Interview Connect

Knowing how reshaping data scales helps you explain your choices clearly and shows you understand the cost of data transformations.

Self-Check

"What if we stacked a DataFrame with a MultiIndex on rows? How would the time complexity change?"