
Stack and unstack in Data Analysis Python - Time & Space Complexity

Time Complexity: Stack and unstack
O(n)
Understanding Time Complexity

We want to understand how the time needed to stack and unstack data changes as the data size grows.

How does the number of operations grow when we combine or separate data tables?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

import pandas as pd

df1 = pd.DataFrame({'A': range(1000), 'B': range(1000, 2000)})
df2 = pd.DataFrame({'A': range(1000, 2000), 'B': range(2000, 3000)})

stacked = pd.concat([df1, df2], keys=['df1', 'df2'])  # stack rows; keys label each source frame
unstacked = stacked.unstack(level=0)   # pivot the 'df1'/'df2' key level of the row index into columns

This code stacks two data tables by rows and then unstacks the combined table.
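As a quick sanity check on that description, we can count the rows and columns the two operations produce (a sketch assuming pandas is installed; the shapes follow directly from the snippet above):

```python
import pandas as pd

df1 = pd.DataFrame({'A': range(1000), 'B': range(1000, 2000)})
df2 = pd.DataFrame({'A': range(1000, 2000), 'B': range(2000, 3000)})

stacked = pd.concat([df1, df2], keys=['df1', 'df2'])
unstacked = stacked.unstack(level=0)

# Every row from both frames appears once in the stacked result ...
print(stacked.shape)    # (2000, 2)
# ... and unstacking pivots the two keys into columns:
# (A, df1), (A, df2), (B, df1), (B, df2)
print(unstacked.shape)  # (1000, 4)
```

Both shapes account for every input element exactly once, which is the core of the linear-time argument.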

Identify Repeating Operations
  • Primary operation: Combining rows with concat and reshaping with unstack.
  • How many times: Each row from both tables is processed once during stacking, and each element is processed once during unstacking.
How Execution Grows With Input

When the number of rows doubles, the stacking operation processes twice as many rows, and unstacking processes twice as many elements.

Input Size (n rows per DataFrame)    Approx. Operations
10                                   ~20 rows stacked, ~20 elements unstacked
100                                  ~200 rows stacked, ~200 elements unstacked
1000                                 ~2000 rows stacked, ~2000 elements unstacked

Pattern observation: Operations grow roughly in direct proportion to the total number of rows combined.
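You can reproduce this pattern by counting elements directly (a sketch; the helper name stack_unstack_elements is ours, and it counts total elements, i.e. rows times columns, so the numbers are twice the row counts in the table):

```python
import pandas as pd

def stack_unstack_elements(n):
    # Build two n-row frames, stack them, then unstack, and report
    # how many elements each result holds.
    a = pd.DataFrame({'A': range(n), 'B': range(n, 2 * n)})
    b = pd.DataFrame({'A': range(n, 2 * n), 'B': range(2 * n, 3 * n)})
    stacked = pd.concat([a, b], keys=['a', 'b'])
    unstacked = stacked.unstack(level=0)
    return stacked.size, unstacked.size

for n in (10, 100, 1000):
    # Doubling n doubles both counts: linear growth.
    print(n, stack_unstack_elements(n))
```

Each tenfold increase in n yields a tenfold increase in elements processed, matching the table's proportional growth.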

Final Time Complexity

Time Complexity: O(n)

This means the time to stack and unstack grows linearly with the number of rows involved.

Common Mistake

[X] Wrong: "Stacking two tables takes constant time no matter how big they are."

[OK] Correct: Each row must be copied or referenced, so more rows mean more work; runtime grows linearly with the data size.
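One piece of evidence that the work cannot be constant: the combined frame physically holds every input row, so even its memory footprint scales with the input (a sketch assuming pandas is installed):

```python
import pandas as pd

a = pd.DataFrame({'A': range(1000), 'B': range(1000)})
b = pd.DataFrame({'A': range(1000), 'B': range(1000)})

# Stack a 20-row sample versus the full 2000 rows.
small = pd.concat([a.head(10), b.head(10)])
large = pd.concat([a, b])

# The larger input produces a strictly larger result in memory,
# so at minimum the copy itself is linear work.
print(small.memory_usage(deep=True).sum() < large.memory_usage(deep=True).sum())
```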

Interview Connect

Understanding how stacking and unstacking scale helps you handle data efficiently and shows you can reason about data operations in real projects.

Self-Check

"What if we stacked three or more DataFrames instead of two? How would the time complexity change?"