Wide to long format conversion in Pandas - Time & Space Complexity
Converting data from wide to long format rearranges it to make analysis easier. We want to know how the time required grows as the data size grows.
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'A_2020': [10, 20, 30],
    'A_2021': [15, 25, 35],
    'B_2020': [5, 10, 15],
    'B_2021': [7, 14, 21]
})

long_df = pd.wide_to_long(df, stubnames=['A', 'B'], i='id', j='year', sep='_')
```
This code changes a table with many columns for years into a longer table with fewer columns but more rows.
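To see the reshape concretely, we can run the snippet and inspect the result: the two year columns per stub become rows, so 3 ids x 2 years yields 6 rows, with only the stub columns `A` and `B` remaining as values.

```python
import pandas as pd

# Rebuild the example DataFrame from the snippet above.
df = pd.DataFrame({
    'id': [1, 2, 3],
    'A_2020': [10, 20, 30],
    'A_2021': [15, 25, 35],
    'B_2020': [5, 10, 15],
    'B_2021': [7, 14, 21]
})

long_df = pd.wide_to_long(df, stubnames=['A', 'B'], i='id', j='year', sep='_')

# 3 ids x 2 years -> 6 rows; the value columns shrink to just A and B.
print(long_df.shape)              # (6, 2)
print(sorted(long_df.columns))    # ['A', 'B']
```

The new index is a MultiIndex on (`id`, `year`), which is where the "more rows" come from.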
Identify the repeated work: the loops, recursion, or array traversals that run once per element.
- Primary operation: The function processes each cell in the columns being reshaped.
- How many times: It touches each value in the selected columns once to rearrange them.
As the number of rows or columns grows, the work grows roughly by the total number of values to move.
| Input Size (n rows x m columns) | Approx. Operations |
|---|---|
| 10 x 4 | About 40 |
| 100 x 4 | About 400 |
| 1000 x 4 | About 4000 |
Pattern observation: The operations grow roughly in direct proportion to the number of data points.
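The counts in the table can be reproduced with a tiny sketch. `reshape_work` below is a hypothetical helper, not a pandas function; it just models "one operation per value moved":

```python
# Hypothetical cost model: reshaping touches each value once,
# so the work is rows times reshaped columns.
def reshape_work(n_rows, n_value_cols):
    return n_rows * n_value_cols

for n in (10, 100, 1000):
    print(n, reshape_work(n, 4))   # 40, 400, 4000 -- matches the table
```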
Time Complexity: O(n x m)
This means the time needed grows proportionally with the number of rows times the number of columns being reshaped.
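As a rough empirical check, the number of values the reshape produces does scale with n x m. The sketch below assumes a single stub `A`; `make_wide` is a hypothetical generator, not part of pandas:

```python
import pandas as pd
import numpy as np

def make_wide(n_rows, n_years):
    # Hypothetical generator: one stub 'A' spread across n_years columns.
    data = {'id': range(n_rows)}
    for y in range(2000, 2000 + n_years):
        data[f'A_{y}'] = np.arange(n_rows)
    return pd.DataFrame(data)

for n, y in [(10, 2), (100, 4)]:
    wide = make_wide(n, y)
    long_df = pd.wide_to_long(wide, stubnames='A', i='id', j='year', sep='_')
    # Every value in the year columns ends up in the long frame exactly once,
    # so the output size equals rows x year-columns: the O(n x m) cost.
    print(long_df.size, n * y)
```

Actual wall-clock time also includes constant overheads (index construction, column parsing), but the dominant term grows with the cell count.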
[X] Wrong: "The time depends only on the number of rows, not columns."
[OK] Correct: Because each column's data must be processed, more columns mean more work, so both rows and columns affect time.
Understanding how data reshaping scales helps you handle bigger datasets confidently and shows you know how tools work behind the scenes.
"What if we only reshape a subset of columns instead of all? How would the time complexity change?"
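One way to explore that question: reshape only stub `A` from the original example. Only the `A_*` columns get stacked, so the stacking cost drops to O(n x k) for the k columns selected (the untouched `B_*` columns are carried along intact, which adds its own copying cost):

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'A_2020': [10, 20, 30],
    'A_2021': [15, 25, 35],
    'B_2020': [5, 10, 15],
    'B_2021': [7, 14, 21]
})

# Reshape only the 'A' columns; B_2020 and B_2021 are left as-is.
long_A = pd.wide_to_long(df, stubnames=['A'], i='id', j='year', sep='_')
print(long_A.shape)   # (6, 3): stacked A, plus the untouched B_2020 and B_2021
```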