Wide to long format conversion in Pandas - Time & Space Complexity
Converting data from wide to long format rearranges it to make analysis easier. We want to know how the time required grows as the data size grows.
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'A_2020': [10, 20, 30],
    'A_2021': [15, 25, 35],
    'B_2020': [5, 10, 15],
    'B_2021': [7, 14, 21]
})

long_df = pd.wide_to_long(df, stubnames=['A', 'B'], i='id', j='year', sep='_')
```
This code changes a table with many columns for years into a longer table with fewer columns but more rows.
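To see the reshape concretely, we can run the snippet and inspect the result: the two year columns per stub become rows, so 3 ids x 2 years yields 6 rows, with only the stub columns `A` and `B` remaining as values.

```python
import pandas as pd

# Rebuild the example DataFrame from the snippet above.
df = pd.DataFrame({
    'id': [1, 2, 3],
    'A_2020': [10, 20, 30],
    'A_2021': [15, 25, 35],
    'B_2020': [5, 10, 15],
    'B_2021': [7, 14, 21]
})

long_df = pd.wide_to_long(df, stubnames=['A', 'B'], i='id', j='year', sep='_')

# 3 ids x 2 years -> 6 rows; the value columns shrink to just A and B.
print(long_df.shape)              # (6, 2)
print(sorted(long_df.columns))    # ['A', 'B']
```

The new index is a MultiIndex on (`id`, `year`), which is where the "more rows" come from.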
Identify the repeated work: the loops, recursion, or array traversals that run once per element.
- Primary operation: The function processes each cell in the columns being reshaped.
- How many times: It touches each value in the selected columns once to rearrange them.
As the number of rows or columns grows, the work grows roughly by the total number of values to move.
| Input Size (n rows x m columns) | Approx. Operations |
|---|---|
| 10 x 4 | About 40 |
| 100 x 4 | About 400 |
| 1000 x 4 | About 4000 |
Pattern observation: The operations grow roughly in direct proportion to the number of data points.
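The counts in the table can be reproduced with a tiny sketch. `reshape_work` below is a hypothetical helper, not a pandas function; it just models "one operation per value moved":

```python
# Hypothetical cost model: reshaping touches each value once,
# so the work is rows times reshaped columns.
def reshape_work(n_rows, n_value_cols):
    return n_rows * n_value_cols

for n in (10, 100, 1000):
    print(n, reshape_work(n, 4))   # 40, 400, 4000 -- matches the table
```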
Time Complexity: O(n x m)
This means the time needed grows proportionally with the number of rows times the number of columns being reshaped.
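As a rough empirical check, the number of values the reshape produces does scale with n x m. The sketch below assumes a single stub `A`; `make_wide` is a hypothetical generator, not part of pandas:

```python
import pandas as pd
import numpy as np

def make_wide(n_rows, n_years):
    # Hypothetical generator: one stub 'A' spread across n_years columns.
    data = {'id': range(n_rows)}
    for y in range(2000, 2000 + n_years):
        data[f'A_{y}'] = np.arange(n_rows)
    return pd.DataFrame(data)

for n, y in [(10, 2), (100, 4)]:
    wide = make_wide(n, y)
    long_df = pd.wide_to_long(wide, stubnames='A', i='id', j='year', sep='_')
    # Every value in the year columns ends up in the long frame exactly once,
    # so the output size equals rows x year-columns: the O(n x m) cost.
    print(long_df.size, n * y)
```

Actual wall-clock time also includes constant overheads (index construction, column parsing), but the dominant term grows with the cell count.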
[X] Wrong: "The time depends only on the number of rows, not columns."
[OK] Correct: Because each column's data must be processed, more columns mean more work, so both rows and columns affect time.
Understanding how data reshaping scales helps you handle bigger datasets confidently and shows you know how tools work behind the scenes.
"What if we only reshape a subset of columns instead of all? How would the time complexity change?"
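One way to explore that question: reshape only stub `A` from the original example. Only the `A_*` columns get stacked, so the stacking cost drops to O(n x k) for the k columns selected (the untouched `B_*` columns are carried along intact, which adds its own copying cost):

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'A_2020': [10, 20, 30],
    'A_2021': [15, 25, 35],
    'B_2020': [5, 10, 15],
    'B_2021': [7, 14, 21]
})

# Reshape only the 'A' columns; B_2020 and B_2021 are left as-is.
long_A = pd.wide_to_long(df, stubnames=['A'], i='id', j='year', sep='_')
print(long_A.shape)   # (6, 3): stacked A, plus the untouched B_2020 and B_2021
```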