Melt for Wide-to-Long Reshaping in Python Data Analysis - Time & Space Complexity
When we reshape data from wide to long format with melt, it is important to understand how the running time changes as the number of rows or columns grows.
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

# Wide table: one 'id' column plus three value columns to melt
df = pd.DataFrame({
    'id': [1, 2, 3],
    'A': [10, 20, 30],
    'B': [40, 50, 60],
    'C': [70, 80, 90]
})

melted = pd.melt(df, id_vars=['id'], value_vars=['A', 'B', 'C'],
                 var_name='variable', value_name='value')
```
This code takes a table with three value columns and reshapes it into a longer format with a single value column.
Identify the loops, recursion, or array traversals that repeat:
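To make this concrete, here is a quick check (rebuilding the same DataFrame as above) that the melted result contains one row for every (row, melted column) pair, i.e. 3 x 3 = 9 rows:

```python
import pandas as pd

# Same wide table as in the snippet above: 3 rows, 3 value columns
df = pd.DataFrame({
    'id': [1, 2, 3],
    'A': [10, 20, 30],
    'B': [40, 50, 60],
    'C': [70, 80, 90]
})

melted = pd.melt(df, id_vars=['id'], value_vars=['A', 'B', 'C'],
                 var_name='variable', value_name='value')

# Each of the 3 rows contributes one output row per melted column,
# so the long table has 3 * 3 = 9 rows and exactly 3 columns.
print(len(melted))            # 9
print(list(melted.columns))   # ['id', 'variable', 'value']
```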
- Primary operation: Iterating over each row and each selected column to create new rows.
- How many times: once for each of the n rows paired with each of the m melted columns, so n x m times in total.
As the number of rows or the number of melted columns increases, the total work grows as their product.
| Input Size (n rows x m columns) | Approx. Operations |
|---|---|
| 10 x 3 | 30 |
| 100 x 3 | 300 |
| 1000 x 3 | 3000 |
Pattern observation: Doubling rows doubles work; doubling columns doubles work; work grows proportionally to rows times columns.
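The pattern in the table above can be checked empirically. This sketch (using hypothetical column names 'A', 'B', 'C') counts output rows for growing n with m = 3 held fixed:

```python
import pandas as pd
import numpy as np

m = 3  # number of value columns to melt
for n in (10, 100, 1000):
    # Build an n-row wide table with m value columns
    wide = pd.DataFrame(np.arange(n * m).reshape(n, m),
                        columns=['A', 'B', 'C'])
    wide['id'] = range(n)
    long_df = pd.melt(wide, id_vars=['id'], value_vars=['A', 'B', 'C'])
    # Output rows = n * m, matching the table: 30, 300, 3000
    print(n, len(long_df))
```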
Time Complexity: O(n x m)
This means the time needed grows proportionally to the number of rows times the number of columns being melted.
[X] Wrong: "Melt runs in constant time regardless of data size because it just reshapes."
[OK] Correct: Melt must look at every value to rearrange it, so more data means more work and longer time.
Understanding how reshaping data scales helps you explain your choices clearly and shows you know how data size affects performance.
"What if we melt only a subset of columns instead of all? How would the time complexity change?"