Performance Analysis: Why Reshaping Data for Analysis Scales with Input Size in Python
When we reshape data, we change its structure to make analysis easier.
We want to know how the time to reshape grows as the data gets bigger.
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

data = pd.DataFrame({
    'id': [1, 2, 1, 2],
    'variable': ['A', 'A', 'B', 'B'],
    'value': [10, 20, 30, 40]
})
reshaped = data.pivot(index='id', columns='variable', values='value')
```
This code reshapes a table from long format to wide format using pivot.
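To make the long-to-wide transformation concrete, here is a minimal sketch of what that pivot produces: each unique `variable` value becomes a column, and each `id` becomes a row.

```python
import pandas as pd

# Same long-format table as in the snippet above
data = pd.DataFrame({
    'id': [1, 2, 1, 2],
    'variable': ['A', 'A', 'B', 'B'],
    'value': [10, 20, 30, 40]
})
reshaped = data.pivot(index='id', columns='variable', values='value')

# Wide format: row id=1 has A=10, B=30; row id=2 has A=20, B=40
print(reshaped)
```

Every value from the original `value` column appears exactly once in the wide table, which is why each row only needs to be visited once.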
Identify the operations that repeat: loops, recursion, or array traversals.
- Primary operation: Scanning each row to place values in the new shape.
- How many times: Once for each row in the original data.
As the number of rows grows, the time to reshape grows roughly the same way.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 operations |
| 100 | About 100 operations |
| 1000 | About 1000 operations |
Pattern observation: The work grows directly with the number of rows.
Time Complexity: O(n)
This means the time to reshape grows in a straight line as data size grows.
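An informal way to see this linear growth yourself is to time `pivot` on frames of increasing size. The helper below is just an illustration; the exact timings are machine-dependent, but as n grows tenfold the elapsed time should grow by roughly the same factor.

```python
import time
import pandas as pd

def time_pivot(n):
    """Build a long-format frame with n rows and time its pivot."""
    half = n // 2
    data = pd.DataFrame({
        'id': list(range(half)) * 2,                # each id appears twice
        'variable': ['A'] * half + ['B'] * half,    # one A and one B per id
        'value': range(2 * half),
    })
    start = time.perf_counter()
    data.pivot(index='id', columns='variable', values='value')
    return time.perf_counter() - start

for n in (10_000, 100_000, 1_000_000):
    print(f"n={n:>9,}  {time_pivot(n):.4f} s")
```

Each (`id`, `variable`) pair is unique by construction, which `pivot` requires.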
[X] Wrong: "Reshaping data takes the same time no matter how big the data is."
[OK] Correct: The process must look at each row, so more rows mean more work and more time.
Understanding how reshaping scales helps you explain data preparation steps clearly and confidently.
"What if we used a groupby aggregation instead of pivot? How would the time complexity change?"