Shift and Lag Operations in Python Data Analysis - Time & Space Complexity
We want to understand how the time it takes to perform shift and lag operations changes as the data size grows.
Specifically, we ask: how does the work increase when we shift or lag a column in a dataset?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

n = 10  # example size
data = pd.DataFrame({'values': range(n)})
data['lagged'] = data['values'].shift(1)
```
This code creates a new column whose values are shifted down by one row, introducing a one-row lag; the first row of the new column becomes NaN because it has no prior value.
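To make the behavior concrete, here is a small run (using n = 5 for readability) showing the lagged column, with the first row NaN and every later row holding the previous value:

```python
import pandas as pd

n = 5
data = pd.DataFrame({'values': range(n)})
data['lagged'] = data['values'].shift(1)
print(data)
# 'lagged' is [NaN, 0.0, 1.0, 2.0, 3.0]: the first row has no prior value,
# and each remaining row holds the 'values' entry from one row above.
```

Note that the lagged column is float64 even though the source is integers, because NaN is a floating-point value.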
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: The shift method moves each value down by one position in the column.
- How many times: It processes each of the n rows once to create the lagged column.
As the number of rows n increases, the operation must move each value once.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 moves |
| 100 | 100 moves |
| 1000 | 1000 moves |
Pattern observation: The work grows directly with the number of rows, so doubling rows doubles the work.
Time Complexity: O(n)
This means the time to shift or lag grows linearly with the number of rows in the data.
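A rough way to see the linear trend empirically is to time the shift at several input sizes. This is only a sketch: exact timings depend on hardware and pandas internals, and small inputs are dominated by constant overhead, but for large n the best-of-several time should grow roughly in proportion to n.

```python
import time
import pandas as pd

def time_shift(n, repeats=5):
    """Return the best-of-`repeats` wall time (seconds) for shifting an n-row column."""
    df = pd.DataFrame({'values': range(n)})
    best = float('inf')
    for _ in range(repeats):
        start = time.perf_counter()
        df['values'].shift(1)
        best = min(best, time.perf_counter() - start)
    return best

for n in (10_000, 100_000, 1_000_000):
    print(f"n={n:>9}: {time_shift(n):.6f} s")
```

Taking the best of several repeats reduces noise from the operating system scheduler; even so, expect the ratio between sizes to be approximate rather than exactly 10x.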
[X] Wrong: "Shift or lag operations are constant time because they just move data by one position."
[OK] Correct: Even though the shift is by one position, the operation must touch every row to create the new column, so time grows with data size.
Understanding how simple data transformations scale helps you explain your code's efficiency clearly and confidently in real projects or interviews.
"What if we shifted by k positions instead of 1? How would the time complexity change?"