Expanding window operations in Pandas - Time & Space Complexity
We want to understand how the time needed to perform expanding window operations changes as the data grows.
How does the work increase when we have more rows to process?
Analyze the time complexity of the following code snippet.
data = list(range(1, n+1))
result = [sum(data[:i+1]) for i in range(n)]
This code calculates the cumulative sum for each expanding window from the start to each row.
Identify the loops, recursion, array traversals that repeat.
- Primary operation: For each row, sum all values from the first row up to that row.
- How many times: This summing happens once for each row, increasing the amount summed each time.
As the number of rows grows, the amount of work grows more than just a little.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 55 sums |
| 100 | About 5,050 sums |
| 1000 | About 500,500 sums |
Pattern observation: The total work grows roughly like the square of the input size.
Time Complexity: O(n²)
This means if you double the data size, the work roughly quadruples.
[X] Wrong: "Expanding sums take the same time as just adding one number per row."
[OK] Correct: Each expanding sum adds more numbers as the window grows, so the work adds up much faster than just one addition per row.
Understanding how expanding operations scale helps you explain performance in data tasks and shows you can think about efficiency clearly.
What if we changed from expanding sums to cumulative sums using a running total? How would the time complexity change?