Rolling standard deviation in Pandas - Time & Space Complexity
We want to understand how the time needed to calculate a rolling standard deviation changes as the data size grows.
Specifically, how does the work increase when we apply a rolling window calculation on a large dataset?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

n = 1000  # example size
w = 10    # example window size

# Create a sample DataFrame with 1 column and n rows
df = pd.DataFrame({'values': range(n)})

# Calculate rolling standard deviation with window size w
rolling_std = df['values'].rolling(window=w).std()
```
This code calculates the rolling standard deviation over a window of size w for a column of n values.
- Primary operation: For each of the n rows, the code computes the standard deviation over the last w values.
- How many times: This calculation repeats n - w + 1 times, once per row after the first w - 1 rows.
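This work pattern can be sketched as a naive re-implementation in plain Python. This is an illustrative model of the cost, not what pandas actually runs internally; the function name `naive_rolling_std` is invented here:

```python
import statistics

def naive_rolling_std(values, w):
    """Naive rolling sample std: recompute each window from scratch."""
    out = [None] * (w - 1)  # first w - 1 positions have no full window
    for i in range(w - 1, len(values)):       # runs n - w + 1 times
        window = values[i - w + 1 : i + 1]    # slice of w values
        out.append(statistics.stdev(window))  # O(w) work per window
    return out

result = naive_rolling_std(list(range(100)), 10)
```

The outer loop runs n - w + 1 times and each iteration touches w values, which is exactly the O(n x w) pattern analyzed below.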
As the number of rows n grows, the number of calculations grows roughly in proportion to n. Each calculation looks at w values.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 x w operations |
| 100 | About 100 x w operations |
| 1000 | About 1000 x w operations |
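The table's counts can be made concrete by multiplying windows by window size: under this naive model, (n - w + 1) full windows each read w values. A quick sketch (pure counting, no pandas involved; `touched_values` is a name invented for illustration):

```python
def touched_values(n, w):
    # Each of the n - w + 1 full windows reads w values.
    return (n - w + 1) * w

for n in (10, 100, 1000):
    print(n, touched_values(n, 10))
```

Each tenfold increase in n increases the count by roughly a factor of ten, matching the table.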
Pattern observation: The total work grows linearly with n, multiplied by the window size w.
Time Complexity: O(n x w)
This means the time needed grows proportionally to the number of rows times the window size.
[X] Wrong: "The rolling standard deviation calculation takes constant time regardless of window size because it just slides over the data."
[OK] Correct: Each step computes the standard deviation over w values, so larger windows mean more work per step, increasing total time.
Understanding how rolling calculations scale helps you explain performance trade-offs clearly and shows you can reason about data processing costs in real projects.
What if we used a rolling mean instead of standard deviation? How would the time complexity change?
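One possible line of reasoning: a rolling mean can maintain a running sum, adding the value entering the window and subtracting the one leaving it, so each step costs O(1) and the whole pass is O(n) regardless of w. A minimal sketch of that idea (an illustrative implementation, not a claim about how pandas computes its rolling mean):

```python
def rolling_mean(values, w):
    """Rolling mean via a running sum: O(n) total, independent of w."""
    out = [None] * (w - 1)       # first w - 1 positions have no full window
    s = sum(values[:w])          # initial window sum, O(w) once
    out.append(s / w)
    for i in range(w, len(values)):
        s += values[i] - values[i - w]  # O(1) update per step
        out.append(s / w)
    return out

means = rolling_mean(list(range(20)), 5)
```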