ewm() for exponential moving average in Pandas - Time & Space Complexity
We want to understand how the time needed to calculate an exponential moving average changes as the data grows.
How does the work increase when we have more data points?
Analyze the time complexity of the following code snippet.
import pandas as pd
# Create a DataFrame with n rows
n = 1000
data = pd.DataFrame({'value': range(n)})
# Calculate exponential moving average with span=10
ema = data['value'].ewm(span=10).mean()
This code creates a list of numbers and calculates the exponential moving average over them.
Identify the loops, recursion, array traversals that repeat.
- Primary operation: The calculation of the exponential moving average updates each value once in order.
- How many times: It processes each of the n data points exactly one time.
As the number of data points grows, the work grows in a straight line.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 updates |
| 100 | About 100 updates |
| 1000 | About 1000 updates |
Pattern observation: Doubling the data roughly doubles the work needed.
Time Complexity: O(n)
This means the time to calculate the exponential moving average grows linearly with the number of data points.
[X] Wrong: "Calculating the exponential moving average requires checking all previous data points for each new point, so it is slow like O(n²)."
[OK] Correct: The exponential moving average uses a formula that updates from the previous result, so it only needs one step per data point, not all previous points.
Understanding how calculations like exponential moving averages scale helps you explain efficient data processing in real projects.
"What if we calculated a simple moving average with a fixed window instead? How would the time complexity change?"