Resampling time series data in Pandas - Time & Space Complexity
When working with time series data, resampling helps us change the frequency of data points.
We want to know how the time to resample grows as the data size increases.
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

dates = pd.date_range('2023-01-01', periods=1000, freq='T')  # 'min' in newer pandas
data = pd.Series(range(1000), index=dates)

# Resample data to 10-minute frequency and take the mean
resampled = data.resample('10T').mean()  # '10min' in newer pandas
```
This code changes data from 1-minute intervals to 10-minute intervals by averaging values.
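A quick sanity check makes the transformation concrete. The sketch below (using the `'min'`/`'10min'` aliases that newer pandas versions prefer over `'T'`/`'10T'`) shows that 1000 one-minute points collapse into 100 ten-minute bins, and the first bin averages the values 0 through 9:

```python
import pandas as pd

dates = pd.date_range('2023-01-01', periods=1000, freq='min')
data = pd.Series(range(1000), index=dates)
resampled = data.resample('10min').mean()

# 1000 one-minute points -> 100 ten-minute bins;
# the first bin averages the values 0..9.
print(len(resampled))     # 100
print(resampled.iloc[0])  # 4.5
```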
Identify the repeated work: the loops, recursion, or array traversals that drive the running time.
- Primary operation: Grouping data points into new time bins and computing the mean for each group.
- How many times: Once for each new time bin created by resampling.
As the number of data points n grows, the number of 10-minute groups grows proportionally (about n/10), and every data point must still be visited once to be assigned to a group and included in its mean.
| Input Size (n) | Approx. Group Operations (~n/10) |
|---|---|
| 10 | About 1 group operation |
| 100 | About 10 group operations |
| 1000 | About 100 group operations |
Pattern observation: Both the number of groups (about n/10) and the number of points visited (n) grow in direct proportion to the input size.
Time Complexity: O(n)
This means the time to resample grows linearly with the number of data points.
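To see why a single linear pass suffices, here is a hand-rolled sketch of what the resampling conceptually does (a hypothetical helper for illustration, not pandas internals): one O(n) traversal assigns each point to a bin, then one mean is computed per bin.

```python
def bin_means(values, bin_size=10):
    """Group consecutive values into fixed-size bins and average each bin."""
    sums, counts = {}, {}
    for i, v in enumerate(values):  # O(n): visit every data point once
        b = i // bin_size           # which bin this point falls into
        sums[b] = sums.get(b, 0) + v
        counts[b] = counts.get(b, 0) + 1
    return [sums[b] / counts[b] for b in sorted(sums)]  # O(n/10) bins

print(bin_means(range(30)))  # [4.5, 14.5, 24.5]
```

Every value is touched exactly once, which is where the O(n) bound comes from.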
[X] Wrong: "Resampling time is constant no matter how much data there is."
[OK] Correct: More data means more groups and calculations, so time grows with data size.
Understanding how resampling scales helps you work efficiently with time series data in real projects.
"What if we changed the aggregation from mean to a custom function that is slower? How would the time complexity change?"
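As one way to explore that question, pandas lets you pass a custom aggregation via `.apply()`. The sketch below uses a hypothetical `slow_mean` helper: if the custom function still costs O(k) per group of size k, the total work remains O(n) overall, just with a much larger constant than the optimized built-in `.mean()`; a function costing O(k²) per group would change the asymptotics.

```python
import pandas as pd

dates = pd.date_range('2023-01-01', periods=1000, freq='min')
data = pd.Series(range(1000), index=dates)

def slow_mean(group):
    """O(k) per group of size k, but a pure-Python loop: same
    asymptotics as the built-in mean, much larger constant."""
    total = 0
    for v in group:
        total += v
    return total / len(group)

custom = data.resample('10min').apply(slow_mean)
builtin = data.resample('10min').mean()
print((custom == builtin).all())  # True: same result, slower path
```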