
Resampling time series data in Pandas - Step-by-Step Execution

Concept Flow - Resampling time series data
Start with time series data -> Choose resampling frequency -> Apply resample() method -> Aggregate data (mean, sum, etc.) -> Get new resampled time series -> End
We start with time series data, pick a new time interval, resample using pandas, aggregate values, and get a new summarized time series.
Execution Sample
import pandas as pd

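# Create sample data: six hourly points starting at midnight
# ('h' is the hourly alias; the older spelling 'H' is deprecated since pandas 2.2)
idx = pd.date_range('2024-01-01', periods=6, freq='h')
data = pd.Series([10, 20, 15, 30, 25, 40], index=idx)

# Resample hourly data to 3-hour intervals and average each interval
resampled = data.resample('3h').mean()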
print(resampled)
This code creates hourly data and resamples it to 3-hour intervals by averaging values.
Execution Table
| Step | Action | Input Data | Resample Frequency | Aggregation | Output Data |
|---|---|---|---|---|---|
| 1 | Create hourly time series | [10, 20, 15, 30, 25, 40] | Hourly (H) | None | 2024-01-01 00:00:00 -> 10; 2024-01-01 01:00:00 -> 20; 2024-01-01 02:00:00 -> 15; 2024-01-01 03:00:00 -> 30; 2024-01-01 04:00:00 -> 25; 2024-01-01 05:00:00 -> 40 |
| 2 | Choose resample frequency | Hourly data | 3 Hours (3H) | None | Preparing to group data into 3-hour bins |
| 3 | Group data into 3-hour bins | Hourly data | 3H | None | Bin 1: 00:00-02:59 -> [10, 20, 15]; Bin 2: 03:00-05:59 -> [30, 25, 40] |
| 4 | Aggregate each bin by mean | Bins | 3H | Mean | Bin 1 mean: (10+20+15)/3 = 15; Bin 2 mean: (30+25+40)/3 ≈ 31.67 |
| 5 | Create new resampled series | Aggregated means | 3H | Mean | 2024-01-01 00:00:00 -> 15; 2024-01-01 03:00:00 -> 31.67 |
| 6 | Print resampled data | Resampled series | 3H | Mean | Output: 2024-01-01 00:00:00 15.000000; 2024-01-01 03:00:00 31.666667; Freq: 3H (shown as 3h in pandas 2.2+), dtype: float64 |
💡 All original data has been grouped and aggregated into 3-hour intervals; resampling is complete.
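Steps 3 and 4 of the table can be checked by hand. The sketch below is one way to do it: it iterates over the resampler to list the values that land in each 3-hour bin, then averages them manually. The names `bins` and `bin_means` are illustrative only and are not part of pandas.

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=6, freq="h")
data = pd.Series([10, 20, 15, 30, 25, 40], index=idx)

# Iterating over a resampler yields (bin label, values that fall in that bin)
bins = {label: group.tolist() for label, group in data.resample("3h")}
for label, values in bins.items():
    print(label, values)

# Average each bin by hand to mirror step 4 of the table
bin_means = {label: sum(values) / len(values) for label, values in bins.items()}
```

The manual means should agree with what `data.resample("3h").mean()` returns for the same data.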
Variable Tracker
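| Variable | Start | After Step 1 | After Step 3 | After Step 4 | Final |
|---|---|---|---|---|---|
| data | None | [10, 20, 15, 30, 25, 40], hourly indexed | unchanged | unchanged | unchanged (resample() does not modify it) |
| bins (internal to resample) | None | None | [[10, 20, 15], [30, 25, 40]] grouped by 3H | [15, 31.67] means of groups | [15, 31.67] |
| resampled | None | None | None | None | [15.0, 31.67] with 3H freq |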
Key Moments - 3 Insights
Why does the resampled series have fewer rows than the original?
Because resampling groups multiple original time points into larger intervals (3 hours here), so fewer aggregated points appear as shown in execution_table rows 3 and 5.
What happens if we use sum instead of mean for aggregation?
The grouped values would be added instead of averaged, so steps 4 and 5 would yield 45 and 95 instead of 15 and 31.67. The bins and their labels stay the same.
Why is the new index at 00:00 and 03:00 after resampling?
Because, for hourly frequencies such as 3H, pandas labels each resampled bin by the start time of the interval by default (label='left'), as shown in execution_table row 5.
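The last two answers can be tried directly. This is a minimal sketch using the same sample data: it swaps `mean()` for `sum()`, then passes `label="right"` to show that the bin label is a choice and not a fixed property of the data. The variable names are illustrative.

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=6, freq="h")
data = pd.Series([10, 20, 15, 30, 25, 40], index=idx)

means = data.resample("3h").mean()
totals = data.resample("3h").sum()  # same bins, values added instead of averaged

# label="right" keeps the same bins but stamps each one with its end time
right_labeled = data.resample("3h", label="right").mean()

print(totals)
print(right_labeled)
```

With `label="right"` the bins still cover 00:00-02:59 and 03:00-05:59; only the timestamps attached to the results move to 03:00 and 06:00.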
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table at step 4, what is the mean of the first 3-hour bin?
A. 31.67
B. 15
C. 20
D. 10
💡 Hint
Check the 'Output Data' column at step 4 for the first bin's mean.
At which step does the data get grouped into 3-hour bins?
A. Step 4
B. Step 2
C. Step 3
D. Step 5
💡 Hint
Look for the step describing grouping in the 'Action' column.
If we changed the resample frequency to '2H', how would the number of output rows change?
A. More rows than with '3H'
B. Fewer rows than with '3H'
C. Same number of rows
D. No rows
💡 Hint
Count how many bins the six hourly points fill at each frequency.
Concept Snapshot
Resampling time series data with pandas:
- Use .resample('freq') on a time-indexed series or DataFrame
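- 'freq' is the new time interval (e.g., '3h' for 3 hours; spelled '3H' before pandas 2.2)
- Aggregate grouped data with mean(), sum(), etc.
- Result is a new time series with fewer points (downsampling) or more points (upsampling)
- Index labels are interval start times by default for hourly bins (label='left')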
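The walkthrough above only downsamples, so here is a short sketch of the "more points" case. It upsamples the same hourly series to 30-minute steps; forward-filling with `ffill()` is one possible way to fill the new slots, chosen here for illustration.

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=6, freq="h")
data = pd.Series([10, 20, 15, 30, 25, 40], index=idx)

# Upsampling creates new time slots between the original points
with_gaps = data.resample("30min").asfreq()    # new slots are NaN
half_hourly = data.resample("30min").ffill()   # new slots copy the previous value

print(len(data), "->", len(half_hourly))
print(half_hourly.head(4))
```

When upsampling there is nothing to aggregate, so a fill method (or leaving the gaps as NaN) takes the place of mean() or sum().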
Full Transcript
This visual execution shows how pandas resamples time series data. We start with hourly data points. We pick a new frequency, here 3 hours. The data is grouped into 3-hour bins. Each bin's values are aggregated by mean. The output is a new series with fewer points, each representing the average over 3 hours. The index labels are the start times of each 3-hour interval. This process helps summarize or change the time scale of data easily.
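The same steps carry over to a DataFrame, where each column can use its own aggregation. The sketch below assumes two made-up columns, `temp` and `rain`, purely for illustration.

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=6, freq="h")

# Hypothetical columns: a reading to average and a quantity to total
df = pd.DataFrame(
    {"temp": [10, 20, 15, 30, 25, 40], "rain": [0, 1, 0, 2, 0, 3]},
    index=idx,
)

# One aggregation per column, applied to the same 3-hour bins
summary = df.resample("3h").agg({"temp": "mean", "rain": "sum"})
print(summary)
```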