
Resampling with groupby for time data in Pandas

Introduction

Resampling changes the frequency of time-series data, for example aggregating daily data into monthly data. Combining groupby with resample applies that change to each group separately.

You have sales data for different stores and want monthly totals per store.
You track sensor readings every minute and want hourly averages for each sensor.
You collect website visits per second and want daily counts per user.
You monitor temperature readings per city and want weekly summaries per city.
Syntax
Pandas
df.groupby('group_column').resample('time_frequency').agg_function()

The 'group_column' is the column to group by (like store or sensor).

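The 'time_frequency' is a frequency alias string such as 'D' for day, 'W' for week, 'ME' for month end, or 'h' for hour. Pandas versions before 2.2 spell the last two 'M' and 'H'; those spellings are deprecated in newer versions.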
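The syntax above can be sketched on a small table. The sensor names, timestamps, and the `value` column are made up for illustration, and the frequency is written as '60min' because that spelling is accepted across pandas versions.

```python
import pandas as pd

# Hypothetical readings: two sensors, timestamps used as the index
times = pd.to_datetime([
    "2024-01-01 00:00", "2024-01-01 00:30", "2024-01-01 01:00",  # sensor s1
    "2024-01-01 00:15", "2024-01-01 01:45",                      # sensor s2
])
df = pd.DataFrame(
    {"sensor": ["s1", "s1", "s1", "s2", "s2"],
     "value": [1.0, 3.0, 5.0, 10.0, 20.0]},
    index=times,
)

# group_column = 'sensor', time_frequency = '60min', agg_function = mean
hourly = df.groupby("sensor")["value"].resample("60min").mean()
print(hourly)
```

The result is indexed by sensor and by the start of each hour, so each sensor gets its own hourly averages.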

Examples
Sum values monthly for each store.
Pandas
df.groupby('store').resample('ME').sum()  # 'ME' = month end; pandas older than 2.2 uses 'M'
Calculate hourly average for each sensor.
Pandas
df.groupby('sensor').resample('h').mean()  # 'h' replaces the deprecated 'H' alias
Count daily records for each user.
Pandas
df.groupby('user').resample('D').count()
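A runnable version of the daily count may help here. The user names, timestamps, and the `page` column are assumptions chosen for illustration. The sketch also shows that a day with no records inside a user's own date range appears with a count of 0.

```python
import pandas as pd

# Hypothetical visit log: one row per visit, timestamp as the index
times = pd.to_datetime([
    "2024-03-01 08:00", "2024-03-01 09:00", "2024-03-03 10:00",  # user u1
    "2024-03-02 12:00",                                          # user u2
])
visits = pd.DataFrame(
    {"user": ["u1", "u1", "u1", "u2"],
     "page": ["home", "cart", "home", "home"]},
    index=times,
)

# Count records per user per day
daily_counts = visits.groupby("user")["page"].resample("D").count()
print(daily_counts)
```

User u1 gets three rows (March 1, 2, and 3), with March 2 counted as 0, while user u2 gets a single row for March 2.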
Sample Program

This code groups sales data by store, then resamples it by month, summing sales for each store per month.

Pandas
import pandas as pd

# Create sample data
data = {
    'store': ['A', 'A', 'A', 'B', 'B', 'B'],
    'date': pd.to_datetime([
        '2024-01-01', '2024-01-15', '2024-02-01',
        '2024-01-05', '2024-01-20', '2024-02-10']),
    'sales': [10, 20, 30, 5, 15, 25]
}

df = pd.DataFrame(data)

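# Set date as index for resampling
df = df.set_index('date')

# Group by store, select the sales column, and resample to month-end totals
# ('ME' needs pandas 2.2 or newer; older versions use 'M')
monthly_sales = df.groupby('store')['sales'].resample('ME').sum()

print(monthly_sales)
Output

The result is a Series indexed by store and month end: store A totals 30 for January and 30 for February; store B totals 20 for January and 25 for February.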
Important Notes

Make sure the time column is set as the DataFrame index before resampling.

Resampling works only on datetime-like indexes (DatetimeIndex, TimedeltaIndex, or PeriodIndex).
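The index requirement can be seen in a minimal sketch, using made-up sales rows and a plain resample without groupby to keep it short. Calling resample while the dates are still an ordinary column raises a TypeError; setting the column as the index fixes it.

```python
import pandas as pd

# Hypothetical data with the dates still in a regular column
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02"]),
    "sales": [10, 20, 30],
})

try:
    df.resample("D").sum()  # index is a default integer index, not datetime-like
    failed = False
except TypeError as exc:
    failed = True
    print("resample failed:", exc)

# Move the dates into the index, then resample
daily = df.set_index("date").resample("D").sum()
print(daily)
```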

After groupby, resample applies to each group separately.
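Because each group is resampled on its own, the result carries a two-level index of group and time. The sketch below reuses the store data from the sample program, with 'MS' (month start) as the frequency because that alias is spelled the same across pandas versions; the totals match the month-end version, only the date labels differ. It shows one way to pull out a single group and one way to flatten the result into ordinary columns.

```python
import pandas as pd

df = pd.DataFrame({
    "store": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime([
        "2024-01-01", "2024-01-15", "2024-02-01",
        "2024-01-05", "2024-01-20", "2024-02-10"]),
    "sales": [10, 20, 30, 5, 15, 25],
}).set_index("date")

# Monthly totals per store, each bin labelled by its month start
monthly = df.groupby("store")["sales"].resample("MS").sum()

# Select the rows for one group from the (store, date) index
store_a = monthly.loc["A"]
print(store_a)

# Turn the index levels back into regular columns
flat = monthly.reset_index()
print(flat)
```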

Summary

Resampling changes the time frequency of data.

Use groupby with resample to do this for each group.

Set the datetime column as index before resampling.