Concept Flow - Resampling with groupby for time data

Start with time-indexed DataFrame

↓

Group data by a category column

↓

Apply resampling on each group by time frequency

↓

Aggregate each resampled group (mean, sum, etc.)

↓

Combine results into one DataFrame

↓

Output resampled grouped data

We first group the data by a category, then resample each group by time intervals, aggregate, and combine results.
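The flow above can be spelled out by hand as a rough sketch: loop over the groups, resample each one on its own 2-day grid, then concatenate the pieces. The sample data mirrors the Execution Sample in this section; the names `pieces` and `manual` are illustrative only, and in practice the one-line `groupby(...).resample(...)` form is usually what you would write.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range('2023-01-01', periods=6, freq='D'),
    'category': ['A', 'A', 'B', 'B', 'A', 'B'],
    'value': [10, 15, 10, 20, 25, 30],
}).set_index('date')

pieces = {}
for name, group in df.groupby('category'):
    # Resample this group's values on its own 2-day grid, then aggregate
    pieces[name] = group['value'].resample('2D').mean()

# Combine: the dict keys become the outer 'category' index level
manual = pd.concat(pieces, names=['category'])
print(manual)
```

Each group gets its own set of periods, which is why the combined result can contain different dates for different categories.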

Execution Sample

Pandas

import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range('2023-01-01', periods=6, freq='D'),
    'category': ['A', 'A', 'B', 'B', 'A', 'B'],
    'value': [10, 15, 10, 20, 25, 30]
})
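# Move 'date' into the index; resample needs a datetime index
df = df.set_index('date')

# Per category: bin 'value' into 2-day periods, average each bin,
# then turn the (category, date) index back into ordinary columns
result = df.groupby('category')['value'].resample('2D').mean().reset_index()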

This code groups data by 'category', resamples every 2 days, and calculates the mean value for each group and period.

Execution Table

Step	Action	Group	Resample Period	Values Considered	Aggregation	Output Row
1	Group data by 'category'	A	-	[10 (2023-01-01), 15 (2023-01-02), 25 (2023-01-05)]	-	-
2	Group data by 'category'	B	-	[10 (2023-01-03), 20 (2023-01-04), 30 (2023-01-06)]	-	-
3	Resample group A every 2 days starting 2023-01-01	A	2D (2023-01-01 to 2023-01-02)	[10, 15]	mean	12.5 (2023-01-01)
4	Resample group A next 2-day period	A	2D (2023-01-03 to 2023-01-04)	[]	mean	NaN (2023-01-03)
5	Resample group A next 2-day period	A	2D (2023-01-05 to 2023-01-06)	[25]	mean	25.0 (2023-01-05)
6	Resample group B every 2 days starting 2023-01-03	B	2D (2023-01-03 to 2023-01-04)	[10, 20]	mean	15.0 (2023-01-03)
7	Resample group B next 2-day period	B	2D (2023-01-05 to 2023-01-06)	[30]	mean	30.0 (2023-01-05)
8	Combine all resampled groups into one DataFrame	-	-	-	-	5 rows with category, date, mean value
9	Reset index to flatten DataFrame	-	-	-	-	Final DataFrame with columns: category, date, value
10	End	-	-	-	-	Resampling complete

💡 All groups resampled and aggregated; combined result returned.
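As a quick cross-check of the table, the sketch below puts the number of rows that land in each 2-day period next to the mean. It assumes the same sample data as the Execution Sample; `resampled` and `summary` are illustrative names, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range('2023-01-01', periods=6, freq='D'),
    'category': ['A', 'A', 'B', 'B', 'A', 'B'],
    'value': [10, 15, 10, 20, 25, 30],
}).set_index('date')

def resampled():
    # Build a fresh per-category 2-day resampler for the 'value' column
    return df.groupby('category')['value'].resample('2D')

summary = pd.DataFrame({
    'mean': resampled().mean(),
    'count': resampled().count(),  # 0 marks a period with no rows
})
print(summary)
```

The `count` column corresponds to the length of each "Values Considered" list in the table.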

Variable Tracker

Variable	Start	After Grouping	After Resample Step 1	After Resample Step 2	After Resample Step 3	Final
df	Original DataFrame with 6 rows, indexed by date	Unchanged; groupby builds a separate grouped object (A and B)	Unchanged	Unchanged	Unchanged	Unchanged, still 6 rows
result	Not defined	Not defined	Not yet assigned; first 2-day means being computed	Not yet assigned; second 2-day means being computed	Not yet assigned; third 2-day mean computed for group A	Final DataFrame with 5 rows and columns category, date, value

Key Moments - 3 Insights

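Why do some resample periods show NaN values in the output?

Resampling creates every period between a group's first and last timestamp, even when no rows fall inside it. Group A has no data on 2023-01-03 or 2023-01-04, so the mean for that period is NaN (step 4 of the Execution Table).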
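To see the empty period directly, here is a small sketch on the same sample data as the Execution Sample. It also shows two common ways to deal with the gap: dropping it, or using an aggregation such as `sum`, which returns 0 for an empty period by default. Which one is appropriate depends on the analysis; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range('2023-01-01', periods=6, freq='D'),
    'category': ['A', 'A', 'B', 'B', 'A', 'B'],
    'value': [10, 15, 10, 20, 25, 30],
}).set_index('date')

means = df.groupby('category')['value'].resample('2D').mean()
sums = df.groupby('category')['value'].resample('2D').sum()

# The mean of an empty period is NaN; isolate those periods
empty = means[means.isna()]
print(empty)

# Option 1: drop the empty periods from the result
without_gaps = means.dropna()

# Option 2: an aggregation like sum reports 0 for an empty period
print(sums)
```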

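Why do we need to reset the index after groupby and resample?

The combined result is indexed by both category and date (a multi-index). Calling reset_index() turns those index levels back into ordinary columns, giving a flat DataFrame with columns category, date, value (step 9 of the Execution Table). It is a convenience for later filtering, merging, or plotting rather than a strict requirement.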
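The sketch below compares the result before and after `reset_index()`, using the same sample data as the Execution Sample. The names `stacked` and `flat` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range('2023-01-01', periods=6, freq='D'),
    'category': ['A', 'A', 'B', 'B', 'A', 'B'],
    'value': [10, 15, 10, 20, 25, 30],
}).set_index('date')

# Before reset_index: a Series with a two-level (category, date) index
stacked = df.groupby('category')['value'].resample('2D').mean()
print(list(stacked.index.names))

# The multi-index is handy for selecting one group at a time
only_b = stacked.loc['B']

# After reset_index: a flat DataFrame with ordinary columns
flat = stacked.reset_index()
print(list(flat.columns))
```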

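How does resample know the start date for each group?

Each group is resampled independently, so its periods start at that group's own first timestamp: 2023-01-01 for group A and 2023-01-03 for group B (steps 3 and 6 of the Execution Table).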
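One way to check the per-group start dates is to compare the earliest period label in the resampled output with the earliest observation in the raw data. This is a sketch on the same sample data as the Execution Sample, where all timestamps fall on midnight; `first_bin` and `first_obs` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range('2023-01-01', periods=6, freq='D'),
    'category': ['A', 'A', 'B', 'B', 'A', 'B'],
    'value': [10, 15, 10, 20, 25, 30],
}).set_index('date')

means = df.groupby('category')['value'].resample('2D').mean()

# Earliest period label per category in the resampled output
first_bin = means.reset_index().groupby('category')['date'].min()

# Earliest observation per category in the raw data
first_obs = df.reset_index().groupby('category')['date'].min()

print(pd.DataFrame({'first_bin': first_bin, 'first_obs': first_obs}))
```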

Visual Quiz - 3 Questions

Test your understanding

Look at step 3 of the Execution Table. What is the mean value for group A in the first 2-day period?

A. 12.5

B. 10

C. 15

D. 25

Concept Snapshot

Resampling with groupby for time data:
- Group data by a category column
- Resample each group by a time frequency (e.g., '2D')
- Aggregate values in each resample period (mean, sum, etc.)
- Reset index to flatten multi-index
- Useful for time series analysis by groups
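The steps in this snapshot can be wrapped in a small function so the frequency and aggregation are easy to vary. `resample_by_group` is a hypothetical helper written for this illustration, not a pandas function, and the data is the same sample used in the Execution Sample.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range('2023-01-01', periods=6, freq='D'),
    'category': ['A', 'A', 'B', 'B', 'A', 'B'],
    'value': [10, 15, 10, 20, 25, 30],
}).set_index('date')

def resample_by_group(frame, by, column, freq, how='mean'):
    """Hypothetical helper: resample `column` per `by` group, then flatten."""
    resampled = frame.groupby(by)[column].resample(freq).agg(how)
    return resampled.reset_index()

# Same 2-day periods as the sample, but summed instead of averaged
two_day_sum = resample_by_group(df, 'category', 'value', '2D', how='sum')

# A wider 3-day period leaves no empty periods in this data
three_day_mean = resample_by_group(df, 'category', 'value', '3D')

print(two_day_sum)
print(three_day_mean)
```

Changing the frequency changes both the number of output rows and which rows are averaged together, so it is worth inspecting the result whenever the frequency changes.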

Full Transcript

This visual execution traces how to resample time series data grouped by categories using pandas. We start with a DataFrame indexed by date and group it by a category column. Each group is resampled at a fixed time frequency, such as every 2 days. For each resample period, we calculate an aggregate such as the mean of the values. Some periods may contain no data, in which case the mean is NaN. After all groups are resampled, the results are combined into one DataFrame. Resetting the index flattens the multi-index created by groupby and resample. This method helps analyze time-based data separately for each group, such as sales per region over time. The execution table shows each step, including grouping, resample periods, values considered, and output rows. Variable tracking shows how the DataFrame and result evolve. Key moments clarify why NaNs appear, why resetting the index is needed, and how resample aligns periods per group. The quiz tests understanding of mean values, resample steps, and effects of changing frequency.