Challenge - 5 Problems
Resampling Mastery
❓ Predict Output
intermediate
Output of resampling with groupby on time data
What is the output of this code snippet that groups data by 'category' and resamples the time series to daily frequency, summing the values?
Pandas
import pandas as pd

data = {
    'category': ['A', 'A', 'B', 'B', 'A', 'B'],
    'value': [10, 20, 30, 40, 50, 60],
    'date': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-01',
                            '2024-01-03', '2024-01-03', '2024-01-02'])
}
df = pd.DataFrame(data)
df.set_index('date', inplace=True)
result = df.groupby('category').resample('D').sum()
print(result)
💡 Hint
Remember that resample creates a row for every day between a group's first and last date. With .sum(), a day that has no data gets 0; with .mean() it would be NaN.
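Explanation
groupby splits the rows by 'category', and resample('D') then builds a daily series for each group, spanning that group's first to last date. sum() adds the values that fall on each day. A day with no rows in a group would get a sum of 0, but here both groups have data on all three days, so nothing is filled in: from 2024-01-01 through 2024-01-03, A sums to 10, 20, 50 and B sums to 30, 60, 40.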
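The explanation above can be checked with a short sketch. It selects the 'value' column explicitly so the grouping column stays out of the aggregation, and it adds a hypothetical `gap` frame (not part of the challenge) that drops one of B's rows to show how an empty day is filled.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "category": ["A", "A", "B", "B", "A", "B"],
        "value": [10, 20, 30, 40, 50, 60],
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-02", "2024-01-01",
             "2024-01-03", "2024-01-03", "2024-01-02"]
        ),
    }
).set_index("date")

# Daily sum per category; the result is a Series indexed by (category, date).
daily = df.groupby("category")["value"].resample("D").sum()
print(daily.tolist())  # [10, 20, 50, 30, 60, 40]

# Hypothetical variation: remove B's 2024-01-02 row to create a gap.
mask = (df["category"].to_numpy() == "B") & (df.index == pd.Timestamp("2024-01-02"))
gap = df[~mask]

gap_sum = gap.groupby("category")["value"].resample("D").sum()
gap_mean = gap.groupby("category")["value"].resample("D").mean()
print(gap_sum.tolist())   # B's missing day sums to 0
print(gap_mean.tolist())  # B's missing day averages to NaN
```

The gap is filled only because it lies between B's first and last date; resample does not extend a group beyond its own date range.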
❓ Data Output
intermediate
Number of rows after groupby and resample
Given a DataFrame with 3 categories and dates spanning 4 days, what is the number of rows in the result after grouping by 'category' and resampling daily?
Pandas
import pandas as pd
import numpy as np

np.random.seed(0)
categories = ['X', 'Y', 'Z']
dates = pd.date_range('2024-01-01', periods=4)
data = {
    'category': np.repeat(categories, 4),
    'value': np.random.randint(1, 10, 12),
    'date': list(dates) * 3
}
df = pd.DataFrame(data).set_index('date')
result = df.groupby('category').resample('D').sum()
💡 Hint
Each category has 4 days, so total rows = number of categories × days per category.
Explanation
There are 3 categories, each covering the same 4 consecutive days, and resampling daily adds no extra dates, so the result has 3 × 4 = 12 rows.
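The row count can be confirmed with a small sketch. The values here are placeholders (only the shape matters), and the `shorter` frame is a hypothetical variation showing that each group is resampled over its own date range rather than the overall one.

```python
import numpy as np
import pandas as pd

categories = ["X", "Y", "Z"]
dates = pd.date_range("2024-01-01", periods=4)
df = pd.DataFrame(
    {
        "category": np.repeat(categories, 4),
        "value": np.arange(1, 13),  # placeholder values, not the challenge's random data
        "date": list(dates) * 3,
    }
).set_index("date")

result = df.groupby("category")["value"].resample("D").sum()
print(len(result))  # 12

# Hypothetical variation: Z only has data for the first two days.
drop = (df["category"].to_numpy() == "Z") & (df.index >= pd.Timestamp("2024-01-03"))
shorter = df[~drop]
short_result = shorter.groupby("category")["value"].resample("D").sum()
print(len(short_result))  # 10: X and Y keep 4 rows each, Z has 2
```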
🔧 Debug
advanced
Identify the error in resampling after groupby
What error does this code produce when trying to resample after grouping by 'category'?
Pandas
import pandas as pd

data = {
    'category': ['A', 'A'],
    'value': [1, 2],
    'date': ['2024-01-01', '2024-01-02']
}
df = pd.DataFrame(data)
df.set_index('date', inplace=True)
df.groupby('category').resample('D').mean()
💡 Hint
Check the index type after setting it.
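Explanation
Because 'date' holds strings, set_index produces a plain string index rather than a DatetimeIndex. resample requires a datetime-like index (DatetimeIndex, TimedeltaIndex, or PeriodIndex), so the call raises a TypeError. Converting the column with pd.to_datetime before calling set_index fixes it.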
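A minimal sketch of the failure and one possible fix follows. It selects the 'value' column so that only the index type is at issue; the variable names `bad`, `good`, and `fixed` are illustrative.

```python
import pandas as pd

data = {
    "category": ["A", "A"],
    "value": [1, 2],
    "date": ["2024-01-01", "2024-01-02"],
}

# String dates give a non-datetime index, which resample rejects.
bad = pd.DataFrame(data).set_index("date")
try:
    bad.groupby("category")["value"].resample("D").mean()
    raised = False
except TypeError as exc:
    raised = True
    print(f"TypeError: {exc}")

# Converting to datetime before setting the index makes resample valid.
good = pd.DataFrame(data)
good["date"] = pd.to_datetime(good["date"])
good = good.set_index("date")
fixed = good.groupby("category")["value"].resample("D").mean()
print(fixed.tolist())  # [1.0, 2.0]
```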
🚀 Application
advanced
Calculate weekly average sales per store
You have sales data with columns 'store', 'sales', and 'date'. How do you calculate the weekly average sales per store using groupby and resample?
Pandas
import pandas as pd

data = {
    'store': ['S1', 'S1', 'S2', 'S2', 'S1', 'S2'],
    'sales': [100, 150, 200, 250, 130, 300],
    'date': pd.to_datetime(['2024-01-01', '2024-01-08', '2024-01-01',
                            '2024-01-08', '2024-01-15', '2024-01-15'])
}
df = pd.DataFrame(data).set_index('date')
💡 Hint
Call resample after groupby so that it is applied to each store separately.
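Explanation
Group by 'store', resample to weekly frequency with 'W', and take the mean: df.groupby('store')['sales'].resample('W').mean(). Selecting 'sales' keeps the string 'store' column out of the mean. This gives the weekly average sales per store, with each week labeled by the Sunday that ends it.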
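Applied to the challenge data, the approach can be sketched as follows. Each store has exactly one sale per week here, so every weekly mean equals that single value.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "store": ["S1", "S1", "S2", "S2", "S1", "S2"],
        "sales": [100, 150, 200, 250, 130, 300],
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-08", "2024-01-01",
             "2024-01-08", "2024-01-15", "2024-01-15"]
        ),
    }
).set_index("date")

# 'W' bins end on Sunday by default, so Monday 2024-01-01 falls in the
# week labeled 2024-01-07.
weekly = df.groupby("store")["sales"].resample("W").mean()
print(weekly)

week_labels = weekly.index.get_level_values("date")
print(week_labels[0])  # 2024-01-07 00:00:00
```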
🧠 Conceptual
expert
Why use groupby before resample for time series data?
Why is it important to use groupby before resample when working with time series data that has multiple categories?
💡 Hint
Think about how resample works on time indexes and how grouping affects that.
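Explanation
resample bins rows by the time index alone. Called on the whole DataFrame, it would combine every category that shares a time bin into a single result. Grouping first splits the data by category, so resample is applied separately to each category's time series.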
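The difference is easiest to see side by side. The sketch below reuses the data from the first challenge; `combined` and `per_category` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "category": ["A", "A", "B", "B", "A", "B"],
        "value": [10, 20, 30, 40, 50, 60],
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-02", "2024-01-01",
             "2024-01-03", "2024-01-03", "2024-01-02"]
        ),
    }
).set_index("date")

# Without groupby: A and B are mixed together within each day.
combined = df["value"].resample("D").sum()
print(combined.tolist())  # [40, 80, 90]

# With groupby: one daily series per category.
per_category = df.groupby("category")["value"].resample("D").sum()
print(per_category.tolist())  # [10, 20, 50, 30, 60, 40]
```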