
Resampling with groupby for time data in Pandas - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of resampling with groupby on time data
What is the output of this code snippet that groups data by 'category' and resamples the time series to daily frequency, summing the values?
Pandas
import pandas as pd

data = {
    'category': ['A', 'A', 'B', 'B', 'A', 'B'],
    'value': [10, 20, 30, 40, 50, 60],
    'date': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-01', '2024-01-03', '2024-01-03', '2024-01-02'])
}
df = pd.DataFrame(data)
df.set_index('date', inplace=True)
result = df.groupby('category').resample('D').sum()
print(result)
A
                     value
category date             
A        2024-01-01     10
         2024-01-02     20
         2024-01-03     50
B        2024-01-01     30
         2024-01-02     60
         2024-01-03     40
B
category  date      value
A         2024-01-01     10
          2024-01-02     20
          2024-01-03     0
B         2024-01-01     30
          2024-01-02     60
          2024-01-03     40
C
category  date      value
A         2024-01-01     10
          2024-01-02     20
          2024-01-03     50
B         2024-01-01     30
          2024-01-02      0
          2024-01-03     40
D
category  date      value
A         2024-01-01     10
          2024-01-02     20
          2024-01-03     50
B         2024-01-01     30
          2024-01-02     60
          2024-01-03      0
💡 Hint
Remember that .sum() returns 0 only for daily bins that contain no rows, so check whether any category is actually missing a date.
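The hint's point about empty bins can be checked with a minimal sketch. The data here is hypothetical (it is not the question's data): category 'A' has no row on 2024-01-02, so that daily bin is empty. Selecting the 'value' column before resampling is a choice made for this sketch, so that the result does not depend on how a given pandas version treats the grouping column.

```python
import pandas as pd

# Hypothetical data: category 'A' has no row on 2024-01-02.
df = pd.DataFrame(
    {
        "category": ["A", "A", "B"],
        "value": [10, 50, 30],
        "date": pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-01"]),
    }
).set_index("date")

# Resample each category's own time series to daily frequency.
daily_sum = df.groupby("category")["value"].resample("D").sum()
daily_mean = df.groupby("category")["value"].resample("D").mean()

print(daily_sum)   # the empty bin for 'A' on 2024-01-02 shows 0
print(daily_mean)  # the same bin shows NaN, because there is nothing to average
```

With sum the empty bin becomes 0, while mean leaves it as NaN, so the choice of aggregation decides how a gap appears in the result.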
Data Output
intermediate
Number of rows after groupby and resample
Given a DataFrame with 3 categories and dates spanning 4 days, what is the number of rows in the result after grouping by 'category' and resampling daily?
Pandas
import pandas as pd
import numpy as np

np.random.seed(0)
categories = ['X', 'Y', 'Z']
dates = pd.date_range('2024-01-01', periods=4)
data = {'category': np.repeat(categories, 4), 'value': np.random.randint(1, 10, 12), 'date': list(dates)*3}
df = pd.DataFrame(data).set_index('date')
result = df.groupby('category').resample('D').sum()
A. 9
B. 15
C. 12
D. 16
💡 Hint
Each category covers the same 4 days, so total rows = number of categories * number of days.
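The categories-times-days rule holds when every category spans the same dates. As a rough illustration with hypothetical data (not the question's data), the sketch below shows that each group is resampled over its own first-to-last date range, so the row count can differ per category. Selecting 'value' before resampling is a choice made for this sketch.

```python
import pandas as pd

# Hypothetical data: 'X' spans four days, 'Y' spans only two.
df = pd.DataFrame(
    {
        "category": ["X", "X", "Y", "Y"],
        "value": [1, 2, 3, 4],
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-04", "2024-01-02", "2024-01-03"]
        ),
    }
).set_index("date")

result = df.groupby("category")["value"].resample("D").sum()

# Count the daily rows produced for each category.
rows_per_category = result.groupby(level="category").size()

print(rows_per_category)
print(len(result))
```

Here 'X' contributes one row per day between its first and last date, including the two days with no data, while 'Y' contributes only two rows.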
🔧 Debug
advanced
Identify the error in resampling after groupby
What error does this code produce when trying to resample after grouping by 'category'?
Pandas
import pandas as pd

data = {'category': ['A', 'A'], 'value': [1, 2], 'date': ['2024-01-01', '2024-01-02']}
df = pd.DataFrame(data)
df.set_index('date', inplace=True)
df.groupby('category').resample('D').mean()
A. AttributeError: 'DataFrameGroupBy' object has no attribute 'resample'
B. KeyError: 'date'
C. No error, returns mean values
D. TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
💡 Hint
Check the index type after setting it.
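One way to follow the hint is to inspect the index class directly. The sketch below reuses the question's data and then shows one possible repair, assuming that converting the string index with pd.to_datetime is acceptable. Selecting 'value' before resampling is a choice made for this sketch.

```python
import pandas as pd

data = {"category": ["A", "A"], "value": [1, 2], "date": ["2024-01-01", "2024-01-02"]}
df = pd.DataFrame(data).set_index("date")

# The dates were supplied as strings, so this is a plain Index.
index_type_before = type(df.index).__name__
print(index_type_before)

# Convert the index to datetimes so that resample can be used.
df.index = pd.to_datetime(df.index)
index_type_after = type(df.index).__name__
print(index_type_after)

daily_mean = df.groupby("category")["value"].resample("D").mean()
print(daily_mean)
```

Converting the 'date' column with pd.to_datetime before calling set_index would work equally well.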
🚀 Application
advanced
Calculate weekly average sales per store
You have sales data with columns 'store', 'sales', and 'date'. How do you calculate the weekly average sales per store using groupby and resample?
Pandas
import pandas as pd

data = {'store': ['S1', 'S1', 'S2', 'S2', 'S1', 'S2'],
        'sales': [100, 150, 200, 250, 130, 300],
        'date': pd.to_datetime(['2024-01-01', '2024-01-08', '2024-01-01', '2024-01-08', '2024-01-15', '2024-01-15'])}
df = pd.DataFrame(data).set_index('date')
A. df.groupby('store').resample('W').mean()
B. df.resample('W').groupby('store').mean()
C. df.groupby('store').resample('W').sum()
D. df.groupby('store').resample('D').mean()
💡 Hint
Call resample after groupby so that it is applied to each group separately.
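A small sketch of the groupby-then-resample pattern on the question's sales data follows. It selects the 'sales' column first (a choice made for this sketch) and also shows pd.Grouper as an alternative way to express the same weekly grouping.

```python
import pandas as pd

data = {
    "store": ["S1", "S1", "S2", "S2", "S1", "S2"],
    "sales": [100, 150, 200, 250, 130, 300],
    "date": pd.to_datetime(
        ["2024-01-01", "2024-01-08", "2024-01-01",
         "2024-01-08", "2024-01-15", "2024-01-15"]
    ),
}
df = pd.DataFrame(data).set_index("date")

# Group by store, then resample each store's series into weekly bins.
weekly = df.groupby("store")["sales"].resample("W").mean()

# Alternative: one groupby call with a time-based Grouper on the index.
weekly_grouper = df.groupby(["store", pd.Grouper(freq="W")])["sales"].mean()

print(weekly)
print(weekly_grouper)
```

The 'W' frequency labels each bin by the Sunday that ends the week, so the row for 2024-01-01 is labelled 2024-01-07. For this data the two approaches agree. They can differ when a store has a week with no sales: resample emits a row for that week, whereas the Grouper form, as far as I know, returns only the store and week combinations that occur in the data.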
🧠 Conceptual
expert
Why use groupby before resample for time series data?
Why is it important to use groupby before resample when working with time series data that has multiple categories?
A. Because resample automatically groups data by all columns if groupby is not used.
B. Because resample only works on the index, so grouping first allows resampling within each category's time series separately.
C. Because resample can only be applied to columns, so groupby moves the index to columns.
D. Because groupby changes the data type of the index to datetime, enabling resample to work.
💡 Hint
Think about how resample works on time indexes and how grouping affects that.
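To make the contrast concrete, the sketch below uses hypothetical data and compares resampling the whole frame with resampling inside each category. Selecting 'value' before resampling is a choice made for this sketch.

```python
import pandas as pd

# Hypothetical data: two categories observed on the same two days.
df = pd.DataFrame(
    {
        "category": ["A", "B", "A", "B"],
        "value": [10, 30, 20, 40],
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"]
        ),
    }
).set_index("date")

# Without groupby, resample bins by the time index only: categories are pooled.
pooled = df["value"].resample("D").sum()

# With groupby first, each category's time series is resampled on its own.
per_category = df.groupby("category")["value"].resample("D").sum()

print(pooled)
print(per_category)
```

The pooled result has one row per day with both categories added together, while the grouped result is indexed by category and date and keeps the two series apart.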