Challenge - 5 Problems

❓ Predict Output

intermediate

Output of resampling with groupby on time data

What is the output of this code snippet that groups data by 'category' and resamples the time series to daily frequency, summing the values?

Pandas

import pandas as pd

data = {
    'category': ['A', 'A', 'B', 'B', 'A', 'B'],
    'value': [10, 20, 30, 40, 50, 60],
    'date': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-01', '2024-01-03', '2024-01-03', '2024-01-02'])
}
df = pd.DataFrame(data)
df.set_index('date', inplace=True)
result = df.groupby('category').resample('D').sum()
print(result)

A.
category  date        value
A         2024-01-01     10
          2024-01-02     20
          2024-01-03     50
B         2024-01-01     30
          2024-01-02     60
          2024-01-03     40

B.
category  date        value
A         2024-01-01     10
          2024-01-02     20
          2024-01-03      0
B         2024-01-01     30
          2024-01-02     60
          2024-01-03     40

C.
category  date        value
A         2024-01-01     10
          2024-01-02     20
          2024-01-03     50
B         2024-01-01     30
          2024-01-02      0
          2024-01-03     40

D.
category  date        value
A         2024-01-01     10
          2024-01-02     20
          2024-01-03     50
B         2024-01-01     30
          2024-01-02     60
          2024-01-03      0
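
To check a prediction locally, one option is to select the 'value' column before resampling, which keeps the grouping column out of the aggregation and should make the printed result less dependent on the installed pandas version. This is a minimal sketch rather than the quiz's reference solution; the names `daily` and `cell` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "category": ["A", "A", "B", "B", "A", "B"],
        "value": [10, 20, 30, 40, 50, 60],
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-02", "2024-01-01",
             "2024-01-03", "2024-01-03", "2024-01-02"]
        ),
    }
).set_index("date")

# Selecting 'value' first keeps the grouping column out of the aggregation.
daily = df.groupby("category")["value"].resample("D").sum()

# Look up a single (category, date) cell to compare against an answer option.
cell = daily.loc[("B", pd.Timestamp("2024-01-02"))]
print(daily)
print(cell)
```

The result is a Series with a two-level index (category, date), so any single cell can be read with a tuple key as shown.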

❓ Data Output

intermediate

Number of rows after groupby and resample

Given a DataFrame with 3 categories, each covering the same 4 consecutive days, how many rows does the result have after grouping by 'category' and resampling daily?

Pandas

import pandas as pd
import numpy as np

np.random.seed(0)
categories = ['X', 'Y', 'Z']
dates = pd.date_range('2024-01-01', periods=4)
data = {
    'category': np.repeat(categories, 4),
    'value': np.random.randint(1, 10, 12),
    'date': list(dates) * 3,
}
df = pd.DataFrame(data).set_index('date')
result = df.groupby('category').resample('D').sum()
print(len(result))  # number of rows in the result

B. 15

C. 12

D. 16
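
One way to reason about the row count is to look at how many rows each category contributes, since each group is resampled over its own date range. The sketch below is an assumption-laden variant of the snippet above: it uses `np.random.default_rng` instead of the original seeding (the random values do not affect the row count), and `rows_per_category` is an illustrative name.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumption: any seed works here
categories = ["X", "Y", "Z"]
dates = pd.date_range("2024-01-01", periods=4)

df = pd.DataFrame(
    {
        "category": np.repeat(categories, len(dates)),
        "value": rng.integers(1, 10, size=12),
        "date": list(dates) * len(categories),
    }
).set_index("date")

result = df.groupby("category")["value"].resample("D").sum()

# Count how many daily bins each category ends up with.
rows_per_category = result.groupby(level="category").size()
print(rows_per_category)
print(len(result))
```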

🔧 Debug

advanced

Identify the error in resampling after groupby

What error does this code produce when trying to resample after grouping by 'category'?

Pandas

import pandas as pd

data = {'category': ['A', 'A'], 'value': [1, 2], 'date': ['2024-01-01', '2024-01-02']}
df = pd.DataFrame(data)
df.set_index('date', inplace=True)
df.groupby('category').resample('D').mean()

A. AttributeError: 'DataFrameGroupBy' object has no attribute 'resample'

B. KeyError: 'date'

C. No error, returns mean values

D. TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
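
When debugging a snippet like this, it can help to inspect the index type before calling resample. The following is a minimal sketch, assuming the date strings are ISO-formatted so that `pd.to_datetime` can parse them; the flag names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"category": ["A", "A"], "value": [1, 2], "date": ["2024-01-01", "2024-01-02"]}
).set_index("date")

# Dates stored as strings give a plain Index, not a DatetimeIndex.
is_datetime_before = isinstance(df.index, pd.DatetimeIndex)

# Convert the index so that time-based operations become available.
df.index = pd.to_datetime(df.index)
is_datetime_after = isinstance(df.index, pd.DatetimeIndex)

means = df.groupby("category")["value"].resample("D").mean()
print(is_datetime_before, is_datetime_after)
print(means)
```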

🚀 Application

advanced

Calculate weekly average sales per store

You have sales data with columns 'store', 'sales', and 'date'. How do you calculate the weekly average sales per store using groupby and resample?

Pandas

import pandas as pd

data = {'store': ['S1', 'S1', 'S2', 'S2', 'S1', 'S2'],
        'sales': [100, 150, 200, 250, 130, 300],
        'date': pd.to_datetime(['2024-01-01', '2024-01-08', '2024-01-01', '2024-01-08', '2024-01-15', '2024-01-15'])}
df = pd.DataFrame(data).set_index('date')

A. df.groupby('store').resample('W').mean()

B. df.resample('W').groupby('store').mean()

C. df.groupby('store').resample('W').sum()

D. df.groupby('store').resample('D').mean()
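
For comparison, per-group weekly aggregation can also be written with `pd.Grouper` inside a single groupby call. This is an alternative formulation, not one of the answer options, sketched here on the same sample data; `weekly` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "store": ["S1", "S1", "S2", "S2", "S1", "S2"],
        "sales": [100, 150, 200, 250, 130, 300],
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-08", "2024-01-01",
             "2024-01-08", "2024-01-15", "2024-01-15"]
        ),
    }
).set_index("date")

# pd.Grouper(freq="W") bins the DatetimeIndex into weeks ending on Sunday,
# so each store's rows are averaged within each weekly bin.
weekly = df.groupby(["store", pd.Grouper(freq="W")])["sales"].mean()
print(weekly)
```

In this multi-key form, weeks with no rows for a store are generally not emitted, whereas resampling after groupby fills in the empty bins between a group's first and last dates.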

🧠 Conceptual

expert

Why use groupby before resample for time series data?

Why is it important to use groupby before resample when working with time series data that has multiple categories?

A. Because resample automatically groups data by all columns if groupby is not used.

B. Because resample only works on the index, so grouping first allows resampling within each category's time series separately.

C. Because resample can only be applied to columns, so groupby moves the index to columns.

D. Because groupby changes the data type of the index to datetime, enabling resample to work.
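
To explore the question hands-on, it may help to run resample with and without a preceding groupby on the same data and compare the shapes of the two results. The data below is hypothetical and chosen only to keep the arithmetic easy to follow; `pooled` and `per_category` are illustrative names.

```python
import pandas as pd

# Hypothetical data: two categories observed on the same two days.
df = pd.DataFrame(
    {
        "category": ["A", "B", "A", "B"],
        "value": [10, 30, 20, 60],
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"]
        ),
    }
).set_index("date")

# Resampling alone bins rows by the datetime index only.
pooled = df["value"].resample("D").sum()

# Grouping first resamples each category's rows separately.
per_category = df.groupby("category")["value"].resample("D").sum()

print(pooled)
print(per_category)
```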
