Challenge - 5 Problems

Aggregation Master

❓ Predict Output

intermediate

Output of group aggregation with multiple functions

What is the output of the following code snippet, which groups the data by 'Category' and calculates the mean and max of 'Value'? The options list the rows of the printed result as records.

Data Analysis Python

import pandas as pd

data = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'C'],
    'Value': [10, 20, 10, 30, 40]
})

result = data.groupby('Category')['Value'].agg(['mean', 'max']).reset_index()
print(result)

A. [{'Category': 'A', 'mean': 15.0, 'max': 20}, {'Category': 'B', 'mean': 20.0, 'max': 30}, {'Category': 'C', 'mean': 40.0, 'max': 40}]

B. [{'Category': 'A', 'mean': 10.0, 'max': 20}, {'Category': 'B', 'mean': 20.0, 'max': 30}, {'Category': 'C', 'mean': 40.0, 'max': 40}]

C. [{'Category': 'A', 'mean': 15.0, 'max': 10}, {'Category': 'B', 'mean': 20.0, 'max': 30}, {'Category': 'C', 'mean': 40.0, 'max': 40}]

D. [{'Category': 'A', 'mean': 15.0, 'max': 20}, {'Category': 'B', 'mean': 20.0, 'max': 10}, {'Category': 'C', 'mean': 40.0, 'max': 40}]
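
The options above are written as lists of row records rather than as a printed table. As a minimal sketch of how an aggregated result maps onto that form, the example below uses hypothetical data (the `scores` frame and its column names are assumptions, deliberately different from the problem) and converts the result with `to_dict(orient='records')`.

```python
import pandas as pd

# Hypothetical data, deliberately different from the problem above
scores = pd.DataFrame({
    'Team': ['red', 'red', 'blue'],
    'Points': [4, 8, 5],
})

# Passing a list of function names yields one output column per function
summary = scores.groupby('Team')['Points'].agg(['mean', 'max']).reset_index()

# One dict per row, in the same style as the answer options
records = summary.to_dict(orient='records')
print(records)
```

Note that `groupby` sorts the group keys by default, so the rows come out in key order, not in order of first appearance.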

❓ Data Output

intermediate

Number of unique users per product

Given a DataFrame with user purchases, what is the number of unique users who bought each product?

Data Analysis Python

import pandas as pd

data = pd.DataFrame({
    'User': ['Alice', 'Bob', 'Alice', 'David', 'Bob', 'Eve'],
    'Product': ['X', 'X', 'Y', 'Y', 'X', 'Z']
})

result = data.groupby('Product')['User'].nunique().reset_index(name='UniqueUsers')
print(result)

A. [{'Product': 'X', 'UniqueUsers': 2}, {'Product': 'Y', 'UniqueUsers': 3}, {'Product': 'Z', 'UniqueUsers': 1}]

B. [{'Product': 'X', 'UniqueUsers': 3}, {'Product': 'Y', 'UniqueUsers': 2}, {'Product': 'Z', 'UniqueUsers': 1}]

C. [{'Product': 'X', 'UniqueUsers': 2}, {'Product': 'Y', 'UniqueUsers': 2}, {'Product': 'Z', 'UniqueUsers': 1}]

D. [{'Product': 'X', 'UniqueUsers': 3}, {'Product': 'Y', 'UniqueUsers': 3}, {'Product': 'Z', 'UniqueUsers': 1}]
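
The distinction this problem relies on is between counting rows and counting distinct values within each group. A minimal sketch on hypothetical data (the `visits` frame and its columns are assumptions, not taken from the problem) puts `count` and `nunique` side by side:

```python
import pandas as pd

# Hypothetical page-visit log; 'ann' visits the home page twice
visits = pd.DataFrame({
    'Visitor': ['ann', 'ann', 'ben', 'cy'],
    'Page': ['home', 'home', 'home', 'docs'],
})

# 'count' counts non-null rows per group; 'nunique' counts distinct values
per_page = visits.groupby('Page')['Visitor'].agg(['count', 'nunique']).reset_index()
print(per_page)
```

For the `home` page the two columns differ because a repeated visitor adds to `count` but not to `nunique`.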

❓ Visualization

advanced

Visualizing average sales per region

Which plot correctly shows the average sales per region from the given data?

Data Analysis Python

import pandas as pd
import matplotlib.pyplot as plt

data = pd.DataFrame({
    'Region': ['North', 'South', 'East', 'West', 'North', 'South'],
    'Sales': [100, 150, 200, 130, 120, 170]
})
avg_sales = data.groupby('Region')['Sales'].mean().reset_index()

plt.bar(avg_sales['Region'], avg_sales['Sales'])
plt.title('Average Sales per Region')
plt.xlabel('Region')
plt.ylabel('Average Sales')
plt.show()

A. Bar chart with regions on the x-axis and average sales on the y-axis; bar heights: North=110, South=160, East=200, West=130

B. Line chart with regions on the x-axis and average sales on the y-axis; points: North=110, South=160, East=200, West=130

C. Pie chart showing the percentage of sales per region, with values: North=220, South=320, East=200, West=130

D. Scatter plot with sales values plotted against region names

🔧 Debug

advanced

Identify the error in the aggregation code

What error, if any, will the following code raise when executed?

Data Analysis Python

import pandas as pd

data = pd.DataFrame({
    'Category': ['A', 'B', 'A'],
    'Value': [10, 20, 30]
})

result = data.groupby('Category')['Value'].agg('sum', 'mean')
print(result)

A. No error; outputs the sum and mean aggregations

B. TypeError: agg() takes from 1 to 2 positional arguments but 3 were given

C. KeyError: 'mean'

D. ValueError: Function names must be passed as a list or tuple
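
Whatever the snippet above does, pandas documents two ways to request several aggregations on one column in a single call: a list of function names, or named aggregation with keyword arguments. The sketch below shows both on hypothetical data (the `orders` frame and the output names `total` and `average` are assumptions chosen for illustration).

```python
import pandas as pd

# Hypothetical order data, unrelated to the problem above
orders = pd.DataFrame({
    'Shop': ['n', 's', 'n'],
    'Amount': [5, 7, 9],
})

# Form 1: a list of function names; output columns are named after the functions
by_list = orders.groupby('Shop')['Amount'].agg(['sum', 'mean'])

# Form 2: named aggregation; each keyword becomes an output column name
by_name = orders.groupby('Shop')['Amount'].agg(total='sum', average='mean')

print(by_list)
print(by_name)
```

Named aggregation is handy when the default column names (`sum`, `mean`) would be ambiguous after merging with other results.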

🚀 Application

expert

Calculate a weighted average per group

You have a DataFrame with 'Group', 'Value', and 'Weight' columns. Which code correctly calculates the weighted average of 'Value' per 'Group' using 'Weight' as weights?

Data Analysis Python

import pandas as pd

data = pd.DataFrame({
    'Group': ['X', 'X', 'Y', 'Y', 'Y'],
    'Value': [10, 20, 30, 40, 50],
    'Weight': [1, 3, 2, 1, 1]
})

A. data.groupby('Group').agg({'Value': 'mean'}).reset_index()

B. data.groupby('Group').apply(lambda x: x['Value'].mean() * x['Weight'].mean()).reset_index(name='WeightedAvg')

C. data.groupby('Group').agg({'Value': lambda x: (x * data['Weight']).sum() / data['Weight'].sum()}).reset_index()

D. data.groupby('Group').apply(lambda x: (x['Value'] * x['Weight']).sum() / x['Weight'].sum()).reset_index(name='WeightedAvg')
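
As a reminder of what "weighted average" means here, the sketch below works through the definition on two hypothetical numbers (not the problem's data) and checks it against NumPy's `np.average`. It covers a single set of values only; applying it per group is left to the problem.

```python
import numpy as np

# Hypothetical numbers, unrelated to the DataFrame in the problem
values = np.array([2.0, 4.0])
weights = np.array([1.0, 3.0])

# Definition: sum of value * weight, divided by the sum of the weights
# Here: (2*1 + 4*3) / (1 + 3) = 14 / 4 = 3.5
manual = (values * weights).sum() / weights.sum()

# NumPy's built-in equivalent of the same formula
builtin = np.average(values, weights=weights)

print(manual, builtin)
```

The plain mean of these values is 3.0; the weighted result is pulled toward 4.0 because that value carries three times the weight.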
