Challenge - 5 Problems
Grouping Data Mastery
❓ Predict Output
intermediate (time limit: 2:00)
Output of GroupBy Sum Aggregation
What is the output of this code that groups sales by product and sums the amounts?
Pandas
import pandas as pd

data = {'product': ['apple', 'banana', 'apple', 'banana', 'apple'],
        'amount': [10, 5, 15, 10, 5]}
df = pd.DataFrame(data)
result = df.groupby('product')['amount'].sum()
print(result)
💡 Hint
Sum all amounts for each product separately.
Explanation
The code groups the data by 'product' and sums the 'amount' values for each group. Apple has 10+15+5=30, banana has 5+10=15.
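As a rough cross-check of the arithmetic above, the sketch below recomputes the per-product totals with a plain dictionary loop and compares them with the groupby result. The names `grouped_totals` and `manual_totals` are illustrative and not part of the original challenge.

```python
import pandas as pd

data = {'product': ['apple', 'banana', 'apple', 'banana', 'apple'],
        'amount': [10, 5, 15, 10, 5]}
df = pd.DataFrame(data)

# Group-and-sum with pandas, converted to a plain dict for comparison.
summed = df.groupby('product')['amount'].sum()
grouped_totals = {product: int(total) for product, total in summed.items()}

# The same aggregation by hand: one running total per product.
manual_totals = {}
for product, amount in zip(data['product'], data['amount']):
    manual_totals[product] = manual_totals.get(product, 0) + amount

# Both approaches should agree on the totals.
print(grouped_totals == manual_totals)
```

The loop makes the split-apply-combine idea explicit: rows are bucketed by key, and the aggregation runs once per bucket.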
❓ Data Output
intermediate (time limit: 1:30)
Number of Groups Created by GroupBy
How many groups are created when grouping this DataFrame by the 'category' column?
Pandas
import pandas as pd

data = {'category': ['A', 'B', 'A', 'C', 'B', 'C', 'C'],
        'value': [1, 2, 3, 4, 5, 6, 7]}
df = pd.DataFrame(data)
groups = df.groupby('category')
print(len(groups))
💡 Hint
Count unique values in 'category'.
Explanation
There are three unique categories: A, B, and C, so three groups are created.
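The group count can be read in more than one way. The sketch below, using illustrative variable names that are not in the original challenge, compares `len()` on the GroupBy object with its `ngroups` attribute and with `nunique()` on the column, and also lists how many rows fall into each group.

```python
import pandas as pd

data = {'category': ['A', 'B', 'A', 'C', 'B', 'C', 'C'],
        'value': [1, 2, 3, 4, 5, 6, 7]}
df = pd.DataFrame(data)
groups = df.groupby('category')

n_len = len(groups)                   # number of groups via len()
n_attr = groups.ngroups               # same count via the ngroups attribute
n_unique = df['category'].nunique()   # distinct values in the key column

# Rows per group, as a plain dict keyed by category.
group_sizes = groups.size().to_dict()

print(n_len, n_attr, n_unique)
print(group_sizes)
```

With no missing values in the key column, all three counts agree, because each distinct key value forms exactly one group.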
❓ Visualization
advanced (time limit: 2:30)
Visualizing Grouped Data with Mean Values
Which plot correctly shows the average score per class from the grouped data?
Pandas
import pandas as pd
import matplotlib.pyplot as plt

data = {'class': ['X', 'Y', 'X', 'Y', 'X'],
        'score': [80, 90, 70, 85, 75]}
df = pd.DataFrame(data)
grouped = df.groupby('class')['score'].mean()

plt.bar(grouped.index, grouped.values)
plt.xlabel('Class')
plt.ylabel('Average Score')
plt.title('Average Score per Class')
plt.show()
💡 Hint
Mean scores are 75 for X and 87.5 for Y.
Explanation
The code groups by class and calculates mean scores, then plots a bar chart with these means.
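To recognize the correct plot, it helps to know the bar labels and heights before drawing anything. The sketch below skips the plotting step and only extracts what would be passed to `plt.bar`; the names `labels` and `heights` are illustrative.

```python
import pandas as pd

data = {'class': ['X', 'Y', 'X', 'Y', 'X'],
        'score': [80, 90, 70, 85, 75]}
df = pd.DataFrame(data)
grouped = df.groupby('class')['score'].mean()

# x-axis labels come from the group keys (the Series index).
labels = list(grouped.index)

# Bar heights are the per-class means: (80 + 70 + 75) / 3 and (90 + 85) / 2.
heights = [float(value) for value in grouped.values]

for label, height in zip(labels, heights):
    print(label, height)
```

The correct chart therefore has two bars, with the bar for Y taller than the bar for X.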
🔧 Debug
advanced (time limit: 1:30)
Identify the Error in GroupBy Usage
What error does this code raise when grouping by a non-existent column?
Pandas
import pandas as pd

data = {'name': ['Anna', 'Bob'], 'age': [25, 30]}
df = pd.DataFrame(data)
result = df.groupby('salary').mean()
💡 Hint
Check if the column exists in the DataFrame.
Explanation
The code tries to group by 'salary' which is not a column in the DataFrame, causing a KeyError.
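One way to observe and guard against this failure is sketched below. The variable names (`group_col`, `raised`, `column_exists`) are hypothetical and only serve the illustration. The exception comes from the `groupby` call itself, so the sketch does not need to call `.mean()` to trigger it.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Anna', 'Bob'], 'age': [25, 30]})
group_col = 'salary'  # not a column of df

# groupby resolves its key eagerly, so the KeyError is raised here.
try:
    df.groupby(group_col)
    raised = False
except KeyError:
    raised = True

# A simple guard: check membership in df.columns before grouping.
column_exists = group_col in df.columns

print(raised, column_exists)
```

Checking `group_col in df.columns` up front lets the calling code report a clearer message than the bare KeyError.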
🚀 Application
expert (time limit: 2:30)
Calculate Total Sales per Region and Month
Given sales data with 'region', 'month', and 'sales' columns, which code correctly groups by region and month and sums sales?
Pandas
import pandas as pd

data = {'region': ['North', 'South', 'North', 'South', 'North'],
        'month': ['Jan', 'Jan', 'Feb', 'Feb', 'Jan'],
        'sales': [100, 200, 150, 250, 50]}
df = pd.DataFrame(data)
💡 Hint
Group by both columns using a list, then select 'sales' to sum.
Explanation
Option C correctly groups by both 'region' and 'month' using a list and sums the 'sales' column.
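The answer options are not reproduced in this text, so the following is only a sketch of the approach the hint describes, not necessarily the exact wording of Option C. It passes a list of column names to `groupby`, selects 'sales', and sums. The names `totals` and `totals_dict` are illustrative.

```python
import pandas as pd

data = {'region': ['North', 'South', 'North', 'South', 'North'],
        'month': ['Jan', 'Jan', 'Feb', 'Feb', 'Jan'],
        'sales': [100, 200, 150, 250, 50]}
df = pd.DataFrame(data)

# Group by both columns (passed as a list), then sum the 'sales' column.
totals = df.groupby(['region', 'month'])['sales'].sum()

# The result has a two-level index, so each key is a (region, month) tuple.
totals_dict = {key: int(value) for key, value in totals.items()}

print(totals_dict)
```

Only the North/Jan pair has more than one row (100 and 50), so it is the only total that differs from a single input value.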