Challenge - 5 Problems
Master of Combining Datasets
❓ Predict Output
intermediate, 2:00 remaining
What is the output of merging two datasets with an inner join?
Consider two data tables representing sales data and customer data. What will be the result of merging them using an inner join on the 'customer_id' column?
Data Analysis Python
import pandas as pd

sales = pd.DataFrame({'customer_id': [1, 2, 3], 'amount': [100, 200, 300]})
customers = pd.DataFrame({'customer_id': [2, 3, 4], 'name': ['Alice', 'Bob', 'Charlie']})

merged = pd.merge(sales, customers, on='customer_id', how='inner')
print(merged)
💡 Hint
Inner join keeps only rows with matching keys in both tables.
An inner join returns only rows where the 'customer_id' exists in both datasets. Here, customer_id 2 and 3 are common, so only those rows appear.
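As a rough check of the explanation above, the sketch below reruns the same merge and then uses an outer join with `indicator=True` to list the keys the inner join leaves out. The `audit` and `dropped` names are illustrative and not part of the original challenge.

```python
import pandas as pd

sales = pd.DataFrame({"customer_id": [1, 2, 3], "amount": [100, 200, 300]})
customers = pd.DataFrame({"customer_id": [2, 3, 4], "name": ["Alice", "Bob", "Charlie"]})

# Inner join: keep only customer_id values present in both frames.
merged = pd.merge(sales, customers, on="customer_id", how="inner")

# An outer join with indicator=True adds a "_merge" column that records
# whether each row came from the left frame, the right frame, or both.
audit = pd.merge(sales, customers, on="customer_id", how="outer", indicator=True)

# Keys that appear on only one side are the ones the inner join dropped.
dropped = audit.loc[audit["_merge"] != "both", "customer_id"].tolist()

print(merged)
print("Keys dropped by the inner join:", dropped)
```

Only customer_id 2 and 3 survive the inner join; 1 exists only in `sales` and 4 only in `customers`.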
❓ data_output
intermediate, 1:30 remaining
How many rows result from a left join combining product and review data?
Given product data and review data, if you perform a left join on 'product_id', how many rows will the resulting dataset have?
Data Analysis Python
import pandas as pd

products = pd.DataFrame({'product_id': [101, 102, 103], 'product_name': ['Pen', 'Pencil', 'Eraser']})
reviews = pd.DataFrame({'product_id': [101, 101, 104], 'review': ['Good', 'Excellent', 'Bad']})

left_joined = pd.merge(products, reviews, on='product_id', how='left')
print(len(left_joined))
💡 Hint
Left join keeps all rows from the left table.
A left join keeps every row of the left table, so all 3 products appear. Product 101 matches two reviews and therefore produces two rows. Products 102 and 103 have no reviews, so each produces one row with a missing review. The review for product_id 104 has no matching product and is dropped. That gives 2 + 1 + 1 = 4 rows in total.
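The row count can be broken down per product with a short sketch like the one below. The `rows_per_product` and `unreviewed` names are illustrative, and `validate="one_to_many"` is an optional safeguard added here, not part of the original question.

```python
import pandas as pd

products = pd.DataFrame({"product_id": [101, 102, 103],
                         "product_name": ["Pen", "Pencil", "Eraser"]})
reviews = pd.DataFrame({"product_id": [101, 101, 104],
                        "review": ["Good", "Excellent", "Bad"]})

# validate="one_to_many" makes pandas raise if product_id is duplicated
# in the left frame, which would multiply rows unexpectedly.
left_joined = pd.merge(products, reviews, on="product_id", how="left",
                       validate="one_to_many")

# How many result rows each product contributes.
rows_per_product = left_joined.groupby("product_id").size().to_dict()

# Products kept by the left join despite having no matching review.
unreviewed = left_joined.loc[left_joined["review"].isna(), "product_name"].tolist()

print(len(left_joined))
print(rows_per_product)
print(unreviewed)
```

Product 101 contributes two rows, 102 and 103 contribute one each, and the review for 104 never appears because the left table has no such product.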
❓ visualization
advanced, 2:30 remaining
Which plot best shows the relationship after combining sales and region data?
You combined sales data with region data to analyze sales by region. Which plot below best visualizes total sales per region?
Data Analysis Python
import pandas as pd
import matplotlib.pyplot as plt

sales = pd.DataFrame({'region': ['North', 'South', 'North', 'East'], 'sales': [100, 150, 200, 130]})
total_sales = sales.groupby('region').sum().reset_index()

plt.bar(total_sales['region'], total_sales['sales'])
plt.xlabel('Region')
plt.ylabel('Total Sales')
plt.title('Total Sales by Region')
plt.show()
💡 Hint
Bar charts are good for comparing totals across categories.
The bar chart clearly shows total sales per region, which is the goal after combining sales and region data.
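The challenge code starts from a table that already contains the region column. As a sketch of the combining step that would come first, the example below assumes two hypothetical tables, `store_sales` and `store_regions`, keyed by a made-up `store_id`, and produces the same per-region totals that the bar chart displays.

```python
import pandas as pd

# Hypothetical inputs: per-store sales and a store-to-region lookup.
store_sales = pd.DataFrame({"store_id": [1, 2, 3, 4],
                            "sales": [100, 150, 200, 130]})
store_regions = pd.DataFrame({"store_id": [1, 2, 3, 4],
                              "region": ["North", "South", "North", "East"]})

# Attach a region to every sales row, then total the sales per region.
combined = pd.merge(store_sales, store_regions, on="store_id", how="left")
total_sales = combined.groupby("region", as_index=False)["sales"].sum()

print(total_sales)
```

The resulting `total_sales` frame has one row per region, which is the shape a bar chart of categories against totals expects.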
🔧 Debug
advanced, 2:00 remaining
What error occurs when merging datasets with missing keys without specifying join type?
What error will this code raise when merging two datasets with no common keys and no join type specified?
Data Analysis Python
import pandas as pd

left = pd.DataFrame({'id': [1, 2]})
right = pd.DataFrame({'key': [3, 4]})

merged = pd.merge(left, right, on='id')
💡 Hint
Check if the columns used for merging exist in both datasets.
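The right DataFrame has no 'id' column, so pandas raises KeyError: 'id'. The join type does not matter here: the merge fails while looking up the key column, before any rows are matched.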
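One possible way to get past this error, assuming `id` and `key` are meant to hold the same kind of identifier (the challenge does not say so), is to name the key column on each side with `left_on` and `right_on`. The sketch below first reproduces the failure and then applies that fix.

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2]})
right = pd.DataFrame({"key": [3, 4]})

# Reproduce the failure: 'id' does not exist in the right frame.
error_name = None
try:
    pd.merge(left, right, on="id")
except KeyError as exc:
    error_name = type(exc).__name__
    print("merge failed:", error_name, exc)

# Assumed fix: tell pandas which column to use on each side.
inner = pd.merge(left, right, left_on="id", right_on="key", how="inner")
outer = pd.merge(left, right, left_on="id", right_on="key", how="outer")

# No values overlap, so the inner join is empty and the outer join
# keeps all four rows, padded with missing values.
print(len(inner), len(outer))
```

With differently named key columns, both `id` and `key` are kept in the result, so the unmatched rows in the outer join show a missing value in one of the two.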
🚀 Application
expert, 3:00 remaining
How does combining datasets improve data analysis completeness?
You have customer purchase data and customer support data in separate tables. Which reason best explains why combining these datasets helps create a complete picture?
💡 Hint
Think about how different data sources add context to each other.
Combining purchase and support data lets analysts see how buying habits relate to support needs, giving a fuller understanding of customers.
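To make this concrete, here is a minimal sketch with invented data. The `purchases` and `tickets` tables and all of their values are hypothetical. Each table is summarized per customer before merging, so that several purchases and several tickets for the same customer do not multiply rows.

```python
import pandas as pd

# Hypothetical purchase and support tables sharing a customer_id key.
purchases = pd.DataFrame({"customer_id": [1, 1, 2, 3],
                          "amount": [50, 70, 20, 200]})
tickets = pd.DataFrame({"customer_id": [1, 3, 3, 3],
                        "issue": ["billing", "delivery", "refund", "delivery"]})

# Summarize each source to one row per customer.
spend = purchases.groupby("customer_id", as_index=False).agg(
    total_spend=("amount", "sum"))
support = tickets.groupby("customer_id", as_index=False).agg(
    ticket_count=("issue", "count"))

# Left join keeps every purchasing customer, including those who
# never opened a ticket; their missing counts become 0.
profile = pd.merge(spend, support, on="customer_id", how="left")
profile["ticket_count"] = profile["ticket_count"].fillna(0).astype(int)

print(profile)
```

The combined `profile` table places spending and support load side by side for each customer, which neither source table shows on its own.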