Challenge - 5 Problems
Master of Merging on Multiple Keys
❓ Predict Output
Difficulty: intermediate
Output of merging two DataFrames on multiple keys
What is the output DataFrame after merging df1 and df2 on columns 'key1' and 'key2' using an inner join?
import pandas as pd

df1 = pd.DataFrame({
    'key1': ['A', 'B', 'C', 'A'],
    'key2': [1, 2, 3, 2],
    'value1': [10, 20, 30, 40]
})
df2 = pd.DataFrame({
    'key1': ['A', 'B', 'A', 'D'],
    'key2': [1, 2, 2, 4],
    'value2': [100, 200, 300, 400]
})

result = pd.merge(df1, df2, on=['key1', 'key2'], how='inner')
print(result)
💡 Hint
Look for rows where both 'key1' and 'key2' match in both DataFrames.
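Explanation
The inner merge keeps only rows where both 'key1' and 'key2' match in df1 and df2. The matching pairs are ('A',1), ('B',2), and ('A',2), so the result has three rows: (A, 1, 10, 100), (B, 2, 20, 200), and (A, 2, 40, 300). The pairs ('C',3) from df1 and ('D',4) from df2 have no partner and are dropped.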
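One way to see which rows an inner join keeps or drops is to run an outer merge with indicator=True and inspect the '_merge' column. The sketch below reuses the DataFrames from the question; the names `audit`, `kept`, and `dropped` are illustrative and not part of the original challenge.

```python
import pandas as pd

df1 = pd.DataFrame({
    'key1': ['A', 'B', 'C', 'A'],
    'key2': [1, 2, 3, 2],
    'value1': [10, 20, 30, 40],
})
df2 = pd.DataFrame({
    'key1': ['A', 'B', 'A', 'D'],
    'key2': [1, 2, 2, 4],
    'value2': [100, 200, 300, 400],
})
keys = ['key1', 'key2']

# indicator=True adds a '_merge' column: 'both', 'left_only', or 'right_only'.
audit = pd.merge(df1, df2, on=keys, how='outer', indicator=True)

# Rows marked 'both' are the rows an inner join on the same keys would keep.
kept = audit[audit['_merge'] == 'both']
dropped = audit[audit['_merge'] != 'both']

print(kept)
print(dropped)
```

Here `dropped` should contain the ('C',3) row from df1 and the ('D',4) row from df2, which is why neither appears in the inner-join result.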
❓ Data Output
Difficulty: intermediate
Number of rows after merging on multiple keys with outer join
After merging df1 and df2 on ['key1', 'key2'] using an outer join, how many rows does the resulting DataFrame have?
import pandas as pd

df1 = pd.DataFrame({
    'key1': ['X', 'Y', 'Z'],
    'key2': [1, 2, 3],
    'val1': [5, 10, 15]
})
df2 = pd.DataFrame({
    'key1': ['X', 'Y', 'W'],
    'key2': [1, 4, 3],
    'val2': [50, 40, 30]
})

merged = pd.merge(df1, df2, on=['key1', 'key2'], how='outer')
print(len(merged))
💡 Hint
Count all unique pairs of keys from both DataFrames combined.
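Explanation
An outer join keeps every key pair found in either DataFrame. The unique key pairs are ('X',1), ('Y',2), ('Z',3), ('Y',4), and ('W',3), so the result has 5 rows. Only ('X',1) appears in both DataFrames; the other four rows have NaN in the value column of the DataFrame that lacks the pair.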
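The counting rule from the hint can be checked directly. The sketch below stacks the key columns of both DataFrames, drops duplicates, and compares the count with the length of the outer merge. The shortcut assumes that no key pair is repeated within a single DataFrame, which holds for this data; `unique_pairs` is an illustrative name.

```python
import pandas as pd

df1 = pd.DataFrame({
    'key1': ['X', 'Y', 'Z'],
    'key2': [1, 2, 3],
    'val1': [5, 10, 15],
})
df2 = pd.DataFrame({
    'key1': ['X', 'Y', 'W'],
    'key2': [1, 4, 3],
    'val2': [50, 40, 30],
})
keys = ['key1', 'key2']

# Stack the key columns from both frames and keep each pair once.
unique_pairs = pd.concat([df1[keys], df2[keys]]).drop_duplicates()

merged = pd.merge(df1, df2, on=keys, how='outer')

print(len(unique_pairs), len(merged))

# Unmatched pairs get NaN in the column that comes from the other frame.
print(merged['val1'].isna().sum(), merged['val2'].isna().sum())
```

If a key pair were duplicated inside one DataFrame, the outer merge could have more rows than there are unique pairs, so the comparison is a sanity check for this dataset and not a general identity.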
🔧 Debug
Difficulty: advanced
Identify the error in merging on multiple keys
What error will this code raise when trying to merge df1 and df2 on ['key1', 'key3']?
import pandas as pd

df1 = pd.DataFrame({
    'key1': ['A', 'B'],
    'key2': [1, 2],
    'value': [100, 200]
})
df2 = pd.DataFrame({
    'key1': ['A', 'B'],
    'key3': [1, 2],
    'value': [300, 400]
})

result = pd.merge(df1, df2, on=['key1', 'key3'])
💡 Hint
Check if both DataFrames have all columns specified in 'on'.
Explanation
df1 does not have a column named 'key3', so pandas raises a KeyError for 'key3' when trying to merge on it. Every column listed in 'on' must exist in both DataFrames.
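The sketch below reproduces the KeyError and then shows one possible fix using left_on and right_on. It assumes that df1's 'key2' and df2's 'key3' are meant to hold the same kind of key; the original challenge does not say so, and the flag `raised_key_error` is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'key1': ['A', 'B'], 'key2': [1, 2], 'value': [100, 200]})
df2 = pd.DataFrame({'key1': ['A', 'B'], 'key3': [1, 2], 'value': [300, 400]})

# Reproduce the failure: 'key3' is missing from df1.
try:
    pd.merge(df1, df2, on=['key1', 'key3'])
    raised_key_error = False
except KeyError as exc:
    raised_key_error = True
    print(f"KeyError: {exc}")

# Assumed fix: pair df1['key2'] with df2['key3'] explicitly.
result = pd.merge(
    df1, df2,
    left_on=['key1', 'key2'],
    right_on=['key1', 'key3'],
)
print(result)
```

With left_on and right_on, both 'key2' and 'key3' are kept in the output, and the overlapping 'value' column receives the default '_x' and '_y' suffixes.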
🚀 Application
Difficulty: advanced
Result of merging with suffixes on overlapping columns
What will be the output DataFrame after merging df1 and df2 on ['id', 'date'] with suffixes ('_left', '_right')?
import pandas as pd

df1 = pd.DataFrame({
    'id': [1, 2],
    'date': ['2023-01-01', '2023-01-02'],
    'value': [10, 20]
})
df2 = pd.DataFrame({
    'id': [1, 2],
    'date': ['2023-01-01', '2023-01-02'],
    'value': [100, 200]
})

merged = pd.merge(df1, df2, on=['id', 'date'], suffixes=('_left', '_right'))
print(merged)
💡 Hint
Suffixes are added to overlapping column names except the keys.
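Explanation
Since 'value' exists in both DataFrames and is not a key, the suffixes '_left' and '_right' are appended to distinguish the two copies. The output has columns 'id', 'date', 'value_left', and 'value_right', with two rows: (1, '2023-01-01', 10, 100) and (2, '2023-01-02', 20, 200).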
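To make the effect of the suffixes argument concrete, the sketch below compares the column names produced with the custom suffixes against the pandas defaults ('_x', '_y'). The variable names `custom` and `default` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': [1, 2],
    'date': ['2023-01-01', '2023-01-02'],
    'value': [10, 20],
})
df2 = pd.DataFrame({
    'id': [1, 2],
    'date': ['2023-01-01', '2023-01-02'],
    'value': [100, 200],
})
keys = ['id', 'date']

# Custom suffixes apply only to the overlapping non-key column 'value'.
custom = pd.merge(df1, df2, on=keys, suffixes=('_left', '_right'))

# Without the argument, pandas falls back to '_x' and '_y'.
default = pd.merge(df1, df2, on=keys)

print(list(custom.columns))
print(list(default.columns))
```

The key columns 'id' and 'date' appear once and keep their names in both results; only 'value' is renamed.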
🧠 Conceptual
Difficulty: expert
Understanding merge behavior with duplicate keys
Given df1 and df2 below, how many rows will the merged DataFrame have after merging on ['key1', 'key2'] with an inner join?
import pandas as pd

df1 = pd.DataFrame({
    'key1': ['A', 'A'],
    'key2': [1, 1],
    'val1': [10, 20]
})
df2 = pd.DataFrame({
    'key1': ['A', 'A'],
    'key2': [1, 1],
    'val2': [100, 200]
})

merged = pd.merge(df1, df2, on=['key1', 'key2'], how='inner')
print(len(merged))
💡 Hint
Think about how many combinations are formed when keys are duplicated in both DataFrames.
Explanation
Each row in df1 with key ('A',1) matches every row in df2 with the same key. With two such rows on each side, the merge produces 2 × 2 = 4 rows, one for each combination of val1 and val2.
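The row count can be predicted before merging by multiplying, for each key pair, the number of occurrences on the left by the number on the right. The sketch below does that and also shows the validate argument of pd.merge, which can reject unintended many-to-many merges. The names `expected_rows` and `rejected` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'key1': ['A', 'A'], 'key2': [1, 1], 'val1': [10, 20]})
df2 = pd.DataFrame({'key1': ['A', 'A'], 'key2': [1, 1], 'val2': [100, 200]})
keys = ['key1', 'key2']

# Occurrences of each key pair on each side.
left_counts = df1.groupby(keys).size()
right_counts = df2.groupby(keys).size()

# For an inner join, each key pair contributes left_count * right_count rows;
# pairs missing on one side contribute zero.
expected_rows = int(left_counts.mul(right_counts, fill_value=0).sum())

merged = pd.merge(df1, df2, on=keys, how='inner')
print(expected_rows, len(merged))

# validate='one_to_one' raises MergeError when keys repeat on either side.
try:
    pd.merge(df1, df2, on=keys, how='inner', validate='one_to_one')
    rejected = False
except pd.errors.MergeError:
    rejected = True
print(rejected)
```

Passing validate is a cheap guard when duplicate keys would indicate a data problem instead of an intended many-to-many relationship.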