Challenge - 5 Problems
Master of merge() Joins
❓ Predict Output
intermediate
Output of inner join with merge()
What is the output DataFrame after performing an inner join using merge() on these two DataFrames by column key?
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'val1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'val2': [4, 5, 6]})

result = pd.merge(df1, df2, on='key', how='inner')
print(result)
💡 Hint
Inner join keeps only rows with keys present in both DataFrames.
An inner join returns rows where the key exists in both DataFrames. Here, keys 'B' and 'C' are common, so only those rows appear with their respective values.
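As a quick check of the explanation above, the following sketch runs the same merge and compares the surviving keys against a set intersection computed independently. The variable name common_keys is introduced here for illustration only.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'val1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'val2': [4, 5, 6]})

result = pd.merge(df1, df2, on='key', how='inner')

# Keys present in both frames, computed without merge() as a cross-check
common_keys = sorted(set(df1['key']) & set(df2['key']))

print(result)
# Expected output:
#   key  val1  val2
# 0   B     2     4
# 1   C     3     5
```

Rows for 'A' (only in df1) and 'D' (only in df2) are dropped, and the index is renumbered from 0.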
❓ data_output
intermediate
Number of rows after left join
After performing a left join with merge() on these DataFrames by id, how many rows will the resulting DataFrame have?
import pandas as pd

df_left = pd.DataFrame({'id': [1, 2, 3, 4], 'val_left': ['a', 'b', 'c', 'd']})
df_right = pd.DataFrame({'id': [3, 4, 5], 'val_right': ['x', 'y', 'z']})

result = pd.merge(df_left, df_right, on='id', how='left')
💡 Hint
Left join keeps all rows from the left DataFrame.
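A left join keeps every row from the left DataFrame, whether or not it matches a row in the right. Because each id appears at most once in df_right, no left row is duplicated, so the result has the same 4 rows as df_left. Ids 1 and 2 have no match and get NaN in val_right; id 5 exists only in df_right and is dropped.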
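The row count can be verified directly. The sketch below also adds a hypothetical variant, df_right_dup (not part of the challenge), to show that a left join can return more rows than the left DataFrame when the right side repeats a key.

```python
import pandas as pd

df_left = pd.DataFrame({'id': [1, 2, 3, 4], 'val_left': ['a', 'b', 'c', 'd']})
df_right = pd.DataFrame({'id': [3, 4, 5], 'val_right': ['x', 'y', 'z']})

result = pd.merge(df_left, df_right, on='id', how='left')
n_rows = len(result)                                  # 4, one per left row
n_unmatched = int(result['val_right'].isna().sum())   # 2, for ids 1 and 2

# Hypothetical variant: id 3 appears twice on the right side
df_right_dup = pd.DataFrame({'id': [3, 3, 4], 'val_right': ['x1', 'x2', 'y']})
result_dup = pd.merge(df_left, df_right_dup, on='id', how='left')
n_rows_dup = len(result_dup)                          # 5, left row for id 3 is repeated

print(n_rows, n_unmatched, n_rows_dup)
```

If duplicated right-hand keys would be a data error, passing validate='many_to_one' to pd.merge makes pandas raise a MergeError instead of silently adding rows.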
🔧 Debug
advanced
Identify the error in merge() usage
What error will this code raise when trying to merge these DataFrames?
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'C': [1, 2], 'D': [5, 6]})

result = pd.merge(df1, df2, on='A')
💡 Hint
Check if the column used for merging exists in both DataFrames.
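Column 'A' exists only in df1, not in df2. Because on='A' requires the column to be present in both DataFrames, the merge raises a KeyError for 'A'. To join on columns with different names, use left_on and right_on instead.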
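To see the failure without guessing, the sketch below catches the exception and records its type name; with recent pandas versions this is a KeyError. The repaired merge at the end rests on an assumption that is not stated in the challenge: that 'A' in df1 and 'C' in df2 hold the same identifiers.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'C': [1, 2], 'D': [5, 6]})

# Attempt the original merge and record which exception type is raised
try:
    pd.merge(df1, df2, on='A')
    error_name = None
except Exception as exc:
    error_name = type(exc).__name__

print(error_name)

# Assumed fix: treat 'A' (left) and 'C' (right) as the matching key columns
fixed = pd.merge(df1, df2, left_on='A', right_on='C')
print(fixed)
```

With left_on and right_on, both key columns are kept in the result, so fixed has the columns A, B, C and D.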
❓ visualization
advanced
Visualize the result of an outer join
What is the correct DataFrame output after performing an outer join on these DataFrames by key?
import pandas as pd

df1 = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'val1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['K1', 'K2', 'K3'], 'val2': [4, 5, 6]})

result = (
    pd.merge(df1, df2, on='key', how='outer')
    .sort_values('key')
    .reset_index(drop=True)
)
print(result)
💡 Hint
Outer join keeps all keys from both DataFrames, filling missing values with NaN.
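An outer join includes all keys from both DataFrames. K0 appears only in df1, so its val2 is NaN; K3 appears only in df2, so its val1 is NaN. Because NaN is a floating-point value, val1 and val2 are both converted to float columns in the result.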
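One way to make the origin of each row explicit is merge()'s indicator parameter, which adds a _merge column. The sketch below is the same outer join with indicator=True added; that flag is an addition for illustration and is not part of the challenge code.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'val1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['K1', 'K2', 'K3'], 'val2': [4, 5, 6]})

result = (
    pd.merge(df1, df2, on='key', how='outer', indicator=True)
    .sort_values('key')
    .reset_index(drop=True)
)

# _merge is 'left_only' for K0, 'both' for K1 and K2, 'right_only' for K3
origins = result['_merge'].astype(str).tolist()

print(result)
print(result.dtypes)  # val1 and val2 become float64 because they contain NaN
```

Filtering on _merge afterwards is a convenient way to list the keys that failed to match on one side.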
🚀 Application
expert
Combine DataFrames with multiple keys and suffixes
Given these DataFrames, what is the output of merging on columns city and year with suffixes _left and _right?
import pandas as pd

df1 = pd.DataFrame({
    'city': ['NY', 'LA', 'NY', 'LA'],
    'year': [2020, 2020, 2021, 2021],
    'pop': [8.3, 4.0, 8.4, 4.1]
})
df2 = pd.DataFrame({
    'city': ['NY', 'LA', 'NY', 'LA'],
    'year': [2020, 2020, 2021, 2021],
    'pop': [8.5, 4.1, 8.6, 4.2]
})

result = pd.merge(df1, df2, on=['city', 'year'], suffixes=('_left', '_right'))
print(result)
💡 Hint
Suffixes are added to overlapping column names except the keys.
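Since both DataFrames have a 'pop' column that is not a merge key, the suffixes '_left' and '_right' are added to distinguish them, giving pop_left and pop_right in the merged DataFrame. The key columns city and year appear once, without a suffix. No how argument is given, so the default inner join applies, and every (city, year) pair matches exactly one row on each side.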
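A short sketch of the same merge, inspecting the resulting column names and one looked-up value. The names columns and ny_2020 are introduced here for illustration only.

```python
import pandas as pd

df1 = pd.DataFrame({
    'city': ['NY', 'LA', 'NY', 'LA'],
    'year': [2020, 2020, 2021, 2021],
    'pop': [8.3, 4.0, 8.4, 4.1]
})
df2 = pd.DataFrame({
    'city': ['NY', 'LA', 'NY', 'LA'],
    'year': [2020, 2020, 2021, 2021],
    'pop': [8.5, 4.1, 8.6, 4.2]
})

result = pd.merge(df1, df2, on=['city', 'year'], suffixes=('_left', '_right'))

# Key columns keep their names; only the overlapping 'pop' column is suffixed
columns = list(result.columns)

# Look up one (city, year) pair to compare the two 'pop' values side by side
ny_2020 = result.set_index(['city', 'year']).loc[('NY', 2020)]

print(columns)
print(ny_2020)
```

Here pop_left comes from df1 and pop_right from df2, so for NY in 2020 the row holds 8.3 and 8.5 respectively.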