Challenge - 5 Problems
Merge Mastery Badge
❓ Predict Output
Difficulty: intermediate (time limit 2:00)
Output of merging two DataFrames on different column names
What is the output DataFrame after merging df1 and df2 on df1's 'key1' and df2's 'key2' columns?
Pandas
import pandas as pd

df1 = pd.DataFrame({'key1': ['A', 'B', 'C'], 'val1': [1, 2, 3]})
df2 = pd.DataFrame({'key2': ['B', 'C', 'D'], 'val2': [4, 5, 6]})
result = pd.merge(df1, df2, left_on='key1', right_on='key2')
print(result)
💡 Hint
Remember that merging on differently named columns requires the left_on and right_on parameters.
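Explanation
pd.merge performs an inner join by default, so it keeps only rows where df1's 'key1' equals df2's 'key2'. Only 'B' and 'C' appear in both, so the result has two rows, (B, 2, B, 4) and (C, 3, C, 5), with columns key1, val1, key2, val2. Both key columns are kept because their names differ.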
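A minimal runnable check of this answer, using the same data as the problem. Dropping the redundant 'key2' column at the end is an optional, illustrative extra step, not part of the original question.

```python
import pandas as pd

df1 = pd.DataFrame({'key1': ['A', 'B', 'C'], 'val1': [1, 2, 3]})
df2 = pd.DataFrame({'key2': ['B', 'C', 'D'], 'val2': [4, 5, 6]})

# Default how='inner': only keys present in both frames survive ('B' and 'C').
result = pd.merge(df1, df2, left_on='key1', right_on='key2')
print(result)  # two rows: (B, 2, B, 4) and (C, 3, C, 5)

# Both key columns are kept because their names differ.
# Optional step: drop one of them if the duplicate is not needed.
tidy = result.drop(columns='key2')
print(tidy.columns.tolist())  # ['key1', 'val1', 'val2']
```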
❓ Data Output
Difficulty: intermediate (time limit 1:30)
Number of rows after merging on different column names with inner join
How many rows will the resulting DataFrame have after this merge operation?
Pandas
import pandas as pd

df1 = pd.DataFrame({'id1': [1, 2, 3, 4], 'value1': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'id2': [3, 4, 5, 6], 'value2': ['x', 'y', 'z', 'w']})
merged = pd.merge(df1, df2, left_on='id1', right_on='id2', how='inner')
print(len(merged))
💡 Hint
Inner join keeps only rows with matching keys in both DataFrames.
Explanation
Only the ids 3 and 4 appear in both id1 and id2, so the inner join keeps 2 rows and the code prints 2.
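Changing only the how parameter on the same data shows why the inner join yields 2 rows. This comparison with the other join types is an illustrative extension, not part of the original question. Because no key is repeated, each count is simply the number of keys that join type keeps.

```python
import pandas as pd

df1 = pd.DataFrame({'id1': [1, 2, 3, 4], 'value1': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'id2': [3, 4, 5, 6], 'value2': ['x', 'y', 'z', 'w']})

# Row count for each join type on the same differently named keys.
counts = {
    how: len(pd.merge(df1, df2, left_on='id1', right_on='id2', how=how))
    for how in ('inner', 'left', 'right', 'outer')
}
print(counts)  # {'inner': 2, 'left': 4, 'right': 4, 'outer': 6}
```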
🔧 Debug
Difficulty: advanced (time limit 1:30)
Identify the error in merging on different column names
What error will this code raise when trying to merge df1 and df2?
Pandas
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'C': [1, 2], 'D': [5, 6]})
result = pd.merge(df1, df2, on='A')
💡 Hint
Check whether both DataFrames contain the column named in the 'on' parameter.
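Explanation
on='A' requires a column named 'A' in both DataFrames. df2 has no such column (its columns are 'C' and 'D'), so pd.merge raises a KeyError. If 'A' and 'C' hold the same keys, pass left_on='A', right_on='C' instead.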
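A small sketch that reproduces the error and then applies the fix. It assumes, for illustration, that df1's 'A' and df2's 'C' hold the same identifiers; the problem itself does not say so.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'C': [1, 2], 'D': [5, 6]})

# on='A' looks up 'A' in both frames; df2 has no such column.
try:
    pd.merge(df1, df2, on='A')
    error_type = None
except KeyError as exc:
    error_type = type(exc).__name__
print(error_type)  # KeyError

# Fix (assuming 'A' and 'C' hold the same keys): name each side's key column.
fixed = pd.merge(df1, df2, left_on='A', right_on='C')
print(fixed.columns.tolist())  # ['A', 'B', 'C', 'D']
```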
🚀 Application
Difficulty: advanced (time limit 2:00)
Merging DataFrames with different column names and suffixes
After merging df1 and df2 on different column names, what will be the column names in the result?
Pandas
import pandas as pd

df1 = pd.DataFrame({'user_id': [1, 2], 'score': [10, 20]})
df2 = pd.DataFrame({'id': [1, 3], 'score': [15, 25]})
result = pd.merge(df1, df2, left_on='user_id', right_on='id',
                  suffixes=('_left', '_right'))
print(result.columns.tolist())
💡 Hint
Suffixes are added to overlapping column names except the join keys.
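Explanation
Both DataFrames have a 'score' column, so the merge appends '_left' and '_right' to tell the two apart. The key columns 'user_id' and 'id' have different names, so both are kept unchanged. The printed list is ['user_id', 'score_left', 'id', 'score_right'].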
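The sketch below confirms the column list and, for comparison, shows what happens when suffixes is omitted and pandas falls back to its default suffixes ('_x', '_y'). The second merge is an illustrative addition, not part of the original question.

```python
import pandas as pd

df1 = pd.DataFrame({'user_id': [1, 2], 'score': [10, 20]})
df2 = pd.DataFrame({'id': [1, 3], 'score': [15, 25]})

# Explicit suffixes are applied only to the overlapping non-key column 'score'.
result = pd.merge(df1, df2, left_on='user_id', right_on='id',
                  suffixes=('_left', '_right'))
print(result.columns.tolist())  # ['user_id', 'score_left', 'id', 'score_right']

# Without suffixes=, pandas uses its defaults ('_x', '_y').
default = pd.merge(df1, df2, left_on='user_id', right_on='id')
print(default.columns.tolist())  # ['user_id', 'score_x', 'id', 'score_y']
```

Only user 1 matches, so each result has a single row.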
🧠 Conceptual
Difficulty: expert (time limit 2:30)
Choosing merge parameters for different column names
You have two DataFrames: dfA with column 'emp_id' and dfB with column 'employee'. You want to merge them to keep all rows from dfA and matching rows from dfB. Which merge parameters are correct?
💡 Hint
Use left_on and right_on when the column names differ. how='left' keeps all rows from the left DataFrame.
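Explanation
Option D is correct: left_on='emp_id', right_on='employee', how='left'. left_on and right_on name the key column on each side, and the left join keeps every row of dfA, filling dfB's columns with NaN where there is no match.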
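The problem gives no data for dfA and dfB, so this sketch uses small hypothetical frames purely for illustration. indicator=True is an optional extra that adds a '_merge' column showing whether each row matched.

```python
import pandas as pd

# Hypothetical sample data: the original problem gives no values.
dfA = pd.DataFrame({'emp_id': [101, 102, 103], 'name': ['Ana', 'Ben', 'Cy']})
dfB = pd.DataFrame({'employee': [101, 103, 104], 'dept': ['HR', 'IT', 'Ops']})

# how='left' keeps every dfA row; unmatched rows get NaN in dfB's columns.
# indicator=True adds a '_merge' column: 'both' or 'left_only' here.
result = pd.merge(dfA, dfB, left_on='emp_id', right_on='employee',
                  how='left', indicator=True)
print(result)
```

Employee 104 exists only in dfB, so a left join drops it, while employee 102 is kept with missing department data.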