0
0
Data Analysis Pythondata~20 mins

merge() for SQL-style joins in Data Analysis Python - Practice Problems & Coding Challenges

Choose your learning style9 modes available
Challenge - 5 Problems
🎖️
Master of merge() Joins
Get all challenges correct to earn this badge!
Test your skills under time pressure!
Predict Output
intermediate
2:00remaining
Output of inner join with merge()
What is the output DataFrame after performing an inner join using merge() on these two DataFrames by column key?
Data Analysis Python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'val1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'val2': [4, 5, 6]})

result = pd.merge(df1, df2, on='key', how='inner')
print(result)
A
  key  val1  val2
0   A     1     4
1   B     2     5
2   C     3     6
B
  key  val1  val2
0   B     2     4
1   C     3     5
C
  key  val1  val2
0   B     2     5
1   C     3     6
2   D   NaN     6
D
  key  val1  val2
0   A     1   NaN
1   B     2   NaN
2   C     3   NaN
Attempts:
2 left
💡 Hint
Inner join keeps only rows with keys present in both DataFrames.
data_output
intermediate
1:30remaining
Number of rows after left join
After performing a left join with merge() on these DataFrames by id, how many rows will the resulting DataFrame have?
Data Analysis Python
import pandas as pd

df_left = pd.DataFrame({'id': [1, 2, 3, 4], 'val_left': ['a', 'b', 'c', 'd']})
df_right = pd.DataFrame({'id': [3, 4, 5], 'val_right': ['x', 'y', 'z']})

result = pd.merge(df_left, df_right, on='id', how='left')
A4
B3
C5
D7
Attempts:
2 left
💡 Hint
Left join keeps all rows from the left DataFrame.
🔧 Debug
advanced
2:00remaining
Identify the error in merge() usage
What error will this code raise when trying to merge these DataFrames?
Data Analysis Python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'C': [1, 2], 'D': [5, 6]})

result = pd.merge(df1, df2, on='A')
AValueError: No common columns to perform merge on
BNo error, merge runs successfully
CTypeError: merge() missing 1 required positional argument
DKeyError: 'A'
Attempts:
2 left
💡 Hint
Check if the column used for merging exists in both DataFrames.
visualization
advanced
2:30remaining
Visualize the result of an outer join
Which option shows the correct DataFrame output after performing an outer join on these DataFrames by key?
Data Analysis Python
import pandas as pd

df1 = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'val1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['K1', 'K2', 'K3'], 'val2': [4, 5, 6]})

result = pd.merge(df1, df2, on='key', how='outer').sort_values('key').reset_index(drop=True)
print(result)
A
  key  val1  val2
0  K0   NaN   NaN
1  K1   2.0   4.0
2  K2   3.0   5.0
3  K3   6.0   NaN
B
  key  val1  val2
0  K1   2.0   4.0
1  K2   3.0   5.0
C
  key  val1  val2
0  K0   1.0   NaN
1  K1   2.0   4.0
2  K2   3.0   5.0
3  K3   NaN   6.0
D
  key  val1  val2
0  K0   1.0   4.0
1  K1   2.0   5.0
2  K2   3.0   6.0
Attempts:
2 left
💡 Hint
Outer join keeps all keys from both DataFrames, filling missing values with NaN.
🚀 Application
expert
3:00remaining
Combine DataFrames with multiple keys and suffixes
Given these DataFrames, what is the output of merging on columns city and year with suffixes _left and _right?
Data Analysis Python
import pandas as pd

df1 = pd.DataFrame({
    'city': ['NY', 'LA', 'NY', 'LA'],
    'year': [2020, 2020, 2021, 2021],
    'pop': [8.3, 4.0, 8.4, 4.1]
})

df2 = pd.DataFrame({
    'city': ['NY', 'LA', 'NY', 'LA'],
    'year': [2020, 2020, 2021, 2021],
    'pop': [8.5, 4.1, 8.6, 4.2]
})

result = pd.merge(df1, df2, on=['city', 'year'], suffixes=('_left', '_right'))
print(result)
A
  city  year  pop_left
0   NY  2020       8.3
1   LA  2020       4.0
2   NY  2021       8.4
3   LA  2021       4.1
B
  city  year  pop
0   NY  2020  8.5
1   LA  2020  4.1
2   NY  2021  8.6
3   LA  2021  4.2
C
  city  year  pop_left  pop_right
0   NY  2020       8.5        8.3
1   LA  2020       4.1        4.0
2   NY  2021       8.6        8.4
3   LA  2021       4.2        4.1
D
  city  year  pop_left  pop_right
0   NY  2020       8.3        8.5
1   LA  2020       4.0        4.1
2   NY  2021       8.4        8.6
3   LA  2021       4.1        4.2
Attempts:
2 left
💡 Hint
Suffixes are added to overlapping column names except the keys.