Challenge - 5 Problems

🎖️

Master of merge() for SQL-like joins

Get all challenges correct to earn this badge!

Test your skills under time pressure!

❓ Predict Output

intermediate

2:00remaining

Output of inner join with pandas.merge()

What is the output DataFrame after performing an inner join on the 'key' column?

Pandas

import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'val1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'val2': [4, 5, 6]})

result = pd.merge(df1, df2, on='key', how='inner')
print(result)

A{'key': ['A', 'B', 'C', 'D'], 'val1': [1, 2, 3, null], 'val2': [null, 4, 5, 6]}

B{'key': ['B', 'C', 'D'], 'val1': [2, 3, null], 'val2': [4, 5, 6]}

C{'key': ['B', 'C'], 'val1': [2, 3], 'val2': [4, 5]}

D{'key': ['A', 'B', 'C'], 'val1': [1, 2, 3], 'val2': [null, 4, 5]}

Attempts:

2 left

❓ data_output

intermediate

1:30remaining

Number of rows after left join with pandas.merge()

How many rows will the resulting DataFrame have after performing a left join on 'key'?

Pandas

import pandas as pd

df1 = pd.DataFrame({'key': ['X', 'Y', 'Z'], 'val1': [10, 20, 30]})
df2 = pd.DataFrame({'key': ['Y', 'Z'], 'val2': [100, 200]})

result = pd.merge(df1, df2, on='key', how='left')
print(len(result))

Attempts:

2 left

🔧 Debug

advanced

2:00remaining

Identify the error in merge() usage

What error will this code raise?

Pandas

import pandas as pd

df1 = pd.DataFrame({'id': [1, 2], 'val': ['A', 'B']})
df2 = pd.DataFrame({'key': [1, 2], 'val2': ['X', 'Y']})

result = pd.merge(df1, df2, on='id')

AValueError: columns overlap but no suffix specified

BKeyError: 'id'

CTypeError: merge() missing 1 required positional argument

DNo error, merge runs successfully

Attempts:

2 left

❓ visualization

advanced

2:30remaining

Visualize the result of an outer join

Which option shows the correct DataFrame after an outer join on 'key'?

Pandas

import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B'], 'val1': [1, 2]})
df2 = pd.DataFrame({'key': ['B', 'C'], 'val2': [3, 4]})

result = pd.merge(df1, df2, on='key', how='outer').sort_values('key').reset_index(drop=True)
print(result)

  key  val1  val2
0   A   1.0   NaN
1   B   2.0   3.0
2   C   NaN   4.0

  key  val1  val2
0   A   NaN   NaN
1   B   2.0   3.0
2   C   NaN   4.0

  key  val1  val2
0   A   1.0   3.0
1   B   2.0   4.0
2   C   NaN   NaN

  key  val1  val2
0   A   1.0   4.0
1   B   2.0   3.0
2   C   NaN   NaN

Attempts:

2 left

🚀 Application

expert

3:00remaining

Combine DataFrames with multiple keys and suffixes

You have two DataFrames with overlapping column names besides the keys. Which code correctly merges them on 'id' and 'date' with suffixes to avoid column name conflicts?

Pandas

import pandas as pd

df1 = pd.DataFrame({'id': [1, 2], 'date': ['2023-01-01', '2023-01-02'], 'value': [10, 20]})
df2 = pd.DataFrame({'id': [1, 2], 'date': ['2023-01-01', '2023-01-02'], 'value': [100, 200]})

Apd.merge(df1, df2, on=['id', 'date'], suffixes=['_left', '_right'])

Bpd.merge(df1, df2, on=['id', 'date'], suffixes='_left_right')

Cpd.merge(df1, df2, on=['id', 'date'], suffixes=('left', 'right'))

Dpd.merge(df1, df2, on=['id', 'date'], suffixes=('_left', '_right'))

Attempts:

2 left