Challenge - 5 Problems

Outer Join Mastery

❓ query_result

intermediate

What is the output of this outer join?

Given two pandas DataFrames df1 and df2:

df1 = pd.DataFrame({"key": [1, 2, 3], "val1": ["A", "B", "C"]})
df2 = pd.DataFrame({"key": [2, 3, 4], "val2": ["X", "Y", "Z"]})

What is the result of pd.merge(df1, df2, on='key', how='outer')?

Pandas

import pandas as pd
df1 = pd.DataFrame({"key": [1, 2, 3], "val1": ["A", "B", "C"]})
df2 = pd.DataFrame({"key": [2, 3, 4], "val2": ["X", "Y", "Z"]})
result = pd.merge(df1, df2, on='key', how='outer')
print(result.sort_values('key').reset_index(drop=True))

A. [{'key': 1, 'val1': 'A', 'val2': 'X'}, {'key': 2, 'val1': 'B', 'val2': 'Y'}, {'key': 3, 'val1': 'C', 'val2': 'Z'}]

B. [{'key': 1, 'val1': 'A', 'val2': None}, {'key': 2, 'val1': 'B', 'val2': 'X'}, {'key': 3, 'val1': 'C', 'val2': 'Y'}, {'key': 4, 'val1': None, 'val2': 'Z'}]

C. [{'key': 2, 'val1': 'B', 'val2': 'X'}, {'key': 3, 'val1': 'C', 'val2': 'Y'}]

D. [{'key': 1, 'val1': 'A', 'val2': 'X'}, {'key': 4, 'val1': None, 'val2': 'Z'}]
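
One way to reason about a question like this is to ask pandas where each row of the result came from. The sketch below uses the `indicator=True` option of `pd.merge`, which adds a `_merge` column; the variable name `origin` is only illustrative and not part of the challenge.

```python
import pandas as pd

df1 = pd.DataFrame({"key": [1, 2, 3], "val1": ["A", "B", "C"]})
df2 = pd.DataFrame({"key": [2, 3, 4], "val2": ["X", "Y", "Z"]})

# indicator=True adds a "_merge" column whose values are
# "left_only", "right_only", or "both" for each result row.
result = pd.merge(df1, df2, on="key", how="outer", indicator=True)
result = result.sort_values("key").reset_index(drop=True)

# Map each key to the side(s) it was found on (hypothetical helper variable).
origin = dict(zip(result["key"].tolist(), result["_merge"].astype(str)))
print(origin)
```

Rows marked `left_only` or `right_only` are the ones that receive missing values in the columns of the other DataFrame.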

❓ query_result

intermediate

How many rows does an outer join produce when keys are duplicated?

Consider these DataFrames with duplicate keys:

df1 = pd.DataFrame({"key": [1, 2, 2], "val1": ["A", "B", "C"]})
df2 = pd.DataFrame({"key": [2, 2, 3], "val2": ["X", "Y", "Z"]})

How many rows will pd.merge(df1, df2, on='key', how='outer') have?

Pandas

import pandas as pd
df1 = pd.DataFrame({"key": [1, 2, 2], "val1": ["A", "B", "C"]})
df2 = pd.DataFrame({"key": [2, 2, 3], "val2": ["X", "Y", "Z"]})
result = pd.merge(df1, df2, on='key', how='outer')
print(len(result))
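
The row count can also be worked out by hand before running the merge. The following is a minimal sketch, assuming the usual merge semantics: a key present on both sides contributes (left occurrences × right occurrences) rows, and a key present on only one side contributes one row per occurrence. The names `left_counts`, `right_counts`, and `expected_rows` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"key": [1, 2, 2], "val1": ["A", "B", "C"]})
df2 = pd.DataFrame({"key": [2, 2, 3], "val2": ["X", "Y", "Z"]})

# How often each key appears on each side
left_counts = df1["key"].value_counts()
right_counts = df2["key"].value_counts()

expected_rows = 0
for k in left_counts.index.union(right_counts.index):
    n_left = int(left_counts.get(k, 0))
    n_right = int(right_counts.get(k, 0))
    if n_left and n_right:
        # Matched key: every left row pairs with every right row
        expected_rows += n_left * n_right
    else:
        # Unmatched key: rows from the one side are kept as-is
        expected_rows += n_left + n_right

actual_rows = len(pd.merge(df1, df2, on="key", how="outer"))
print(expected_rows, actual_rows)
```

If the two printed numbers agree, the per-key reasoning matches what `pd.merge` actually does for these inputs.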

📝 Syntax

advanced

Which code snippet correctly performs an outer join on two DataFrames?

Choose the code that performs an outer join of df1 and df2 on their shared key column using pandas, naming the join column explicitly.

A. pd.merge(df1, df2, on='key', how='outer')

B. df1.merge(df2, how='outer')

C. pd.concat([df1, df2], join='outer')

D. df1.join(df2, how='outer')

🔧 Debug

advanced

Why does this outer join produce unexpected NaNs?

Given these DataFrames:

df1 = pd.DataFrame({"key": [1, 2], "val1": ["A", "B"]})
df2 = pd.DataFrame({"key": ["1", "2"], "val2": ["X", "Y"]})

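Why does pd.merge(df1, df2, on='key', how='outer') fail to match any keys, so that each row would carry NaN in the value column from the other DataFrame? Note that recent pandas versions raise a ValueError for this merge rather than returning NaN-padded rows.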

Pandas

import pandas as pd
df1 = pd.DataFrame({"key": [1, 2], "val1": ["A", "B"]})
df2 = pd.DataFrame({"key": ["1", "2"], "val2": ["X", "Y"]})
result = pd.merge(df1, df2, on='key', how='outer')
print(result)

A. Because the key columns have different data types (int vs string), so no matches occur.

B. Because outer join only keeps rows with matching keys, so unmatched rows get NaN.

C. Because the DataFrames have different column names, causing merge to fail silently.

D. Because the merge function requires specifying left_on and right_on for different column names.
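
When debugging a merge like this, a useful first step is to inspect the dtype of the key column on each side. The sketch below shows one possible remedy, assuming every string key is a clean integer literal; `df2_fixed` is a hypothetical name introduced for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"key": [1, 2], "val1": ["A", "B"]})
df2 = pd.DataFrame({"key": ["1", "2"], "val2": ["X", "Y"]})

# Inspect the key dtypes: an integer dtype on the left, a string dtype on the right
print(df1["key"].dtype, df2["key"].dtype)

# Cast the string keys to integers on a copy, leaving df2 itself untouched.
# Assumption: every key string parses as an integer; otherwise astype raises.
df2_fixed = df2.assign(key=df2["key"].astype("int64"))

result = pd.merge(df1, df2_fixed, on="key", how="outer")
print(result)
```

Casting the other way (integers to strings) would work too; which side to convert depends on what the keys are meant to represent.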

🧠 Conceptual

expert

What is the effect of outer join on index alignment in pandas?

When performing an outer join with pd.merge on two DataFrames with overlapping and non-overlapping keys, how does pandas handle the index of the resulting DataFrame?

A. The resulting DataFrame uses a MultiIndex combining indexes from both DataFrames.

B. The resulting DataFrame preserves the index from the left DataFrame only.

C. The resulting DataFrame preserves the index from the right DataFrame only.

D. The resulting DataFrame has a new default integer index from 0 to n-1, ignoring original indexes.
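
Index handling is easy to check empirically. The sketch below gives both inputs distinctive, non-default indexes so that any carry-over into the result would be obvious; the frames `left` and `right` and their index labels are made up for illustration.

```python
import pandas as pd

# Hypothetical frames with custom string indexes
left = pd.DataFrame({"key": [1, 2], "val1": ["A", "B"]}, index=["a", "b"])
right = pd.DataFrame({"key": [2, 3], "val2": ["X", "Y"]}, index=["x", "y"])

# Column-on-column merge; the "key" column drives the join, not the index
result = pd.merge(left, right, on="key", how="outer")

# Inspect what index the merged frame ended up with
print(result.index)
print(result)
```

This only covers merging on columns; passing `left_index=True` or `right_index=True` changes how the indexes take part in the join.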
