Challenge - 5 Problems

🎖️

Left Join Master

Get all challenges correct to earn this badge!

Test your skills under time pressure!

❓ query_result

intermediate

2:00remaining

What is the output of this left join?

Given two pandas DataFrames df1 and df2:

df1 = pd.DataFrame({"key": [1, 2, 3], "val1": ["A", "B", "C"]})
df2 = pd.DataFrame({"key": [2, 3, 4], "val2": ["X", "Y", "Z"]})

result = df1.merge(df2, on="key", how="left")
print(result)

What will be printed?

Pandas

import pandas as pd
df1 = pd.DataFrame({"key": [1, 2, 3], "val1": ["A", "B", "C"]})
df2 = pd.DataFrame({"key": [2, 3, 4], "val2": ["X", "Y", "Z"]})

result = df1.merge(df2, on="key", how="left")
print(result)

   key val1 val2
0    1    A    Z
1    2    B    X
2    3    C    Y

   key val1 val2
0    1    A    X
1    2    B    Y
2    3    C    Z

   key val1 val2
0    1    A  NaN
1    2    B  NaN
2    3    C  NaN

   key val1 val2
0    1    A  NaN
1    2    B    X
2    3    C    Y

Attempts:

2 left

🧠 Conceptual

intermediate

1:30remaining

What happens to unmatched rows in a left join?

When performing a left join between two tables, what happens to rows in the left table that have no matching key in the right table?

AThey appear in the result with NULL or NaN values for the right table's columns.

BThey are dropped from the result.

CThey cause an error and stop the join.

DThey are duplicated for every row in the right table.

Attempts:

2 left

📝 Syntax

advanced

2:00remaining

Which code correctly performs a left join on columns with different names?

You have two DataFrames:

df1 = pd.DataFrame({"id": [1, 2], "val1": ["A", "B"]})
df2 = pd.DataFrame({"key": [2, 3], "val2": ["X", "Y"]})

Which code correctly performs a left join where df1.id matches df2.key?

Adf1.merge(df2, on=["id", "key"], how="left")

Bdf1.merge(df2, left_on="id", right_on="key", how="left")

Cdf1.merge(df2, on="id", how="left")

Ddf1.merge(df2, left_index=True, right_index=True, how="left")

Attempts:

2 left

❓ optimization

advanced

2:30remaining

How to optimize a left join on large DataFrames?

You have two very large DataFrames and want to perform a left join on a key column. Which approach can improve performance?

ASet the key columns as indexes before joining using <code>set_index</code>.

BConvert the DataFrames to lists and join manually with loops.

CUse <code>concat</code> instead of <code>merge</code>.

DDrop all columns except the key before joining.

Attempts:

2 left

🔧 Debug

expert

3:00remaining

Why does this left join produce unexpected duplicated rows?

Given:

df1 = pd.DataFrame({"key": [1, 2, 2], "val1": ["A", "B", "C"]})
df2 = pd.DataFrame({"key": [2], "val2": ["X"]})

result = df1.merge(df2, on="key", how="left")
print(result)

The output is:

   key val1 val2
0    1    A  NaN
1    2    B    X
2    2    C    X

Why are there two rows with key=2 in the result?

ABecause the merge function was called with wrong parameters.

BBecause df2 has duplicate keys, causing multiple matches.

CBecause df1 has duplicate keys, the join repeats matching rows for each duplicate.

DBecause left join removes duplicates automatically.

Attempts:

2 left