Challenge - 5 Problems

🎖️

Inner Join Mastery

Get all challenges correct to earn this badge!

Test your skills under time pressure!

❓ query_result

intermediate

2:00remaining

What is the output of this inner join?

Given two dataframes df1 and df2, what is the result of the inner join on column key?

Pandas

import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'val1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'val2': [4, 5, 6]})

result = pd.merge(df1, df2, on='key', how='inner')
print(result)

A[{'key': 'A', 'val1': 1, 'val2': 4}, {'key': 'B', 'val1': 2, 'val2': 5}]

B[{'key': 'B', 'val1': 2, 'val2': 4}, {'key': 'C', 'val1': 3, 'val2': 5}]

C[{'key': 'A', 'val1': 1, 'val2': 6}, {'key': 'D', 'val1': 3, 'val2': 6}]

D[{'key': 'A', 'val1': 1, 'val2': null}, {'key': 'B', 'val1': 2, 'val2': 4}, {'key': 'C', 'val1': 3, 'val2': 5}, {'key': 'D', 'val1': null, 'val2': 6}]

Attempts:

2 left

🧠 Conceptual

intermediate

1:30remaining

Which statement best describes an inner join?

Choose the correct description of an inner join operation between two tables or dataframes.

AReturns only rows where keys exist in both tables, excluding all others.

BReturns all rows from the left table and matching rows from the right table, filling with nulls if no match.

CReturns all rows from the right table and matching rows from the left table, filling with nulls if no match.

DReturns all rows from both tables, combining matching rows and filling nulls where no match exists.

Attempts:

2 left

📝 Syntax

advanced

1:30remaining

Identify the correct syntax for an inner join in pandas

Which of the following code snippets correctly performs an inner join on column id between df_left and df_right?

Adf_left.merge(df_right, on='id', join='inner')

Bdf_left.join(df_right, on='id', how='inner')

Cpd.join(df_left, df_right, on='id', how='inner')

Dpd.merge(df_left, df_right, on='id', how='inner')

Attempts:

2 left

❓ optimization

advanced

2:00remaining

Optimizing inner join performance on large dataframes

You have two large dataframes with millions of rows each. Which approach can improve the performance of an inner join on column user_id?

AEnsure both dataframes have <code>user_id</code> as an index before merging.

BConvert <code>user_id</code> columns to string type before merging.

CDrop duplicates from both dataframes before merging regardless of keys.

DUse <code>how='outer'</code> instead of <code>how='inner'</code> to reduce computation.

Attempts:

2 left

🔧 Debug

expert

2:30remaining

Why does this inner join produce an empty dataframe?

Given the code below, why is the result of the inner join empty?

Pandas

import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'val': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['A', 'B', 'C'], 'val': [4, 5, 6]})

result = pd.merge(df1, df2, on='key', how='inner')
print(result)

AThe merge function requires specifying suffixes for overlapping columns.

BThe 'val' columns have different names causing the join to fail.

CThe keys differ in case sensitivity; 'a' != 'A', so no matches occur.

DThe 'how' parameter should be 'left' instead of 'inner' to get results.

Attempts:

2 left