Challenge - 5 Problems

🎖️

Outer Join Master

Get all challenges correct to earn this badge!

Test your skills under time pressure!

❓ query_result

intermediate

What is the output of this left outer join?

Given two dataframes df1 and df2:

import pandas as pd
df1 = pd.DataFrame({"id": [1, 2, 3], "value1": [10, 20, 30]})
df2 = pd.DataFrame({"id": [2, 3, 4], "value2": [200, 300, 400]})

result = pd.merge(df1, df2, on="id", how="left")
print(result)

What will be printed?

   id  value1  value2
0   1      10   100.0
1   2      20   200.0
2   3      30   300.0

   id  value1  value2
0   1      10     NaN
1   2      20   200.0
2   3      30   300.0

   id  value1  value2
0   1      10   200.0
1   2      20   300.0
2   3      30   400.0

   id  value1  value2
0   2      20   200.0
1   3      30   300.0
2   4     NaN   400.0
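
One way to check an answer like this yourself is to pass indicator=True to pd.merge, which adds a _merge column recording whether each row found a partner. The sketch below uses a different, hypothetical pair of frames (the names left, right, score and label are illustrative and not part of the challenge):

```python
import pandas as pd

# Hypothetical frames, deliberately different from the challenge data.
left = pd.DataFrame({"id": [10, 20, 30], "score": [1, 2, 3]})
right = pd.DataFrame({"id": [20, 30, 40], "label": ["x", "y", "z"]})

# indicator=True adds a "_merge" column: "left_only", "right_only" or "both".
checked = pd.merge(left, right, on="id", how="left", indicator=True)
print(checked)

# Ids from `left` that had no partner in `right`.
unmatched_ids = checked.loc[checked["_merge"] == "left_only", "id"].tolist()
print(unmatched_ids)
```

Running the same check on df1 and df2 shows which rows a left join keeps and where missing values appear.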

❓ query_result

intermediate

What is the output of this full outer join?

Given two dataframes df1 and df2:

import pandas as pd
df1 = pd.DataFrame({"key": ["A", "B", "C"], "val1": [1, 2, 3]})
df2 = pd.DataFrame({"key": ["B", "C", "D"], "val2": [20, 30, 40]})

result = pd.merge(df1, df2, on="key", how="outer")
print(result)

What will be printed?

  key  val1  val2
0   A   1.0   NaN
1   B   2.0  20.0
2   C   3.0  30.0
3   D   NaN  40.0

  key  val1  val2
0   A   1.0  20.0
1   B   2.0  30.0
2   C   3.0  40.0

  key  val1  val2
0   B   2.0  20.0
1   C   3.0  30.0
2   D   NaN  40.0

  key  val1  val2
0   A   NaN   NaN
1   B   2.0  20.0
2   C   3.0  30.0
3   D  40.0  40.0
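
The options above print val1 and val2 with a decimal point even though the inputs are integers. A small sketch on hypothetical frames (a, b, n1 and n2 are illustrative names) compares the column dtype before and after an outer merge. It assumes pandas' default NumPy-backed integer columns, which cannot hold NaN:

```python
import pandas as pd

# Hypothetical frames with one shared key ("q") and one unmatched key each.
a = pd.DataFrame({"key": ["p", "q"], "n1": [1, 2]})
b = pd.DataFrame({"key": ["q", "r"], "n2": [5, 6]})

merged = pd.merge(a, b, on="key", how="outer")

# Compare the dtype of the integer column before and after the merge.
dtype_before = a["n1"].dtype
dtype_after = merged["n1"].dtype
print(dtype_before, dtype_after)

# Number of missing values introduced into n1 by unmatched keys.
missing_n1 = int(merged["n1"].isna().sum())
print(missing_n1)
```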

📝 Syntax

advanced

Which option produces a syntax error in this outer join code?

Consider the following code snippet to perform an outer join in pandas:

pd.merge(df1, df2, on='id', how='outer')

Which of the following options will cause a syntax error?

A. pd.merge(df1, df2, on='id', how='outer')

B. )'retuo'=woh ,'di'=no ,2fd ,1fd(egrem.dp

C. pd.merge(df1, df2, on='id' how='outer')

D. d.merge(df1, df2, on='id', how='outer')
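
A syntax error is raised when Python parses the source, before any line runs. A misspelled or undefined name parses without complaint and only fails at run time, typically with a NameError. One way to tell the two apart, sketched here on hypothetical snippets rather than the options above, is to parse the text with the standard-library ast module (is_syntax_error is an illustrative helper name):

```python
import ast


def is_syntax_error(source: str) -> bool:
    """Return True if `source` fails to parse as Python code."""
    try:
        ast.parse(source)
    except SyntaxError:
        return True
    return False


# Hypothetical snippets, not the challenge options.
missing_comma = "f(a, b=1 c=2)"
undefined_name = "not_defined_anywhere.f(a, b=1)"

print(is_syntax_error(missing_comma))
print(is_syntax_error(undefined_name))
```

Parsing does not execute the code, so this check is safe to run on snippets that reference names you have not defined.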

❓ optimization

advanced

Which option is the most efficient way to perform a full outer join on large dataframes?

You have two large dataframes df1 and df2 with millions of rows. You want to perform a full outer join on column key. Which option is the most efficient?

A. pd.merge(df1, df2, on='key', how='inner', sort=False)

B. pd.merge(df1, df2, on='key', how='outer', sort=True)

C. pd.merge(df1, df2, on='key', how='outer').sort_values('key')

D. pd.merge(df1, df2, on='key', how='outer', sort=False)
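
Efficiency claims like these are easiest to judge by timing the calls on data shaped like your own. The following is a rough sketch only: the frame size, the random keys and the helper names timed_outer and normalise are assumptions made for illustration, and timings will vary with pandas version and hardware.

```python
import time

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100_000  # hypothetical size; scale up to match your data

# Unique, shuffled integer keys with partial overlap between the frames.
left = pd.DataFrame({"key": rng.permutation(n), "val1": rng.random(n)})
right = pd.DataFrame({"key": rng.permutation(n) + n // 2, "val2": rng.random(n)})


def timed_outer(sort: bool):
    """Run one outer merge and return (result, elapsed seconds)."""
    start = time.perf_counter()
    out = pd.merge(left, right, on="key", how="outer", sort=sort)
    return out, time.perf_counter() - start


unsorted_result, t_unsorted = timed_outer(sort=False)
sorted_result, t_sorted = timed_outer(sort=True)
print(f"sort=False: {t_unsorted:.3f}s, sort=True: {t_sorted:.3f}s")


def normalise(df):
    """Put rows in a canonical order so two results can be compared."""
    return df.sort_values("key").reset_index(drop=True)


# Both calls should return the same rows; only the row order may differ.
same_rows = normalise(unsorted_result).equals(normalise(sorted_result))
print(same_rows)
```

A single run is noisy, so repeating the measurement several times (for example with the timeit module) gives a more trustworthy comparison.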

🧠 Conceptual

expert

What is the number of rows in the result of a full outer join?

Given two tables T1 and T2 with unique keys, the number of rows in a full outer join on the key column is:

A. The sum of the number of rows in T1 and T2 minus the number of keys common to both

B. The number of rows in T1 only

C. The number of rows in T2 only

D. The product of the number of rows in T1 and T2
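
The four candidate counts can be compared against an actual merge on small tables. This is a minimal sketch with hypothetical data (t1, t2 and their columns a and b are illustrative); it assumes the keys are unique within each table, as the question states:

```python
import pandas as pd

# Hypothetical tables with unique keys.
t1 = pd.DataFrame({"key": [1, 2, 3, 4], "a": list("wxyz")})
t2 = pd.DataFrame({"key": [3, 4, 5], "b": [0.1, 0.2, 0.3]})

# Number of keys present in both tables.
common = len(set(t1["key"]) & set(t2["key"]))

# Row counts predicted by each option.
candidates = {
    "A": len(t1) + len(t2) - common,
    "B": len(t1),
    "C": len(t2),
    "D": len(t1) * len(t2),
}

# Row count of the actual full outer join.
actual = len(pd.merge(t1, t2, on="key", how="outer"))
matching = [name for name, count in candidates.items() if count == actual]
print(actual, matching)
```

Trying a few different overlaps between the two key sets helps confirm that a match is not a coincidence of one dataset.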
