Outer join in Data Analysis Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
What is the output of this left outer join?

Given two dataframes df1 and df2:

import pandas as pd
df1 = pd.DataFrame({"id": [1, 2, 3], "value1": [10, 20, 30]})
df2 = pd.DataFrame({"id": [2, 3, 4], "value2": [200, 300, 400]})

result = pd.merge(df1, df2, on="id", how="left")
print(result)

What will be printed?

A
   id  value1  value2
0   1      10   100.0
1   2      20   200.0
2   3      30   300.0
B
   id  value1  value2
0   1      10     NaN
1   2      20   200.0
2   3      30   300.0
C
   id  value1  value2
0   1      10   200.0
1   2      20   300.0
2   3      30   400.0
D
   id  value1  value2
0   2      20   200.0
1   3      30   300.0
2   4     NaN   400.0
💡 Hint

Remember, a left join keeps all rows from the left dataframe and matches rows from the right dataframe where possible.
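
The hint can be tried out on a separate, made-up pair of frames. This is a minimal sketch: the names `orders` and `customers` and their contents are hypothetical and are not the data from the problem above.

```python
import pandas as pd

# Hypothetical example data, not taken from the problem above.
orders = pd.DataFrame({"customer_id": [10, 11, 12], "amount": [5.0, 7.5, 9.0]})
customers = pd.DataFrame({"customer_id": [11, 12, 13], "name": ["Ana", "Bo", "Cy"]})

left_joined = pd.merge(orders, customers, on="customer_id", how="left")

# Every row of the left frame survives, in the left frame's order.
# Customer 13 exists only on the right, so it does not appear.
print(left_joined["customer_id"].tolist())   # [10, 11, 12]

# Left rows without a match get a missing value in the right-hand columns.
print(left_joined["name"].isna().tolist())   # [True, False, False]
```

The row count of a left join equals the row count of the left frame as long as the keys on the right are unique.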

What is the output of this full outer join?

Given two dataframes df1 and df2:

import pandas as pd
df1 = pd.DataFrame({"key": ["A", "B", "C"], "val1": [1, 2, 3]})
df2 = pd.DataFrame({"key": ["B", "C", "D"], "val2": [20, 30, 40]})

result = pd.merge(df1, df2, on="key", how="outer")
print(result)

What will be printed?

A
  key  val1  val2
0   A   1.0   NaN
1   B   2.0  20.0
2   C   3.0  30.0
3   D   NaN  40.0
B
  key  val1  val2
0   A   1.0  20.0
1   B   2.0  30.0
2   C   3.0  40.0
C
  key  val1  val2
0   B   2.0  20.0
1   C   3.0  30.0
2   D   NaN  40.0
D
  key  val1  val2
0   A   NaN   NaN
1   B   2.0  20.0
2   C   3.0  30.0
3   D  40.0  40.0
💡 Hint

Full outer join keeps all rows from both dataframes, filling missing values with NaN.
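
One way to see which side each row came from is the `indicator=True` option of `pd.merge`, which adds a `_merge` column. The sketch below uses hypothetical frames (`left`, `right`, and the `sku` column are illustrative names, not the data from the problem above).

```python
import pandas as pd

# Hypothetical example data, not taken from the problem above.
left = pd.DataFrame({"sku": ["x", "y"], "stock": [4, 9]})
right = pd.DataFrame({"sku": ["y", "z"], "price": [1.5, 2.5]})

# indicator=True adds a "_merge" column: left_only, right_only, or both.
full = pd.merge(left, right, on="sku", how="outer", indicator=True)

# Map each key to the side(s) it came from.
origin = dict(zip(full["sku"], full["_merge"].astype(str)))
print(origin)  # {'x': 'left_only', 'y': 'both', 'z': 'right_only'}

# The integer column is upcast to float because it now has to hold NaN.
print(full["stock"].dtype)  # float64
```

The upcast to float is why integer inputs show up as values like `2.0` in the printed result of an outer join.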

Which option produces a syntax error in this outer join code?

Consider the following code snippet to perform an outer join in pandas:

import pandas as pd
pd.merge(df1, df2, on='id', how='outer')

Which of the following options will cause a syntax error?

A
pd.merge(df1, df2, on='id', how='outer')
B
)'retuo'=woh ,'di'=no ,2fd ,1fd(egrem.dp
C
pd.merge(df1, df2, on='id' how='outer')
D
d.merge(df1, df2, on='id', how='outer')
💡 Hint

Check for missing commas between arguments.
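
A syntax error is raised when the code is parsed, before anything runs, which makes it different from a runtime error such as a `NameError`. A minimal sketch of that distinction using the standard-library `ast` module follows; the helper `is_valid_syntax` and the snippets are hypothetical and are not the options listed above.

```python
import ast

def is_valid_syntax(source: str) -> bool:
    """Return True if `source` parses as Python. The code is never executed."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

# Hypothetical snippets; f, g, a, and b are never defined or called.
with_comma = "f(a, b, x=1, y=2)"
missing_comma = "f(a, b, x=1 y=2)"
undefined_name = "g.h(a, b)"  # parses fine; would fail only when run

print(is_valid_syntax(with_comma))      # True
print(is_valid_syntax(missing_comma))   # False
print(is_valid_syntax(undefined_name))  # True
```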

Which option is the most efficient way to perform a full outer join on large dataframes?

You have two large dataframes df1 and df2 with millions of rows. You want to perform a full outer join on column key. Which option is the most efficient?

A
pd.merge(df1, df2, on='key', how='inner', sort=False)
B
pd.merge(df1, df2, on='key', how='outer', sort=True)
C
pd.merge(df1, df2, on='key', how='outer').sort_values('key')
D
pd.merge(df1, df2, on='key', how='outer', sort=False)
💡 Hint

Sorting during merge can slow down performance on large data.
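
Performance claims like this are best checked on your own data. The sketch below is one rough way to time the `sort` argument; the frame names, sizes, and random keys are assumptions for illustration, and the timings will vary with pandas version and hardware, so no result is claimed here.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical data: unique integer keys with a partial overlap.
rng = np.random.default_rng(0)
n = 100_000  # illustrative size, far smaller than "millions of rows"
big1 = pd.DataFrame({"key": rng.permutation(n), "val1": rng.random(n)})
big2 = pd.DataFrame({"key": rng.permutation(n) + n // 2, "val2": rng.random(n)})

def timed_merge(sort: bool):
    """Run a full outer join and return (result, elapsed seconds)."""
    start = time.perf_counter()
    out = pd.merge(big1, big2, on="key", how="outer", sort=sort)
    return out, time.perf_counter() - start

unsorted_out, t_unsorted = timed_merge(sort=False)
sorted_out, t_sorted = timed_merge(sort=True)
print(f"sort=False: {t_unsorted:.4f}s  sort=True: {t_sorted:.4f}s")

# Both calls return the same rows; only the row order may differ.
a = unsorted_out.sort_values("key").reset_index(drop=True)
b = sorted_out.sort_values("key").reset_index(drop=True)
pd.testing.assert_frame_equal(a, b)
```

A single timing is noisy; repeating each call several times (for example with the standard-library `timeit` module) gives a more trustworthy comparison.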

What is the number of rows in the result of a full outer join?

Given two tables T1 and T2 with unique keys, the number of rows in a full outer join on the key column is:

A
The sum of the number of rows in T1 and T2 minus the number of keys common to both
B
The number of rows in T1 only
C
The number of rows in T2 only
D
The product of the number of rows in T1 and T2
💡 Hint

Think about how full outer join combines all unique keys from both tables.
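
The hint can be checked numerically by comparing the length of an outer join with the number of distinct keys across both tables. This is a small sketch with hypothetical tables `t1` and `t2`, each with unique keys as the question assumes.

```python
import pandas as pd

# Hypothetical tables with unique keys; 3 and 4 appear in both.
t1 = pd.DataFrame({"key": [1, 2, 3, 4], "a": list("wxyz")})
t2 = pd.DataFrame({"key": [3, 4, 5], "b": [0.3, 0.4, 0.5]})

full = pd.merge(t1, t2, on="key", how="outer")

# With unique keys, the join has exactly one row per distinct key.
distinct_keys = set(t1["key"]) | set(t2["key"])
print(len(full), len(distinct_keys))  # 5 5
```

This only holds when keys are unique within each table. If a key repeats, each matching pair of rows produces its own output row, so the result can be longer.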