0
0
Pandasdata~10 mins

Outer join behavior in Pandas - Step-by-Step Execution

Choose your learning style9 modes available
Concept Flow - Outer join behavior
Start with two tables
Match rows on key
Keep all rows from both tables
Fill missing values with NaN
Result: Combined table with all rows
Outer join combines two tables by matching keys and keeps all rows from both tables, filling missing matches with NaN.
Execution Sample
Pandas
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B'], 'val1': [1, 2]})
df2 = pd.DataFrame({'key': ['B', 'C'], 'val2': [3, 4]})

result = pd.merge(df1, df2, on='key', how='outer')
This code merges two dataframes on 'key' using an outer join, keeping all keys from both.
Execution Table
StepActiondf1 rowsdf2 rowsMatched keysResult rowsNotes
1Start with df1 and df2A, BB, CNone yetNone yetInitial tables
2Match keys between df1 and df2A, BB, CBNone yetOnly 'B' matches
3Combine matched rowsB rowB rowB1Row with key 'B' combined
4Add unmatched rows from df1A row--1 + 1 = 2'A' from df1 added with NaN for df2 columns
5Add unmatched rows from df2-C row-2 + 1 = 3'C' from df2 added with NaN for df1 columns
6Final resultA, BB, CB3All keys from both tables included
💡 All rows from both tables included; unmatched keys filled with NaN
Variable Tracker
VariableStartAfter Step 3After Step 4After Step 5Final
resultempty1 row (key B combined)2 rows (added key A with NaN)3 rows (added key C with NaN)3 rows (keys A, B, C)
Key Moments - 2 Insights
Why does the result have NaN values?
Because outer join keeps all rows from both tables, rows without a matching key in the other table get NaN for missing columns, as shown in steps 4 and 5.
Why is the key 'B' row combined but 'A' and 'C' are not?
Only 'B' exists in both tables, so it combines. 'A' and 'C' exist in only one table each, so they appear with NaN for the other table's columns (see step 2 and 3).
Visual Quiz - 3 Questions
Test your understanding
Look at the execution table, how many rows does the result have after step 5?
A2
B1
C3
D4
💡 Hint
Check the 'Result rows' column at step 5 in the execution_table.
At which step are unmatched rows from df1 added to the result?
AStep 4
BStep 3
CStep 2
DStep 5
💡 Hint
Look for the step mentioning adding unmatched rows from df1 in the execution_table.
If df2 had an extra key 'D', what would happen to the number of rows in the final result?
AIt would stay the same
BIt would increase by 1
CIt would decrease by 1
DIt would double
💡 Hint
Outer join keeps all keys from both tables, so adding a new key in df2 adds a new row (see variable_tracker).
Concept Snapshot
Outer join in pandas merges two tables keeping all rows from both.
Syntax: pd.merge(df1, df2, on='key', how='outer')
Matched keys combine rows; unmatched keys fill missing columns with NaN.
Result includes all keys from both tables.
Useful to keep full data when keys don't fully overlap.
Full Transcript
Outer join behavior in pandas means combining two tables by matching keys and keeping all rows from both tables. When keys match, rows merge. When keys don't match, rows from each table still appear with missing values filled as NaN. The example code merges two dataframes on 'key' using an outer join. The execution table shows step-by-step how matched rows combine and unmatched rows add with NaN. Variable tracking shows the result grows from empty to three rows including all keys. Key moments clarify why NaN appears and why only matching keys combine. The visual quiz tests understanding of row counts and steps where unmatched rows add. The snapshot summarizes syntax and behavior for quick reference.