0
0
Data Analysis Pythondata~10 mins

Outer join in Data Analysis Python - Step-by-Step Execution

Choose your learning style9 modes available
Concept Flow - Outer join
Start with two tables
Compare keys in both tables
Match rows on keys?
Combine matching
Result: Outer join table
Outer join combines two tables by matching keys and keeps all rows from both tables, filling missing matches with NULLs.
Execution Sample
Data Analysis Python
import pandas as pd

left = pd.DataFrame({'key': [1, 2, 3], 'A': ['a1', 'a2', 'a3']})
right = pd.DataFrame({'key': [2, 3, 4], 'B': ['b2', 'b3', 'b4']})

result = pd.merge(left, right, on='key', how='outer')
print(result)
This code merges two tables on 'key' using an outer join, keeping all keys from both tables.
Execution Table
StepLeft Table RowRight Table RowKeys Match?ActionResult Row
1{key:1, A:'a1'}NoneNoInclude left row with NULLs for right{key:1, A:'a1', B:NULL}
2{key:2, A:'a2'}{key:2, B:'b2'}YesCombine both rows{key:2, A:'a2', B:'b2'}
3{key:3, A:'a3'}{key:3, B:'b3'}YesCombine both rows{key:3, A:'a3', B:'b3'}
4None{key:4, B:'b4'}NoInclude right row with NULLs for left{key:4, A:NULL, B:'b4'}
5End of rowsEnd of rowsN/AStopAll rows processed
💡 All rows from both tables processed, unmatched rows included with NULLs
Variable Tracker
VariableStartAfter Step 1After Step 2After Step 3After Step 4Final
result_rows[][{key:1, A:'a1', B:NULL}][{key:1, A:'a1', B:NULL}, {key:2, A:'a2', B:'b2'}][{key:1, A:'a1', B:NULL}, {key:2, A:'a2', B:'b2'}, {key:3, A:'a3', B:'b3'}][{key:1, A:'a1', B:NULL}, {key:2, A:'a2', B:'b2'}, {key:3, A:'a3', B:'b3'}, {key:4, A:NULL, B:'b4'}][{key:1, A:'a1', B:NULL}, {key:2, A:'a2', B:'b2'}, {key:3, A:'a3', B:'b3'}, {key:4, A:NULL, B:'b4'}]
Key Moments - 2 Insights
Why do some rows have NULL values in the result?
Rows with keys that don't match in the other table are included with NULLs for missing columns, as shown in steps 1 and 4 of the execution_table.
Are all rows from both tables always included in an outer join?
Yes, the outer join keeps all rows from both tables, matching where possible and filling NULLs where no match exists, as seen in the final result after step 4.
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table, what is the result row at step 2?
A{key:2, A:'a2', B:NULL}
B{key:2, A:'a2', B:'b2'}
C{key:2, A:NULL, B:'b2'}
DNo row included
💡 Hint
Check the 'Result Row' column in row with Step 2 in execution_table
At which step does the outer join include a row from the right table with no matching left key?
AStep 1
BStep 3
CStep 4
DStep 2
💡 Hint
Look for rows where Left Table Row is None in execution_table
If the left table had a row with key=4, how would the result change at step 4?
AStep 4 would combine rows instead of adding NULLs
BStep 4 would be skipped
CStep 4 would add a duplicate row
DNo change at step 4
💡 Hint
Consider how matching keys affect the 'Keys Match?' and 'Action' columns in execution_table
Concept Snapshot
Outer join merges two tables on a key.
It keeps all rows from both tables.
Matching rows combine data.
Non-matching rows fill missing parts with NULL.
Use 'how="outer"' in pandas merge for this.
Full Transcript
Outer join takes two tables and combines them by matching keys. It keeps every row from both tables. When keys match, it merges the data into one row. When keys don't match, it still keeps the row but fills missing columns with NULL values. This way, no data is lost from either table. In pandas, you do this with pd.merge(left, right, on='key', how='outer'). The execution shows step-by-step how each row is processed and included in the final result.