0
0
Data Analysis Pythondata~10 mins

merge() for SQL-style joins in Data Analysis Python - Step-by-Step Execution

Choose your learning style9 modes available
Concept Flow - merge() for SQL-style joins
Start with two DataFrames
Choose join key(s)
Select join type: inner, left, right, outer
Match rows based on keys
Combine matched rows
Handle unmatched rows per join type
Return merged DataFrame
The merge() function takes two tables, matches rows by keys, combines them based on join type, and returns the joined table.
Execution Sample
Data Analysis Python
import pandas as pd

left = pd.DataFrame({'key': ['A', 'B', 'C'], 'left_val': [1, 2, 3]})
right = pd.DataFrame({'key': ['B', 'C', 'D'], 'right_val': [4, 5, 6]})

result = pd.merge(left, right, on='key', how='inner')
print(result)
This code merges two DataFrames on 'key' using an inner join, keeping only matching keys.
Execution Table
StepActionLeft DataFrame RowsRight DataFrame RowsMatching KeysResult Rows
1Start with left and right DataFrames[A:1, B:2, C:3][B:4, C:5, D:6]--
2Choose join key 'key'----
3Select join type 'inner'----
4Find matching keys in both: B, C--[B, C]-
5Combine rows with keys B and C---[B:2,4; C:3,5]
6Return merged DataFrame with 2 rows---[B:2,4; C:3,5]
💡 Inner join keeps only keys present in both DataFrames (B and C).
Variable Tracker
VariableStartAfter Step 4After Step 5Final
left[A:1, B:2, C:3][A:1, B:2, C:3][A:1, B:2, C:3][A:1, B:2, C:3]
right[B:4, C:5, D:6][B:4, C:5, D:6][B:4, C:5, D:6][B:4, C:5, D:6]
matching_keysNone[B, C][B, C][B, C]
resultNoneNone[B:2,4; C:3,5][B:2,4; C:3,5]
Key Moments - 2 Insights
Why does the merged result only have keys B and C, not A or D?
Because the join type is 'inner', merge() keeps only rows with keys present in both DataFrames, which are B and C (see execution_table step 4).
What happens if we change 'how' to 'left'?
All keys from the left DataFrame are kept, and matching rows from right are added. Unmatched right rows become NaN (not shown in this trace).
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table at step 5, how many rows does the result have?
A1 row
B3 rows
C2 rows
D4 rows
💡 Hint
Check the 'Result Rows' column at step 5 in execution_table.
At which step do we identify the keys that exist in both DataFrames?
AStep 2
BStep 4
CStep 6
DStep 3
💡 Hint
Look for the step mentioning 'Find matching keys' in execution_table.
If we change 'how' to 'outer', what would happen to keys A and D?
AThey would appear with NaN for missing columns
BOnly key A would appear
CThey would be excluded
DOnly key D would appear
💡 Hint
Recall that 'outer' join keeps all keys from both DataFrames, filling missing data with NaN.
Concept Snapshot
merge(left, right, on=key, how=join_type)
- Joins two DataFrames on key(s)
- how options: inner (default), left, right, outer
- inner: keep keys in both
- left: keep all left keys
- right: keep all right keys
- outer: keep all keys from both
Full Transcript
The merge() function in pandas combines two tables by matching rows on specified keys. You pick which keys to join on and the type of join: inner, left, right, or outer. Inner join keeps only rows with keys in both tables. Left join keeps all rows from the left table and matches from the right. Right join keeps all from right and matches from left. Outer join keeps all rows from both tables, filling missing data with NaN. The example merges two small tables on the 'key' column using inner join, resulting in rows only for keys present in both tables. This step-by-step trace shows how keys are matched and rows combined.