0
0
Pandasdata~10 mins

merge() for SQL-like joins in Pandas - Step-by-Step Execution

Choose your learning style9 modes available
Concept Flow - merge() for SQL-like joins
Start with two DataFrames
Choose join keys
Select join type: inner, left, right, outer
Match rows based on keys
Combine matched rows
Result: merged DataFrame
The merge() function takes two tables, matches rows by keys, and combines them based on the join type.
Execution Sample
Pandas
import pandas as pd

left = pd.DataFrame({'key': ['A', 'B', 'C'], 'left_val': [1, 2, 3]})
right = pd.DataFrame({'key': ['B', 'C', 'D'], 'right_val': [4, 5, 6]})

result = pd.merge(left, right, on='key', how='inner')
print(result)
This code merges two DataFrames on the 'key' column using an inner join, keeping only matching keys.
Execution Table
StepActionLeft DataFrame RowsRight DataFrame RowsMatched KeysResult Rows
1Start with left DataFrame[('A',1), ('B',2), ('C',3)]N/AN/AN/A
2Start with right DataFrameN/A[('B',4), ('C',5), ('D',6)]N/AN/A
3Choose join keys 'key'Keys: A,B,CKeys: B,C,DN/AN/A
4Select join type 'inner'N/AN/AN/AN/A
5Match rows on keysB, CB, CB, CN/A
6Combine matched rows('B',2), ('C',3)('B',4), ('C',5)B, C[('B',2,4), ('C',3,5)]
7Output merged DataFrameN/AN/AN/A[('B',2,4), ('C',3,5)]
💡 All matching keys B and C combined; keys A and D excluded in inner join
Variable Tracker
VariableStartAfter Step 5After Step 6Final
left[('A',1), ('B',2), ('C',3)][('A',1), ('B',2), ('C',3)][('A',1), ('B',2), ('C',3)][('A',1), ('B',2), ('C',3)]
right[('B',4), ('C',5), ('D',6)][('B',4), ('C',5), ('D',6)][('B',4), ('C',5), ('D',6)][('B',4), ('C',5), ('D',6)]
matched_keysNone['B', 'C']['B', 'C']['B', 'C']
resultNoneNone[('B',2,4), ('C',3,5)][('B',2,4), ('C',3,5)]
Key Moments - 2 Insights
Why does the merged result only include keys 'B' and 'C' and not 'A' or 'D'?
Because the join type is 'inner', only keys present in both DataFrames are included. Rows with keys 'A' (only in left) and 'D' (only in right) are excluded, as shown in execution_table rows 5 and 6.
What happens if we change the join type to 'left'?
All keys from the left DataFrame are kept, including 'A'. Matching keys from right are merged, and unmatched right values become NaN. This would change the matched keys and result rows in execution_table.
Visual Quiz - 3 Questions
Test your understanding
Look at the execution table at step 5, which keys are matched between left and right DataFrames?
AA and D
BB and C
CA, B, C
DC and D
💡 Hint
Check the 'Matched Keys' column at step 5 in the execution_table.
At which step does the merged DataFrame get its final rows?
AStep 4
BStep 5
CStep 7
DStep 6
💡 Hint
Look at the 'Result Rows' column in execution_table; final output is shown at step 7.
If the join type changes to 'outer', what happens to keys 'A' and 'D' in the result?
ABoth 'A' and 'D' are included with NaN for missing values
BOnly 'A' is included
CThey are excluded
DOnly 'D' is included
💡 Hint
Recall that 'outer' join keeps all keys from both DataFrames, filling missing matches with NaN.
Concept Snapshot
pandas.merge(left, right, on='key', how='type')
- Combines two DataFrames by matching keys
- 'how' controls join: inner, left, right, outer
- Inner join keeps only matching keys
- Left join keeps all left keys
- Outer join keeps all keys from both
- Result is a new DataFrame with combined columns
Full Transcript
The merge() function in pandas combines two tables by matching rows on specified keys. You start with two DataFrames, choose which columns to join on, and pick a join type like inner or left. The function finds matching keys and combines those rows. For example, an inner join keeps only keys present in both tables. The result is a new DataFrame with columns from both inputs. This process helps you combine related data easily, like matching customer info with orders.