0
0
Data Analysis Pythondata~10 mins

Merging on multiple keys in Data Analysis Python - Step-by-Step Execution

Choose your learning style9 modes available
Concept Flow - Merging on multiple keys
Start with two DataFrames
Identify multiple keys to join on
Match rows where all keys are equal
Combine matched rows into one
Result: merged DataFrame with combined info
End
We take two tables, find rows where all key columns match, and join their data into one combined table.
Execution Sample
Data Analysis Python
import pandas as pd

df1 = pd.DataFrame({'City': ['NY', 'LA', 'NY'], 'Year': [2020, 2020, 2021], 'Pop': [8, 4, 8.3]})
df2 = pd.DataFrame({'City': ['NY', 'LA', 'NY'], 'Year': [2020, 2020, 2022], 'GDP': [1500, 1000, 1600]})

merged = pd.merge(df1, df2, on=['City', 'Year'])
print(merged)
This code merges two tables on both 'City' and 'Year' columns, combining rows where both match.
Execution Table
StepActiondf1 Rowdf2 RowKeys Matched?Merged Row
1Check df1 row 0 and df2 row 0{City: NY, Year: 2020, Pop: 8}{City: NY, Year: 2020, GDP: 1500}Yes{City: NY, Year: 2020, Pop: 8, GDP: 1500}
2Check df1 row 0 and df2 row 1{City: NY, Year: 2020, Pop: 8}{City: LA, Year: 2020, GDP: 1000}NoNo merge
3Check df1 row 0 and df2 row 2{City: NY, Year: 2020, Pop: 8}{City: NY, Year: 2022, GDP: 1600}NoNo merge
4Check df1 row 1 and df2 row 0{City: LA, Year: 2020, Pop: 4}{City: NY, Year: 2020, GDP: 1500}NoNo merge
5Check df1 row 1 and df2 row 1{City: LA, Year: 2020, Pop: 4}{City: LA, Year: 2020, GDP: 1000}Yes{City: LA, Year: 2020, Pop: 4, GDP: 1000}
6Check df1 row 1 and df2 row 2{City: LA, Year: 2020, Pop: 4}{City: NY, Year: 2022, GDP: 1600}NoNo merge
7Check df1 row 2 and df2 row 0{City: NY, Year: 2021, Pop: 8.3}{City: NY, Year: 2020, GDP: 1500}NoNo merge
8Check df1 row 2 and df2 row 1{City: NY, Year: 2021, Pop: 8.3}{City: LA, Year: 2020, GDP: 1000}NoNo merge
9Check df1 row 2 and df2 row 2{City: NY, Year: 2021, Pop: 8.3}{City: NY, Year: 2022, GDP: 1600}NoNo merge
10Merge complete---2 rows merged
💡 No more rows to check; merge finished with 2 matching rows on both keys.
Variable Tracker
VariableStartAfter Step 1After Step 5Final
merged_rows[][{City: NY, Year: 2020, Pop: 8, GDP: 1500}][{City: NY, Year: 2020, Pop: 8, GDP: 1500}, {City: LA, Year: 2020, Pop: 4, GDP: 1000}][{City: NY, Year: 2020, Pop: 8, GDP: 1500}, {City: LA, Year: 2020, Pop: 4, GDP: 1000}]
Key Moments - 2 Insights
Why do some rows from df1 not appear in the merged result?
Only rows where all keys ('City' and 'Year') match in both tables are merged. Rows without matching keys are excluded, as shown in execution_table steps 7-9.
What happens if only one key matches but not the other?
Rows are not merged unless all keys match. For example, in step 2, 'City' matches but 'Year' does not, so no merge occurs.
Visual Quiz - 3 Questions
Test your understanding
Look at the execution table, what is the merged row at step 5?
ANo merge
B{City: NY, Year: 2020, Pop: 8, GDP: 1500}
C{City: LA, Year: 2020, Pop: 4, GDP: 1000}
D{City: NY, Year: 2022, Pop: 8.3, GDP: 1600}
💡 Hint
Check the 'Merged Row' column at step 5 in the execution_table.
At which step does the condition 'Keys Matched?' become false for df1 row 0?
AStep 2
BStep 1
CStep 5
DStep 10
💡 Hint
Look at steps 1-3 for df1 row 0 in the execution_table under 'Keys Matched?'.
If we add a row to df2 with City='NY' and Year=2021, what would happen in the merged result?
ANo change, because keys must match exactly
BA new merged row with Pop=8.3 and GDP from new df2 row appears
CAll rows from df1 will merge with this new row
DMerge will fail with error
💡 Hint
Refer to variable_tracker and how matching keys create merged rows.
Concept Snapshot
pd.merge(df1, df2, on=[key1, key2])
- Joins two DataFrames on multiple columns
- Only rows with all keys matching are merged
- Result contains combined columns from both
- Useful for detailed matching like city and year
- Non-matching rows are excluded by default
Full Transcript
Merging on multiple keys means combining two tables by matching rows where all specified columns are equal. We start with two DataFrames, identify the keys to join on, then check each row pair to see if all keys match. If they do, we combine their data into one row in the result. Rows without matching keys are left out. This process helps us combine related data from different sources precisely. The example code merges on 'City' and 'Year', producing a table with population and GDP where both match. The execution table shows step-by-step how each row pair is checked and merged or skipped. Variables track the growing list of merged rows. Key moments clarify why some rows don't merge and the importance of all keys matching. The quiz tests understanding of these steps and outcomes.