
Merging on multiple keys in Data Analysis Python - Step-by-Step Execution


Concept Flow - Merging on multiple keys

1. Start with two DataFrames
2. Identify multiple keys to join on
3. Match rows where all keys are equal
4. Combine matched rows into one
5. Result: merged DataFrame with combined info

We take two tables, find rows where all key columns match, and join their data into one combined table.

Execution Sample

import pandas as pd

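# Population by city and year
df1 = pd.DataFrame({'City': ['NY', 'LA', 'NY'], 'Year': [2020, 2020, 2021], 'Pop': [8, 4, 8.3]})
# GDP by city and year
df2 = pd.DataFrame({'City': ['NY', 'LA', 'NY'], 'Year': [2020, 2020, 2022], 'GDP': [1500, 1000, 1600]})

# Inner merge (the default): keep only rows whose City and Year both match
merged = pd.merge(df1, df2, on=['City', 'Year'])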
print(merged)

This code merges two tables on both 'City' and 'Year' columns, combining rows where both match.

Execution Table

Step	Action	df1 Row	df2 Row	Keys Matched?	Merged Row
1	Check df1 row 0 and df2 row 0	{City: NY, Year: 2020, Pop: 8}	{City: NY, Year: 2020, GDP: 1500}	Yes	{City: NY, Year: 2020, Pop: 8, GDP: 1500}
2	Check df1 row 0 and df2 row 1	{City: NY, Year: 2020, Pop: 8}	{City: LA, Year: 2020, GDP: 1000}	No	No merge
3	Check df1 row 0 and df2 row 2	{City: NY, Year: 2020, Pop: 8}	{City: NY, Year: 2022, GDP: 1600}	No	No merge
4	Check df1 row 1 and df2 row 0	{City: LA, Year: 2020, Pop: 4}	{City: NY, Year: 2020, GDP: 1500}	No	No merge
5	Check df1 row 1 and df2 row 1	{City: LA, Year: 2020, Pop: 4}	{City: LA, Year: 2020, GDP: 1000}	Yes	{City: LA, Year: 2020, Pop: 4, GDP: 1000}
6	Check df1 row 1 and df2 row 2	{City: LA, Year: 2020, Pop: 4}	{City: NY, Year: 2022, GDP: 1600}	No	No merge
7	Check df1 row 2 and df2 row 0	{City: NY, Year: 2021, Pop: 8.3}	{City: NY, Year: 2020, GDP: 1500}	No	No merge
8	Check df1 row 2 and df2 row 1	{City: NY, Year: 2021, Pop: 8.3}	{City: LA, Year: 2020, GDP: 1000}	No	No merge
9	Check df1 row 2 and df2 row 2	{City: NY, Year: 2021, Pop: 8.3}	{City: NY, Year: 2022, GDP: 1600}	No	No merge
10	Merge complete	-	-	-	2 rows merged

💡 No more rows to check; merge finished with 2 matching rows on both keys.
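The pairwise checks in the execution table can be sketched in plain Python. This is only an illustration of the matching rule, assuming each row is held as a dictionary; pandas does not literally loop over every pair like this, and the names `rows1`, `rows2`, and `keys` are made up for the sketch.

```python
# Rows of df1 and df2 from the example, written as plain dictionaries.
rows1 = [
    {"City": "NY", "Year": 2020, "Pop": 8},
    {"City": "LA", "Year": 2020, "Pop": 4},
    {"City": "NY", "Year": 2021, "Pop": 8.3},
]
rows2 = [
    {"City": "NY", "Year": 2020, "GDP": 1500},
    {"City": "LA", "Year": 2020, "GDP": 1000},
    {"City": "NY", "Year": 2022, "GDP": 1600},
]
keys = ["City", "Year"]  # hypothetical name for the list of join keys

merged_rows = []
for r1 in rows1:
    for r2 in rows2:
        # A pair is kept only when every key column is equal.
        if all(r1[k] == r2[k] for k in keys):
            merged_rows.append({**r1, **r2})

print(merged_rows)
```

The nine iterations of the inner check correspond to steps 1 through 9 of the table, and `merged_rows` grows at the same two steps shown in the variable tracker.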

Variable Tracker

Variable	Start	After Step 1	After Step 5	Final
merged_rows	[]	[{City: NY, Year: 2020, Pop: 8, GDP: 1500}]	[{City: NY, Year: 2020, Pop: 8, GDP: 1500}, {City: LA, Year: 2020, Pop: 4, GDP: 1000}]	[{City: NY, Year: 2020, Pop: 8, GDP: 1500}, {City: LA, Year: 2020, Pop: 4, GDP: 1000}]

Key Moments - 2 Insights

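Why do some rows from df1 not appear in the merged result?

A df1 row appears only if some df2 row matches it on every key. df1 row 2 (NY, 2021) has no df2 row with the same City and Year (steps 7 to 9), so it is left out.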

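What happens if only one key matches but not the other?

The pair is not merged. In step 3, City matches (NY) but Year differs (2020 vs 2022), so the outcome is "No merge".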
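One way to see why both keys matter is to compare the two-key merge with a merge on 'City' alone. The sketch below reuses the example data; the names `city_only` and `both_keys` are introduced here only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'City': ['NY', 'LA', 'NY'], 'Year': [2020, 2020, 2021], 'Pop': [8, 4, 8.3]})
df2 = pd.DataFrame({'City': ['NY', 'LA', 'NY'], 'Year': [2020, 2020, 2022], 'GDP': [1500, 1000, 1600]})

# Matching on 'City' alone pairs every NY row in df1 with every NY row
# in df2, regardless of year, so mismatched years end up side by side.
city_only = pd.merge(df1, df2, on='City')

# Matching on both keys keeps only pairs that agree on city and year.
both_keys = pd.merge(df1, df2, on=['City', 'Year'])

print(len(city_only))   # 5 rows: 2 x 2 NY pairs plus 1 LA pair
print(len(both_keys))   # 2 rows
print(city_only.columns.tolist())
```

Because 'Year' is no longer a key in the first merge, pandas keeps both copies and distinguishes them with its default suffixes, `Year_x` and `Year_y`.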

Visual Quiz - 3 Questions

Test your understanding

Look at the execution table, what is the merged row at step 5?

A. No merge

B. {City: NY, Year: 2020, Pop: 8, GDP: 1500}

C. {City: LA, Year: 2020, Pop: 4, GDP: 1000}

D. {City: NY, Year: 2022, Pop: 8.3, GDP: 1600}

Concept Snapshot

pd.merge(df1, df2, on=['key1', 'key2'])
- Joins two DataFrames on multiple columns
- Only rows with all keys matching are merged
- Result contains combined columns from both
- Useful for detailed matching like city and year
- Non-matching rows are excluded by default (how='inner')
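To inspect the rows that the default inner merge drops, one option is an outer merge with `indicator=True`, which adds a `_merge` column saying where each row came from. This is a minimal sketch on the example data; the name `outer` is chosen here for illustration, and the row order of the result may differ between pandas versions.

```python
import pandas as pd

df1 = pd.DataFrame({'City': ['NY', 'LA', 'NY'], 'Year': [2020, 2020, 2021], 'Pop': [8, 4, 8.3]})
df2 = pd.DataFrame({'City': ['NY', 'LA', 'NY'], 'Year': [2020, 2020, 2022], 'GDP': [1500, 1000, 1600]})

# how='outer' keeps every key combination from either table;
# indicator=True labels each row as 'both', 'left_only', or 'right_only'.
outer = pd.merge(df1, df2, on=['City', 'Year'], how='outer', indicator=True)

print(outer)
print(outer['_merge'].value_counts())
```

Here (NY, 2021) shows up as `left_only` with a missing GDP, and (NY, 2022) as `right_only` with a missing Pop, while the two rows from the inner merge are marked `both`.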

Full Transcript

Merging on multiple keys means combining two tables by matching rows where all specified columns are equal. We start with two DataFrames, identify the keys to join on, then check each row pair to see if all keys match. If they do, we combine their data into one row in the result. Rows without matching keys are left out. This process helps us combine related data from different sources precisely. The example code merges on 'City' and 'Year', producing a table with population and GDP where both match. The execution table shows step-by-step how each row pair is checked and merged or skipped. Variables track the growing list of merged rows. Key moments clarify why some rows don't merge and the importance of all keys matching. The quiz tests understanding of these steps and outcomes.