Merging on multiple keys helps you combine data from two tables using more than one matching column. This gives more accurate matches when one column alone is not enough.
0
0
Merging on multiple keys in Data Analysis Python
Introduction
You have sales data split by store and date in two tables and want to combine them.
You want to join customer info and orders using both customer ID and region.
You need to merge employee records from two systems using employee ID and department.
You want to combine product details and inventory using product ID and warehouse location.
Syntax
Data Analysis Python
merged_df = df1.merge(df2, on=['key1', 'key2'], how='inner')
The on parameter takes a list of column names to match on.
The how parameter controls the type of merge: 'inner', 'left', 'right', or 'outer'.
Examples
Merge two dataframes on columns 'city' and 'date' using inner join by default.
Data Analysis Python
merged = df1.merge(df2, on=['city', 'date'])
Keep all rows from df1 and add matching rows from df2 based on 'user_id' and 'product_id'.
Data Analysis Python
merged = df1.merge(df2, on=['user_id', 'product_id'], how='left')
Combine all rows from both dataframes, matching on 'employee' and 'department'.
Data Analysis Python
merged = df1.merge(df2, on=['employee', 'department'], how='outer')
Sample Program
This code merges two sales dataframes on both 'store' and 'date' columns. It keeps only rows where both store and date match in both tables.
Data Analysis Python
import pandas as pd # Create first dataframe sales1 = pd.DataFrame({ 'store': ['A', 'A', 'B', 'B'], 'date': ['2024-01-01', '2024-01-02', '2024-01-01', '2024-01-02'], 'sales': [100, 150, 200, 250] }) # Create second dataframe sales2 = pd.DataFrame({ 'store': ['A', 'B', 'B', 'C'], 'date': ['2024-01-01', '2024-01-01', '2024-01-03', '2024-01-02'], 'discount': [5, 10, 15, 20] }) # Merge on both 'store' and 'date' merged_sales = sales1.merge(sales2, on=['store', 'date'], how='inner') print(merged_sales)
OutputSuccess
Important Notes
If the column names differ in the two dataframes, use left_on and right_on to specify keys.
Using multiple keys reduces wrong matches when one key alone is not unique.
Summary
Merging on multiple keys combines data using more than one column for better matching.
Use a list of column names in the on parameter.
Choose the merge type with the how parameter.