0
0
Data Analysis Pythondata~5 mins

merge() for SQL-style joins in Data Analysis Python

Choose your learning style9 modes available
Introduction

We use merge() to combine two tables based on common columns, just like joining tables in a database. This helps us see related information together.

You have two lists of customer data and want to see all details in one place.
You want to combine sales data with product details using product IDs.
You need to find matching records between two datasets, like students and their grades.
You want to keep all records from one table and add matching info from another.
You want to find records that appear in both tables or only in one.
Syntax
Data Analysis Python
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None)

left and right are the two tables (DataFrames) to join.

how decides the type of join: 'inner', 'left', 'right', or 'outer'.

Examples
Join two tables on the column 'id' keeping only matching rows (inner join).
Data Analysis Python
pd.merge(df1, df2, on='id')
Keep all rows from df1 and add matching rows from df2.
Data Analysis Python
pd.merge(df1, df2, how='left', on='id')
Keep all rows from both tables, filling missing matches with NaN.
Data Analysis Python
pd.merge(df1, df2, how='outer', on='id')
Join on different column names from each table.
Data Analysis Python
pd.merge(df1, df2, left_on='key1', right_on='key2')
Sample Program

This code shows three types of joins between customers and their orders using merge(). It helps see which customers have orders and which don't.

Data Analysis Python
import pandas as pd

# Create first table with customer info
df_customers = pd.DataFrame({
    'customer_id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Charlie', 'David']
})

# Create second table with orders info
df_orders = pd.DataFrame({
    'order_id': [101, 102, 103, 104],
    'customer_id': [2, 2, 3, 5],
    'product': ['Book', 'Pen', 'Notebook', 'Pencil']
})

# Inner join: only customers with orders
inner_join = pd.merge(df_customers, df_orders, on='customer_id', how='inner')
print('Inner Join Result:')
print(inner_join)

# Left join: all customers, with orders if any
left_join = pd.merge(df_customers, df_orders, on='customer_id', how='left')
print('\nLeft Join Result:')
print(left_join)

# Outer join: all customers and all orders
outer_join = pd.merge(df_customers, df_orders, on='customer_id', how='outer')
print('\nOuter Join Result:')
print(outer_join)
OutputSuccess
Important Notes

If columns have different names, use left_on and right_on to specify them.

Missing matches appear as NaN in the result.

Inner join keeps only rows with matching keys in both tables.

Summary

merge() combines tables like database joins.

Use how to choose join type: inner, left, right, or outer.

Specify columns to join on with on, or left_on and right_on.