0
0
Data Analysis Pythondata~10 mins

Inner join in Data Analysis Python - Step-by-Step Execution

Choose your learning style9 modes available
Concept Flow - Inner join
Start with Table A
Find matching keys in Table B
Combine rows where keys match
Output rows with matched data only
End
Inner join takes two tables and combines rows where the key values match in both tables, leaving out rows without matches.
Execution Sample
Data Analysis Python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'id': [2, 3, 4], 'age': [25, 30, 40]})

result = pd.merge(df1, df2, on='id', how='inner')
print(result)
This code merges two dataframes on the 'id' column, keeping only rows with matching 'id's in both.
Execution Table
StepActiondf1 Rows Considereddf2 Rows ConsideredMatching Keys FoundResult Rows Added
1Start with df1 and df2[{1, Alice}, {2, Bob}, {3, Charlie}][{2, 25}, {3, 30}, {4, 40}]N/AN/A
2Check id=1 in df1 against df2{1, Alice}[{2, 25}, {3, 30}, {4, 40}]No matchNo row added
3Check id=2 in df1 against df2{2, Bob}[{2, 25}, {3, 30}, {4, 40}]Match foundAdd {2, Bob, 25}
4Check id=3 in df1 against df2{3, Charlie}[{2, 25}, {3, 30}, {4, 40}]Match foundAdd {3, Charlie, 30}
5No more rows in df1N/AN/AN/AEnd with 2 rows in result
💡 All rows in df1 checked; only rows with matching ids in df2 included in result.
Variable Tracker
VariableStartAfter Step 2After Step 3After Step 4Final
resultemptyempty[{2, Bob, 25}][{2, Bob, 25}, {3, Charlie, 30}][{2, Bob, 25}, {3, Charlie, 30}]
Key Moments - 2 Insights
Why is the row with id=1 from df1 not in the result?
Because in execution_table row 2, id=1 has no matching id in df2, so it is excluded from the inner join result.
Does the row with id=4 from df2 appear in the result?
No, because inner join only keeps rows with matching keys in both tables; id=4 is not in df1, so it is excluded (see execution_table rows 3-5).
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table, what rows are added to the result after Step 3?
A[{2, Bob, 25}]
B[{1, Alice, 25}]
C[{3, Charlie, 30}]
DNo rows added
💡 Hint
Check the 'Result Rows Added' column at Step 3 in execution_table.
At which step does the condition for id=1 fail to find a match?
AStep 1
BStep 3
CStep 2
DStep 4
💡 Hint
Look at the 'Matching Keys Found' column for id=1 in execution_table.
If df2 had an extra row with id=3 and age=35, how would the result change?
AResult would be unchanged
BResult would have duplicate rows for id=3
CResult would exclude id=3 rows
DResult would include id=4 row
💡 Hint
Consider how inner join matches multiple rows with the same key in variable_tracker.
Concept Snapshot
Inner join combines two tables by matching rows on a key.
Only rows with keys present in both tables appear in the result.
Syntax in pandas: pd.merge(df1, df2, on='key', how='inner')
Useful to find common data between tables.
Rows without matches are excluded.
Full Transcript
Inner join is a way to combine two tables by matching rows that share the same key value. We start with two tables, look for keys that appear in both, and then combine those rows into one result table. Rows that don't have a matching key in the other table are left out. For example, if we have a table of people with ids and names, and another table with ids and ages, an inner join on the id will give us only the people who have both a name and an age recorded. The code example uses pandas merge with how='inner' to do this. Step by step, we check each id in the first table against the second, add matching rows to the result, and skip those without matches. This helps us focus on data that is common to both tables.