
How to Use inner_join in dplyr for Data Frame Merging

Use inner_join() from the dplyr package to merge two data frames by matching columns, keeping only rows with keys present in both. Specify the data frames and the key columns to join on, and inner_join() returns the combined rows where keys match.

Syntax

The basic syntax of inner_join() is:

  • inner_join(x, y, by = NULL)

where:

  • x and y are the two data frames to join.
  • by specifies the column(s) to join on. If NULL (the default), dplyr joins on every column name the two data frames share and prints a message naming the columns it used.
r
inner_join(x, y, by = NULL)
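To make the by = NULL default concrete, here is a minimal sketch. The data frames customers and orders are made up for illustration; they share exactly one column name, so dplyr should pick it as the key and report that choice in a message.

```r
library(dplyr)

# Hypothetical data frames that share exactly one column name: customer_id
customers <- data.frame(customer_id = c(1, 2, 3), city = c("Oslo", "Lima", "Pune"))
orders    <- data.frame(customer_id = c(2, 3, 4), amount = c(20, 35, 15))

# 'by' omitted: dplyr joins on every shared column name and reports which ones it used
joined <- inner_join(customers, orders)

# Naming the key explicitly returns the same rows without the message
joined_explicit <- inner_join(customers, orders, by = "customer_id")

print(joined)
```

Spelling out by is usually the safer habit, because a column that two data frames happen to share by accident would otherwise become part of the key.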

Example

This example shows how to join two data frames by a common column id. Only rows with matching id values appear in the result.

r
library(dplyr)

# Create first data frame
df1 <- data.frame(id = c(1, 2, 3), name = c("Alice", "Bob", "Carol"))

# Create second data frame
df2 <- data.frame(id = c(2, 3, 4), score = c(88, 95, 70))

# Perform inner join on 'id'
result <- inner_join(df1, df2, by = "id")

print(result)
Output
  id  name score
1  2   Bob    88
2  3 Carol    95
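To see which rows the inner join left out, one option is dplyr's anti_join(), which returns the rows of its first argument that have no match in the second. The sketch below repeats the example data so it runs on its own; the names matched, only_in_df1, and only_in_df2 are chosen here for illustration.

```r
library(dplyr)

# Same data as in the example above, repeated so the snippet is self-contained
df1 <- data.frame(id = c(1, 2, 3), name = c("Alice", "Bob", "Carol"))
df2 <- data.frame(id = c(2, 3, 4), score = c(88, 95, 70))

matched <- inner_join(df1, df2, by = "id")

# Rows of df1 whose id has no partner in df2 (dropped by the inner join)
only_in_df1 <- anti_join(df1, df2, by = "id")

# Rows of df2 whose id has no partner in df1 (also dropped)
only_in_df2 <- anti_join(df2, df1, by = "id")

print(only_in_df1)
print(only_in_df2)
```

Here id 1 (Alice) exists only in df1 and id 4 exists only in df2, which is why neither appears in the joined result.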

Common Pitfalls

Common mistakes when using inner_join() include:

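  • Omitting the by argument when the key columns have different names. If the data frames share no column names, inner_join() stops with an error; if they happen to share some other column, it joins on that column instead.
  • Joining on columns with incompatible data types (for example, character and numeric), which makes inner_join() fail with an error rather than match the keys.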
  • Assuming inner_join() keeps all rows; it only keeps rows with keys in both data frames.
r
library(dplyr)

# Wrong: the key columns have different names, but 'by' is not specified
x <- data.frame(a = 1:3, val = c("x", "y", "z"))
y <- data.frame(b = 2:4, val2 = c(10, 20, 30))

# This throws an error because x and y share no column names;
# try() captures the error so the script can continue
wrong_join <- try(inner_join(x, y), silent = TRUE)

# Right: map column 'a' in x to column 'b' in y
# (named fixed_join to avoid shadowing dplyr's right_join() function)
fixed_join <- inner_join(x, y, by = c("a" = "b"))

print(inherits(wrong_join, "try-error"))
print(fixed_join)
Output
[1] TRUE
  a val val2
1 2   y   10
2 3   z   20
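The data type pitfall can be sketched the same way. The data frames people and scores below are hypothetical: one stores id as text, the other as numbers. Current dplyr versions are expected to reject this join with an incompatible-types error, so the sketch captures the failure with try() and then converts the key before joining.

```r
library(dplyr)

# Hypothetical data: ids stored as text in one frame and as numbers in the other
people <- data.frame(id = c("1", "2", "3"), name = c("Alice", "Bob", "Carol"))
scores <- data.frame(id = c(2, 3, 4), score = c(88, 95, 70))

# Joining character ids to numeric ids is expected to fail; try() captures the error
attempt <- try(inner_join(people, scores, by = "id"), silent = TRUE)
print(inherits(attempt, "try-error"))

# Convert the key to a common type first, then join
people_fixed <- mutate(people, id = as.numeric(id))
fixed <- inner_join(people_fixed, scores, by = "id")
print(fixed)
```

Running str() on both data frames before joining is a quick way to spot this kind of mismatch.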

Quick Reference

| Argument | Description |
|----------|-------------|
| x        | First data frame |
| y        | Second data frame |
| by       | Column name(s) to join on; can be a named vector for different names |
| suffix   | Suffixes added to duplicate column names (default: c('.x', '.y')) |
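The suffix argument matters only when both data frames carry a non-key column with the same name. As a small sketch with made-up data frames math and science, each holding a score column:

```r
library(dplyr)

# Hypothetical data: both frames have a non-key column called 'score'
math    <- data.frame(id = c(1, 2, 3), score = c(90, 75, 60))
science <- data.frame(id = c(2, 3, 4), score = c(88, 95, 70))

# Default suffixes: the clashing columns become score.x and score.y
default_names <- inner_join(math, science, by = "id")

# Custom suffixes make the origin of each column explicit
custom_names <- inner_join(math, science, by = "id",
                           suffix = c("_math", "_science"))

print(names(default_names))
print(names(custom_names))
```

The first element of suffix is applied to the column from x and the second to the column from y.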

Key Takeaways

  • Use inner_join() to keep only rows with matching keys in both data frames.
  • Always specify the by argument if the join columns have different names.
  • inner_join() places the columns from both data frames side by side in the result.
  • Check that the join columns have compatible data types; incompatible types cause an error.
  • inner_join() is part of dplyr, so load the package with library(dplyr) first.