How to Use inner_join in dplyr for Data Frame Merging
Use inner_join() from the dplyr package to merge two data frames on matching key columns, keeping only rows whose keys are present in both. Specify the data frames and the key columns to join on, and inner_join() returns the combined rows where the keys match.
Syntax
The basic syntax of inner_join() is:
```r
inner_join(x, y, by = NULL)
```
where:
- x and y are the two data frames to join.
- by specifies the column(s) to join on. If NULL, it uses all common column names.
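To illustrate how by behaves, here is a minimal sketch using two hypothetical data frames (sales and targets, invented for illustration) that share two column names, id and year. Leaving by as NULL joins on both shared columns, and dplyr prints a message naming the columns it picked. Passing the columns explicitly selects the same rows without the message.

```r
library(dplyr)

# Hypothetical data: both frames share the columns 'id' and 'year'
sales <- data.frame(id = c(1, 1, 2), year = c(2020, 2021, 2020), units = c(5, 7, 3))
targets <- data.frame(id = c(1, 2, 2), year = c(2021, 2020, 2021), goal = c(6, 4, 8))

# by = NULL (the default): joins on every common column, here 'id' and 'year'
implicit <- inner_join(sales, targets)

# Same join with the key columns spelled out
explicit <- inner_join(sales, targets, by = c("id", "year"))

print(explicit)
```

Spelling out by is usually the safer habit, since it keeps the join from changing silently if a new shared column name is added to either data frame later.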
Example
This example shows how to join two data frames by a common column id. Only rows with matching id values appear in the result.
```r
library(dplyr)

# Create first data frame
df1 <- data.frame(id = c(1, 2, 3), name = c("Alice", "Bob", "Carol"))

# Create second data frame
df2 <- data.frame(id = c(2, 3, 4), score = c(88, 95, 70))

# Perform inner join on 'id'
result <- inner_join(df1, df2, by = "id")
print(result)
```
Output
```
  id  name score
1  2   Bob    88
2  3 Carol    95
```
Common Pitfalls
Common mistakes when using inner_join() include:
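- Not specifying the by argument when the key column names differ. If the data frames share no column names, inner_join() stops with an error; if they happen to share other column names, it joins on those instead.
- Joining on columns with incompatible data types (for example, character and numeric), which dplyr rejects with an error instead of matching.
- Assuming inner_join() keeps all rows; it only keeps rows with keys in both data frames.
```r
library(dplyr)

x <- data.frame(a = 1:3, val = c("x", "y", "z"))
y <- data.frame(b = 2:4, val2 = c(10, 20, 30))

# Wrong: the key columns have different names but 'by' is not specified.
# With no common column names, inner_join() raises an error, captured here with try().
wrong_join <- try(inner_join(x, y), silent = TRUE)
print(inherits(wrong_join, "try-error"))

# Right: map the key column of x to the key column of y
correct_join <- inner_join(x, y, by = c("a" = "b"))
print(correct_join)
```
Output
```
[1] TRUE
  a val val2
1 2   y   10
2 3   z   20
```
By default the result keeps only the key column from x (a); the matching column b from y is dropped.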
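The data-type pitfall can be sketched in the same way. In this hypothetical example (orders and customers are made-up names), one key is stored as text and the other as a number. Recent dplyr versions refuse to join keys of incompatible types, so the sketch checks the types and converts one key before joining.

```r
library(dplyr)

# Hypothetical data: 'id' is character in one frame and numeric in the other
orders <- data.frame(id = c("1", "2", "3"), item = c("pen", "ink", "pad"),
                     stringsAsFactors = FALSE)
customers <- data.frame(id = c(2, 3, 4), city = c("Oslo", "Rome", "Lima"))

# Check the key types before joining
key_types <- c(class(orders$id), class(customers$id))
print(key_types)

# Convert the character key to numeric so both keys have the same type
orders_fixed <- mutate(orders, id = as.numeric(id))
joined <- inner_join(orders_fixed, customers, by = "id")
print(joined)
```

Which side to convert is a judgment call: converting to character is the safer choice when identifiers contain leading zeros or non-numeric characters.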
Quick Reference
| Argument | Description |
|---|---|
| x | First data frame |
| y | Second data frame |
| by | Column name(s) to join on; can be a named vector for different names |
| suffix | Suffixes added to duplicate column names (default: c('.x', '.y')) |
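As a short illustration of the suffix argument, the sketch below joins two hypothetical data frames (math and art, invented here) that both contain a non-key column called score. The default suffixes distinguish the two columns as score.x and score.y; custom suffixes can make the result easier to read.

```r
library(dplyr)

# Hypothetical data: both frames have a non-key column named 'score'
math <- data.frame(id = c(1, 2, 3), score = c(90, 75, 60))
art <- data.frame(id = c(2, 3, 4), score = c(82, 91, 55))

# Default suffixes: the duplicate columns become score.x and score.y
default_names <- names(inner_join(math, art, by = "id"))
print(default_names)

# Custom suffixes label each score with its source
labelled <- inner_join(math, art, by = "id", suffix = c("_math", "_art"))
print(names(labelled))
```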
Key Takeaways
- Use inner_join() to keep only rows with matching keys in both data frames.
- Always specify the by argument if the join columns have different names.
- inner_join() merges columns from both data frames side by side.
- Check that join columns have the same data type to avoid mismatches.
- inner_join() is part of dplyr and requires loading the package first.