
How to Use full_join in dplyr for Combining Data Frames

Use full_join() from the dplyr package to merge two data frames by one or more keys, keeping all rows from both tables. Rows without matches will have NA in the unmatched columns. Specify the joining columns with the by argument.
📐

Syntax

The basic syntax of full_join() is:

  • full_join(x, y, by = NULL, ...)

Where:

  • x and y are the two data frames to join.
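  • by specifies the column(s) to join on. If NULL (the default), full_join() joins on every column name the two data frames share and reports which columns it used in a message.
  • suffix sets the suffixes appended to non-key columns that have the same name in both data frames (default: ".x" and ".y").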
r
full_join(x, y, by = NULL, suffix = c(".x", ".y"), ...)
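To make the suffix argument concrete, here is a minimal sketch. The data frames a and b and their score column are hypothetical and chosen only so that a non-key column name overlaps.

```r
library(dplyr)

# Hypothetical data: both tables carry a non-key column named 'score'
a <- data.frame(id = c(1, 2), score = c(10, 20))
b <- data.frame(id = c(2, 3), score = c(200, 300))

# Override the default ".x" / ".y" suffixes with more descriptive ones
joined <- full_join(a, b, by = "id", suffix = c("_a", "_b"))

print(names(joined))
```

The joined data frame should have the columns id, score_a and score_b, with NA wherever an id exists in only one table.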
💻

Example

This example shows how to join two data frames by a common column id. The result keeps all rows from both data frames, filling missing values with NA.

r
library(dplyr)

# Create first data frame
df1 <- data.frame(id = c(1, 2, 3), value1 = c("A", "B", "C"))

# Create second data frame
df2 <- data.frame(id = c(2, 3, 4), value2 = c("X", "Y", "Z"))

# Perform full join by 'id'
result <- full_join(df1, df2, by = "id")

print(result)
Output
  id value1 value2
1  1      A   <NA>
2  2      B      X
3  3      C      Y
4  4   <NA>      Z
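One way to see which rows had no partner is to filter on the NA values the join introduced. This is only a sketch, and it assumes the original value1 and value2 columns contain no NA of their own; the name unmatched is illustrative.

```r
library(dplyr)

df1 <- data.frame(id = c(1, 2, 3), value1 = c("A", "B", "C"))
df2 <- data.frame(id = c(2, 3, 4), value2 = c("X", "Y", "Z"))
result <- full_join(df1, df2, by = "id")

# Rows where either side is NA had no match in the other data frame
# (assumes the input columns held no NA values before the join)
unmatched <- filter(result, is.na(value1) | is.na(value2))

print(unmatched)
```

Here the filter keeps the rows for id 1 (present only in df1) and id 4 (present only in df2).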
⚠️

Common Pitfalls

Common mistakes when using full_join() include:

  • Not specifying the by argument when the key columns have different names in each data frame.
  • Unexpected duplicate rows if the join keys are not unique, because every matching pair of rows produces its own row in the result.
  • Confusing full_join() with inner_join() or left_join(), which keep fewer rows.
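The difference between the join types in the last point can be checked by counting rows. The sketch below reuses the df1 and df2 data from the earlier example; the n_inner, n_left and n_full names are illustrative.

```r
library(dplyr)

df1 <- data.frame(id = c(1, 2, 3), value1 = c("A", "B", "C"))
df2 <- data.frame(id = c(2, 3, 4), value2 = c("X", "Y", "Z"))

# inner_join keeps only ids present in both tables (2 and 3)
n_inner <- nrow(inner_join(df1, df2, by = "id"))

# left_join keeps every row of df1 (ids 1, 2 and 3)
n_left <- nrow(left_join(df1, df2, by = "id"))

# full_join keeps every id from either table (1 through 4)
n_full <- nrow(full_join(df1, df2, by = "id"))

print(c(inner = n_inner, left = n_left, full = n_full))
```

With this data the three joins return 2, 3 and 4 rows respectively.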

Example of specifying different key names:

r
df1 <- data.frame(key1 = c(1, 2), val1 = c("A", "B"))
df2 <- data.frame(key2 = c(2, 3), val2 = c("X", "Y"))

# Wrong: no 'by' specified and the data frames share no column names,
# so dplyr stops with an error instead of joining (left commented out)
# wrong_join <- full_join(df1, df2)

# Right: map key1 in df1 to key2 in df2 with a named vector
# (named correct_join to avoid masking dplyr's right_join() function)
correct_join <- full_join(df1, df2, by = c("key1" = "key2"))

print(correct_join)
Output
  key1 val1 val2
1    1    A <NA>
2    2    B    X
3    3 <NA>    Y
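The pitfall about non-unique keys can be sketched in the same way. The orders and customers data frames below are hypothetical; the idea is to look for repeated key values before joining so that extra rows in the result are not a surprise.

```r
library(dplyr)

# Hypothetical data: id 1 appears twice in 'orders'
orders <- data.frame(id = c(1, 1, 2), item = c("pen", "ink", "pad"))
customers <- data.frame(id = c(1, 3), name = c("Ann", "Bob"))

# List key values that occur more than once before joining
dup_keys <- filter(count(orders, id), n > 1)
print(dup_keys)

# Each of the two rows for id 1 pairs with the single customer row
joined <- full_join(orders, customers, by = "id")
print(nrow(joined))
```

The result has four rows: two for id 1, one for id 2 with no customer, and one for id 3 with no order.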
📊

Quick Reference

Argument | Description
x | First data frame to join
y | Second data frame to join
by | Column name(s) to join on; NULL joins on all shared column names, and a named vector such as c("key1" = "key2") maps keys with different names
suffix | Suffixes added to overlapping non-key columns (default: ".x", ".y")
... | Additional arguments passed to methods
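Since by accepts more than one column name, a join on a composite key might look like the sketch below. The sales and costs data frames and their columns are hypothetical. The last call omits by to show the NULL default, which should join on the same two shared columns and print a message saying so.

```r
library(dplyr)

# Hypothetical data keyed by the combination of id and year
sales <- data.frame(id = c(1, 1, 2), year = c(2020, 2021, 2020),
                    sales = c(5, 6, 7))
costs <- data.frame(id = c(1, 2, 2), year = c(2021, 2020, 2021),
                    cost = c(3, 4, 8))

# Rows match only when both id and year agree
both_keys <- full_join(sales, costs, by = c("id", "year"))
print(both_keys)

# With by left as NULL, the shared columns (id and year) are used
natural <- full_join(sales, costs)
print(nrow(natural))
```

Both calls return four rows here: two matched id/year pairs, one pair found only in sales, and one found only in costs.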

Key Takeaways

Use full_join() to keep all rows from both data frames, matching by key columns.
Always specify the by argument when join keys have different names in each data frame.
Rows without matches get NA in the columns from the other data frame.
full_join() differs from inner_join() and left_join() by including unmatched rows from both sides.
Check for duplicate keys to avoid unexpected row duplication in the result.