How to Use bind_rows in dplyr for Combining Data Frames
Use bind_rows() from the dplyr package to combine two or more data frames by stacking their rows. It matches columns by name and fills any column missing from an input with NA. This is useful for combining datasets with similar, but not necessarily identical, structures.
Syntax
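The basic syntax of bind_rows() is:
r
bind_rows(..., .id = NULL)
Here, ... represents the data frames or tibbles to combine by rows. Each argument can be a data frame or a list of data frames.
The optional .id argument names a new identifier column that records which input each row came from. Labels are taken from the argument names (or list names); if the inputs are unnamed, a numeric sequence is used.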
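As a minimal sketch of how .id behaves, the snippet below binds two small data frames and labels each row with its source. The data, the object names q1 and q2, and the column name "source" are illustrative assumptions, not part of the original example.

```r
library(dplyr)

# Hypothetical example data; q1 and q2 are illustrative names only
q1 <- data.frame(id = 1:2, sales = c(100, 150))
q2 <- data.frame(id = 3:4, sales = c(120, 90))

# Named arguments supply the labels stored in the new "source" column
labelled <- bind_rows(q1 = q1, q2 = q2, .id = "source")

# A named list of data frames is labelled the same way
from_list <- bind_rows(list(q1 = q1, q2 = q2), .id = "source")

print(labelled)
```

Passing a named list is convenient when the data frames are produced in a loop or read from several files, since the list names become the labels.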
Example
This example combines two data frames that share only the id column. Columns missing from either input are filled with NA.
r
library(dplyr)

# Create two data frames
df1 <- data.frame(id = 1:3, name = c("Alice", "Bob", "Carol"))
df2 <- data.frame(id = 4:5, age = c(30, 25))

# Combine rows
combined <- bind_rows(df1, df2)
print(combined)
Output
  id  name age
1  1 Alice  NA
2  2   Bob  NA
3  3 Carol  NA
4  4  <NA>  30
5  5  <NA>  25
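To see in advance which columns will be padded, one option is to compare the column names of the inputs and then count the NA values after binding. This is a sketch that reuses the df1 and df2 objects from the example; the helper names only_in_df1, only_in_df2, and na_counts are illustrative.

```r
library(dplyr)

df1 <- data.frame(id = 1:3, name = c("Alice", "Bob", "Carol"))
df2 <- data.frame(id = 4:5, age = c(30, 25))

# Columns present in one input but not the other will be filled with NA
only_in_df1 <- setdiff(names(df1), names(df2))
only_in_df2 <- setdiff(names(df2), names(df1))

combined <- bind_rows(df1, df2)

# Count the NA values in each column of the combined result
na_counts <- colSums(is.na(combined))
print(na_counts)
```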
Common Pitfalls
Common mistakes when using bind_rows() include:
- Trying to combine objects that are not data frames or tibbles.
- Expecting columns to be matched by position instead of by name.
- Not handling missing columns, which bind_rows() fills with NA automatically.
The following example shows a wrong way and a right way:
r
library(dplyr)

# Wrong: combining a data frame and an unnamed vector
# bind_rows(data.frame(a = 1:2), c(3, 4))  # This will error

# Right: combine only data frames
bind_rows(data.frame(a = 1:2), data.frame(a = 3:4))
Output
  a
1 1
2 2
3 3
4 4
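The point about matching by name rather than by position can be illustrated with a short sketch. The data frames a and b below are hypothetical and hold the same two columns in a different order.

```r
library(dplyr)

# Same columns, different order (illustrative data)
a <- data.frame(x = 1:2, y = c("p", "q"))
b <- data.frame(y = c("r", "s"), x = 3:4)

# bind_rows() aligns columns by name, so x and y each stay intact
by_name <- bind_rows(a, b)
print(by_name)
```

The result keeps the column order of the first input, with the values of b placed under the matching names.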
Quick Reference
| Argument | Description |
|---|---|
| ... | Data frames or tibbles (or a list of them) to combine by rows |
| .id | Optional string naming a new column that identifies the source of each row |
Key Takeaways
Use bind_rows() to stack data frames by rows, matching columns by name.
bind_rows() fills missing columns with NA automatically.
Only combine data frames or tibbles (or lists of them); other objects, such as unnamed vectors, cause errors.
Use the .id argument to track the origin of each row if needed.