How to Use across in dplyr for Multiple Column Operations
In dplyr, use across() inside verbs like mutate() or summarise() to apply a function to multiple columns at once. It lets you select columns and apply one or more functions without repeating code.
Syntax
The basic syntax of across() is:
across(.cols, .fns, ...)
Where:
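- .cols selects the columns to operate on (e.g., starts_with("x"), c(col1, col2)).
- .fns is the function or list of functions to apply (e.g., mean, ~ .x + 1).
- Additional arguments can be passed on to the function through ..., although dplyr 1.1.0 and later deprecate this in favor of an anonymous function such as ~ mean(.x, na.rm = TRUE).
You typically use across() inside mutate() or summarise(). It also runs inside filter(), but that use is deprecated in current dplyr versions, which recommend if_any() or if_all() instead.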
```r
mutate(data, across(.cols, .fns, ...))
```
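To illustrate the "list of functions" form of .fns mentioned above, here is a minimal sketch. The scores data frame and its column names are hypothetical and not part of the original example. With a named list, across() should produce one output column per column/function pair, named with the default {.col}_{.fn} pattern.

```r
library(dplyr)

# Hypothetical example data (an assumption for illustration)
scores <- tibble(
  math    = c(80, 90, 100),
  reading = c(70, 75, 80)
)

# A named list of functions yields one result column per
# (input column, function) pair, e.g. math_avg and math_top
summary_tbl <- scores %>%
  summarise(across(c(math, reading), list(avg = mean, top = max)))

print(summary_tbl)
```

The names given in the list (avg, top) become the suffixes of the new columns, which keeps the result readable when several functions are applied at once.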
Example
This example shows how to add 1 to all numeric columns in a data frame using mutate() and across().
```r
library(dplyr)

data <- tibble(
  a = 1:3,
  b = 4:6,
  c = letters[1:3]
)

# Add 1 to every numeric column; the character column c is left untouched
result <- data %>%
  mutate(across(where(is.numeric), ~ .x + 1))

print(result)
```
Output
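```
# A tibble: 3 × 3
      a     b c    
  <dbl> <dbl> <chr>
1     2     5 a    
2     3     6 b    
3     4     7 c    
```
Note that a and b become <dbl> rather than <int>, because adding the double constant 1 to an integer column returns doubles.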
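Column selection does not have to rely on where(). As a minimal sketch of the starts_with() helper mentioned in the syntax section, the example below doubles only the columns whose names share a prefix. The df data frame and its column names are hypothetical.

```r
library(dplyr)

# Hypothetical data: an id column plus two measurements sharing the "x_" prefix
df <- tibble(
  id    = 1:3,
  x_len = c(1.5, 2.5, 3.5),
  x_wid = c(0.5, 1.0, 1.5)
)

# Only columns whose names start with "x_" are transformed; id is left as-is
doubled <- df %>%
  mutate(across(starts_with("x_"), ~ .x * 2))

print(doubled)
```

Selecting by name pattern is useful when a numeric column such as id should not be touched, which where(is.numeric) would not guarantee.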
Common Pitfalls
Common mistakes when using across() include:
- Not using across() inside a dplyr verb like mutate() or summarise().
- Forgetting to select columns properly, which can cause errors or unexpected results.
- Using a function whose output length does not fit the verb (inside mutate() it must return one value per row or a single value), or expecting across() to work outside dplyr verbs.
Example of wrong and right usage:
```r
library(dplyr)

data <- tibble(x = 1:3, y = 4:6)

# Wrong: using across() alone
# across(where(is.numeric), mean)
# Error: across() must be used inside a dplyr verb such as mutate() or summarise()

# Right: inside summarise()
result <- data %>%
  summarise(across(where(is.numeric), mean))

print(result)
```
Output
```
# A tibble: 1 × 2
      x     y
  <dbl> <dbl>
1     2     5
```
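A related pitfall concerns filter(): across() is deprecated there in recent dplyr releases, and if_all() and if_any() (available since dplyr 1.0.4) are the suggested replacements. The sketch below uses a small hypothetical data frame to show both; the thresholds are arbitrary.

```r
library(dplyr)

# Hypothetical data for illustration
data <- tibble(x = c(1, 5, 3), y = c(4, 2, 6))

# Keep rows where every numeric column is greater than 2
all_big <- data %>%
  filter(if_all(where(is.numeric), ~ .x > 2))

# Keep rows where at least one numeric column is greater than 4
any_big <- data %>%
  filter(if_any(where(is.numeric), ~ .x > 4))

print(all_big)
print(any_big)
```

if_all() combines the per-column conditions with a logical AND, while if_any() combines them with OR, so the choice between them states the intent explicitly.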
Quick Reference
| Argument | Description | Example |
|---|---|---|
| .cols | Select columns to apply function | where(is.numeric), starts_with('a') |
| .fns | Function(s) to apply | mean, ~ .x + 1 |
| Additional args (...) | Extra parameters forwarded to the functions (deprecated since dplyr 1.1.0; prefer a lambda) | na.rm = TRUE, or ~ mean(.x, na.rm = TRUE) |
| Usage | Used inside dplyr verbs | mutate(across(...)), summarise(across(...)) |
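As a sketch of the na.rm row in the table, the example below passes the extra argument through a lambda rather than through ..., since dplyr 1.1.0 and later deprecate the latter. The df data frame is hypothetical.

```r
library(dplyr)

# Hypothetical data containing missing values
df <- tibble(
  p = c(1, NA, 3),
  q = c(10, 20, NA)
)

# The lambda forwards na.rm = TRUE to mean() for every selected column
means <- df %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))

print(means)
```

Without na.rm = TRUE, both means would be NA, because mean() returns NA when its input contains missing values.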
Key Takeaways
Use across() inside dplyr verbs like mutate() or summarise() to apply functions to multiple columns.
Select columns clearly with helpers like where(), starts_with(), or column names.
You can apply one or multiple functions with across() cleanly and efficiently.
Avoid using across() outside of dplyr verbs to prevent errors.
across() helps write concise and readable code for column-wise operations.