How to Use mutate in dplyr: Add or Change Columns Easily
In dplyr, use mutate() to add new columns or change existing ones in a data frame by specifying column names and their new values. It keeps all original columns and returns a modified data frame with your changes.
Syntax
The basic syntax of mutate() is:
data %>% mutate(new_column = expression)
- data is your data frame or tibble.
- new_column is the name of the column you want to add or modify.
- expression is how you calculate the new column's values, which can use existing columns.
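Without the pipe, pass the data frame as the first argument. Several columns can be defined in one call:
r
mutate(data, new_column = expression, another_column = expression2, ...)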
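As a small sketch of the two call styles above, the piped and non-piped forms should give the same result. The data frame df and the columns price, qty, revenue, and discounted are made up for illustration.

```r
library(dplyr)

# Hypothetical data, not from the original example
df <- tibble(price = c(10, 20, 30), qty = c(1L, 2L, 3L))

# Piped form: %>% passes df in as the first argument of mutate()
piped <- df %>% mutate(revenue = price * qty, discounted = price * 0.9)

# Non-piped form: df is written explicitly as the first argument
direct <- mutate(df, revenue = price * qty, discounted = price * 0.9)

# Compare the two results
identical(piped, direct)
```

Which form to use is mostly a matter of readability; the pipe tends to pay off when several steps are chained.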
Example
This example shows how to add a new column total by summing two existing columns x and y in a data frame.
r
library(dplyr)

# Example data with two integer columns
data <- tibble(x = 1:5, y = 6:10)

# Add a new column holding the row-wise sum of x and y
result <- data %>% mutate(total = x + y)
print(result)
Output
# A tibble: 5 × 3
      x     y total
  <int> <int> <int>
1     1     6     7
2     2     7     9
3     3     8    11
4     4     9    13
5     5    10    15
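The example above only adds a column. Changing an existing one works the same way: assign to a name that is already present. The sketch below reuses the same data; the column name x_squared is chosen here purely for illustration.

```r
library(dplyr)

data <- tibble(x = 1:5, y = 6:10)

# Reusing the name y replaces that column; x_squared is a new column
changed <- data %>% mutate(y = y * 2L, x_squared = x^2)
print(changed)
```

The result still has x and y in their original positions, with y holding the doubled values and x_squared appended at the end.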
Common Pitfalls
Common mistakes when using mutate() include:
- Referencing a column before it is created in the same mutate() call. Arguments are evaluated in order, so a new column can be used by later expressions in that call but not by earlier ones.
- Overwriting an existing column unintentionally by reusing its name.
- Forgetting to load dplyr, so that mutate() and the %>% pipe it re-exports cannot be found.
r
library(dplyr)

data <- tibble(a = 1:3)

# Wrong: c refers to b before b has been created
# data %>% mutate(c = b + 1, b = a + 1)
# Error: object 'b' not found

# Right: define b first; later arguments in the same call can use it
result <- data %>% mutate(b = a + 1, c = b + 1)
print(result)
Output
# A tibble: 3 × 3
      a     b     c
  <dbl> <dbl> <dbl>
1     1     2     3
2     2     3     4
3     3     4     5
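The overwriting pitfall, and the fact that mutate() leaves its input untouched, can be sketched as follows. The column name a_scaled is hypothetical and only used for illustration.

```r
library(dplyr)

data <- tibble(a = 1:3)

# Without assignment the modified copy is printed and then discarded
data %>% mutate(a = a * 10L)

# The original data frame still holds the old values
print(data$a)

# Writing to a new name keeps the original column available
safe <- data %>% mutate(a_scaled = a * 10L)
print(safe)
```

Writing to a new name first and dropping the old column later is one cautious pattern when the original values may still be needed.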
Quick Reference
| Function | Description | Example |
|---|---|---|
| mutate() | Add or modify columns | data %>% mutate(new_col = old_col * 2) |
| transmute() | Create only new columns, drop others | data %>% transmute(new_col = old_col + 1) |
| mutate(across()) | Apply function to multiple columns | data %>% mutate(across(c(col1, col2), ~ . * 2)) |
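The three rows of the table can be compared side by side on one small data frame. This is a minimal sketch; df and its columns col1, col2, and label are invented for the example.

```r
library(dplyr)

# Hypothetical data with two numeric columns and one text column
df <- tibble(col1 = 1:3, col2 = 4:6, label = c("a", "b", "c"))

# mutate(): keeps every column and appends new_col
added <- df %>% mutate(new_col = col1 * 2)

# transmute(): keeps only the columns named in the call
only_new <- df %>% transmute(new_col = col1 + 1)

# across(): applies the same function to col1 and col2 in place
doubled <- df %>% mutate(across(c(col1, col2), ~ .x * 2))

names(added)
names(only_new)
print(doubled)
```

In the across() call the lambda uses .x for the current column; the label column is not selected and so passes through unchanged.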
Key Takeaways
- Use mutate() to add or change columns in a data frame while keeping the existing columns.
- You can create multiple new columns in one mutate() call; a new column can only be referenced by arguments that come after the one that creates it.
- Load dplyr before calling mutate(). The pipe operator %>% is optional, but it often makes chained steps easier to read.
- mutate() returns a new data frame; the original is unchanged unless you assign the result back to it.
- Use transmute() if you want to keep only the new columns created. Recent dplyr versions mark transmute() as superseded in favor of mutate(.keep = "none").