
How to Use mutate in dplyr: Add or Change Columns Easily

In dplyr, use mutate() to add new columns or change existing ones in a data frame by specifying column names and their new values. It keeps all original columns and returns a modified data frame with your changes.

Syntax

The basic syntax of mutate() is:

r
data %>% mutate(new_column = expression)

  • data is your data frame or tibble.
  • new_column is the name of the column you want to add or modify.
  • expression is how you calculate the new column's values, which can use existing columns.

Without the pipe, pass the data frame as the first argument. Separate multiple columns with commas:

r
mutate(data, new_column = expression, another_column = expression2, ...)
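
As a small sketch of the two call forms, the snippet below adds one column and modifies another in a single call. The scores tibble and its column names are hypothetical, chosen only for illustration.

```r
library(dplyr)

# Hypothetical data, for illustration only
scores <- tibble(name = c("a", "b", "c"), points = c(10, 20, 30))

# Piped form: add `bonus`, then update the existing `points` column
piped <- scores %>% mutate(bonus = points / 10, points = points + bonus)

# Non-piped form: the data frame is the first argument
direct <- mutate(scores, bonus = points / 10, points = points + bonus)

# Both forms should give the same result
print(identical(piped, direct))
print(piped)
```

The modified column points stays in its original position, while the new column bonus is appended at the end.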

Example

This example shows how to add a new column total by summing two existing columns x and y in a data frame.

r
library(dplyr)

data <- tibble(x = 1:5, y = 6:10)

result <- data %>% mutate(total = x + y)
print(result)
Output
# A tibble: 5 × 3
      x     y total
  <int> <int> <int>
1     1     6     7
2     2     7     9
3     3     8    11
4     4     9    13
5     5    10    15
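
Because mutate() returns a new data frame, the original only changes if you assign the result back. A minimal sketch using the same data (the helper names names_before and names_after are introduced here just to make the difference visible):

```r
library(dplyr)

data <- tibble(x = 1:5, y = 6:10)

# Without assignment, the result is printed and then discarded
data %>% mutate(total = x + y)
names_before <- names(data)   # the original still has only x and y

# Reassigning stores the new column in `data`
data <- data %>% mutate(total = x + y)
names_after <- names(data)    # now includes total

print(names_before)
print(names_after)
```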

Common Pitfalls

Common mistakes when using mutate() include:

  • Referencing a column before it exists. Within one mutate() call, columns are created from left to right, so an expression can use a column defined earlier in the same call but not one defined later.
  • Unintentionally overwriting an existing column by reusing its name.
  • Forgetting to load dplyr with library(dplyr), which leaves both mutate() and the pipe operator %>% unavailable.
r
library(dplyr)

data <- tibble(a = 1:3)

# Wrong: c refers to b before b is created
# data %>% mutate(c = b + 1, b = a + 1) # Error: object 'b' not found

# Right: create b first, then use it later in the same call
result <- data %>% mutate(b = a + 1, c = b + 1)
print(result)
Output
# A tibble: 3 × 3
      a     b     c
  <int> <dbl> <dbl>
1     1     2     3
2     2     3     4
3     3     4     5
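
The overwriting pitfall listed above can be sketched as follows. The orders tibble and its columns are hypothetical; the point is only that reusing an existing name replaces that column without any warning.

```r
library(dplyr)

# Hypothetical prices stored in cents
orders <- tibble(id = 1:3, price = c(250, 1000, 499))

# Reusing the name `price` replaces the original values
overwritten <- orders %>% mutate(price = price / 100)

# A new name keeps the original column alongside the derived one
kept <- orders %>% mutate(price_dollars = price / 100)

print(names(overwritten))
print(names(kept))
```

Choosing a distinct name for derived columns is one way to keep the source values available for later checks.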

Quick Reference

| Function | Description | Example |
|---|---|---|
| mutate() | Add or modify columns | data %>% mutate(new_col = old_col * 2) |
| transmute() | Create only new columns, drop others | data %>% transmute(new_col = old_col + 1) |
| mutate(across()) | Apply function to multiple columns | data %>% mutate(across(c(col1, col2), ~ . * 2)) |
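
The last two rows of the table can be tried out with a short sketch. It assumes dplyr 1.0.0 or later (where across() and the .keep argument are available) and uses a hypothetical tibble df. Recent dplyr releases mark transmute() as superseded in favour of mutate(.keep = "none"), so both spellings are shown.

```r
library(dplyr)

# Hypothetical data, for illustration only
df <- tibble(col1 = 1:3, col2 = 4:6, label = c("a", "b", "c"))

# transmute(): keep only the newly created column
only_new <- df %>% transmute(doubled = col1 * 2)

# The same selection written with mutate() and .keep = "none"
only_new_keep <- df %>% mutate(doubled = col1 * 2, .keep = "none")

# across(): apply one function to several columns; `label` is left alone
scaled <- df %>% mutate(across(c(col1, col2), ~ .x * 2))

print(only_new)
print(scaled)
```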

Key Takeaways

Use mutate() to add or change columns in a data frame while keeping existing data.
You can create multiple new columns in one mutate() call, but order matters when referencing new columns.
Load dplyr before calling mutate(); the pipe operator %>% is optional but often makes chained steps easier to read.
mutate() returns a new data frame; it does not change the original unless reassigned.
Use transmute() if you want to keep only the new columns created.