How to Use case_when in dplyr for Conditional Mutations
Use case_when() in dplyr to create new columns based on multiple conditions. It works like an if-else ladder: conditions are checked in order, and each row gets the value paired with the first condition that is true for it. This helps you write clear and readable conditional logic inside mutate().
Syntax
The case_when() function takes a series of condition-value pairs separated by commas. Each condition is a logical test, and the value is what gets assigned if that condition is true. The syntax looks like this:
- condition1 ~ value1: If condition1 is true, assign value1.
- condition2 ~ value2: If condition2 is true, assign value2.
- ...
- TRUE ~ default_value: If none of the above conditions are true, assign default_value.
Use case_when() inside mutate() to add or modify columns in a data frame.
```r
library(dplyr)

data <- tibble(x = c(1, 5, 10, 15))

result <- data %>%
  mutate(
    category = case_when(
      x < 5 ~ "small",
      x >= 5 & x < 10 ~ "medium",
      TRUE ~ "large"
    )
  )
```
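The catch-all described above can also be written with the .default argument instead of TRUE ~ default_value. The sketch below assumes dplyr 1.1.0 or later, where case_when() accepts .default; on older versions, keep the TRUE ~ form. The name result_default is only illustrative.

```r
library(dplyr)

data <- tibble(x = c(1, 5, 10, 15))

# Assumption: dplyr >= 1.1.0, where case_when() has a `.default` argument
result_default <- data %>%
  mutate(
    category = case_when(
      x < 5 ~ "small",
      x >= 5 & x < 10 ~ "medium",
      .default = "large"  # used for rows where no condition is TRUE
    )
  )
```

Both spellings should produce the same category column for this data, so the choice is mostly a matter of which dplyr version you need to support.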
Example
This example shows how to classify numbers into categories "small", "medium", or "large" using case_when(). It creates a new column category based on the value of x.
```r
library(dplyr)

data <- tibble(x = c(2, 7, 12, 4, 9))

result <- data %>%
  mutate(
    category = case_when(
      x < 5 ~ "small",
      x >= 5 & x < 10 ~ "medium",
      TRUE ~ "large"
    )
  )

print(result)
```
Output
```
# A tibble: 5 × 2
      x category
  <dbl> <chr>
1     2 small
2     7 medium
3    12 large
4     4 small
5     9 medium
```
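Because each row takes the value of the first condition that is true, the lower bound in the "medium" condition is not strictly needed. The following is a minimal sketch of the same classification with shorter conditions; result_short is an illustrative name, not part of the original example.

```r
library(dplyr)

data <- tibble(x = c(2, 7, 12, 4, 9))

result_short <- data %>%
  mutate(
    category = case_when(
      x < 5 ~ "small",
      x < 10 ~ "medium",  # rows with x < 5 were already matched above
      TRUE ~ "large"
    )
  )
```

The shorter form depends on the order of the conditions, so the explicit x >= 5 & x < 10 version can be the safer choice if the conditions are likely to be rearranged later.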
Common Pitfalls
Common mistakes when using case_when() include:
- Not covering all cases, which leads to NA values if no condition matches.
- Using overlapping conditions that cause unexpected matches.
- Forgetting to use TRUE ~ as a catch-all default case.
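Because the first matching condition wins, always order conditions from most specific to least specific. Otherwise a broad condition can match rows that a later, narrower one was meant to catch.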
```r
library(dplyr)

data <- tibble(score = c(85, 70, 55, 40))

# Wrong: Missing default case leads to NA
result_wrong <- data %>%
  mutate(
    grade = case_when(
      score >= 80 ~ "A",
      score >= 60 ~ "B"
      # Missing TRUE ~ ... leads to NA for scores < 60
    )
  )

# Right: Add default case
result_right <- data %>%
  mutate(
    grade = case_when(
      score >= 80 ~ "A",
      score >= 60 ~ "B",
      TRUE ~ "F"
    )
  )

print(result_wrong)
print(result_right)
```
Output
```
# A tibble: 4 × 2
  score grade
  <dbl> <chr>
1    85 A
2    70 B
3    55 <NA>
4    40 <NA>

# A tibble: 4 × 2
  score grade
  <dbl> <chr>
1    85 A
2    70 B
3    55 F
4    40 F
```
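The overlapping-conditions pitfall can be sketched with the same score data. Here the broader condition is deliberately placed first to show what goes wrong; result_misordered is an illustrative name.

```r
library(dplyr)

data <- tibble(score = c(85, 70, 55, 40))

# Wrong order: score >= 60 is checked first and also matches 85,
# so the "A" branch is never used
result_misordered <- data %>%
  mutate(
    grade = case_when(
      score >= 60 ~ "B",
      score >= 80 ~ "A",
      TRUE ~ "F"
    )
  )

# grade is "B", "B", "F", "F": the score of 85 is graded "B" instead of "A"
print(result_misordered)
```

Swapping the first two conditions, so that score >= 80 comes before score >= 60, restores the intended grades.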
Quick Reference
| Usage | Description |
|---|---|
| condition ~ value | Assign value if condition is TRUE |
| TRUE ~ default_value | Assign default if no conditions match |
| Use inside mutate() | Add or modify columns in a data frame |
| Order conditions carefully | From most specific to least specific |
| Avoid overlapping conditions | To prevent unexpected results |
Key Takeaways
Use case_when() inside mutate() to create conditional columns in dplyr.
Always include a TRUE ~ default case to handle unmatched conditions.
Order your conditions from most specific to least specific to avoid conflicts.
case_when() returns NA if no condition matches and no default is set.
It simplifies complex if-else logic into clear, readable code.