How to Use case_when in dplyr for Conditional Mutations

Use case_when() in dplyr to create new columns based on multiple conditions. It works like a vectorized if-else ladder: for each row, the conditions are checked in order and the first one that is TRUE determines the value. This keeps conditional logic inside mutate() clear and readable.

Syntax

The case_when() function takes a series of two-sided formulas separated by commas. The left-hand side of each formula is a logical test, and the right-hand side is the value assigned where that test is TRUE. The formulas take this form:

  • condition1 ~ value1: If condition1 is true, assign value1.
  • condition2 ~ value2: If condition2 is true, assign value2.
  • ...
  • TRUE ~ default_value: If none of the above conditions are true, assign default_value.

Use case_when() inside mutate() to add or modify columns in a data frame.

r
library(dplyr)

data <- tibble(x = c(1, 5, 10, 15))

result <- data %>%
  mutate(
    category = case_when(
      x < 5 ~ "small",
      x >= 5 & x < 10 ~ "medium",
      TRUE ~ "large"
    )
  )
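Because the conditions are checked in order, a later condition only ever sees rows that no earlier condition claimed. The following is a minimal sketch (the names explicit and ordered are illustrative, not from the original example) suggesting that the lower bound in the second condition can be dropped without changing the result:

```r
library(dplyr)

data <- tibble(x = c(1, 5, 10, 15))

# Explicit range check, as in the block above
explicit <- data %>%
  mutate(
    category = case_when(
      x < 5 ~ "small",
      x >= 5 & x < 10 ~ "medium",
      TRUE ~ "large"
    )
  )

# Relying on first-match-wins: rows with x < 5 are already labelled,
# so the second condition only needs the upper bound
ordered <- data %>%
  mutate(
    category = case_when(
      x < 5 ~ "small",
      x < 10 ~ "medium",
      TRUE ~ "large"
    )
  )

# Both versions should produce the same labels
identical(explicit$category, ordered$category)
```

Either form works; the explicit version documents the intended range, while the shorter one depends on the order of the conditions staying as written.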

Example

This example shows how to classify numbers into categories "small", "medium", or "large" using case_when(). It creates a new column category based on the value of x.

r
library(dplyr)

data <- tibble(x = c(2, 7, 12, 4, 9))

result <- data %>%
  mutate(
    category = case_when(
      x < 5 ~ "small",
      x >= 5 & x < 10 ~ "medium",
      TRUE ~ "large"
    )
  )

print(result)
Output
# A tibble: 5 × 2
      x category
  <dbl> <chr>
1     2 small
2     7 medium
3    12 large
4     4 small
5     9 medium

Common Pitfalls

Common mistakes when using case_when() include:

  • Not covering all cases, which leads to NA values if no condition matches.
  • Using overlapping conditions: the first matching condition wins, so a broad condition placed early hides any narrower conditions after it.
  • Forgetting to use TRUE ~ as a catch-all default case.

Always order conditions from most specific to least specific to avoid conflicts.
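To illustrate the ordering advice, here is a small sketch using hypothetical names (too_broad_first, specific_first). It runs the same two overlapping conditions in both orders:

```r
library(dplyr)

data <- tibble(score = c(85, 70, 55, 40))

# Broad condition first: every score >= 60 matches it,
# so the "A" branch is never reached
too_broad_first <- data %>%
  mutate(
    grade = case_when(
      score >= 60 ~ "B",
      score >= 80 ~ "A",
      TRUE ~ "F"
    )
  )

# Most specific condition first: 85 is claimed by "A" before "B" is tested
specific_first <- data %>%
  mutate(
    grade = case_when(
      score >= 80 ~ "A",
      score >= 60 ~ "B",
      TRUE ~ "F"
    )
  )

too_broad_first$grade  # "B" "B" "F" "F"
specific_first$grade   # "A" "B" "F" "F"
```

No error or warning is raised in the first version, which is why this kind of mistake is easy to miss.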

r
library(dplyr)

data <- tibble(score = c(85, 70, 55, 40))

# Wrong: Missing default case leads to NA
result_wrong <- data %>%
  mutate(
    grade = case_when(
      score >= 80 ~ "A",
      score >= 60 ~ "B"
      # Missing TRUE ~ ... leads to NA for scores < 60
    )
  )

# Right: Add default case
result_right <- data %>%
  mutate(
    grade = case_when(
      score >= 80 ~ "A",
      score >= 60 ~ "B",
      TRUE ~ "F"
    )
  )

print(result_wrong)
print(result_right)
Output
# A tibble: 4 × 2
  score grade
  <dbl> <chr>
1    85 A
2    70 B
3    55 <NA>
4    40 <NA>
# A tibble: 4 × 2
  score grade
  <dbl> <chr>
1    85 A
2    70 B
3    55 F
4    40 F
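The catch-all in the corrected example can also be written with the .default argument instead of TRUE ~. This sketch assumes dplyr 1.1.0 or later is installed, since .default is not available in older releases; the names with_true and with_default are illustrative:

```r
library(dplyr)

data <- tibble(score = c(85, 70, 55, 40))

# Catch-all written as a final TRUE ~ formula
with_true <- data %>%
  mutate(
    grade = case_when(
      score >= 80 ~ "A",
      score >= 60 ~ "B",
      TRUE ~ "F"
    )
  )

# Same fallback expressed through .default (assumes dplyr >= 1.1.0)
with_default <- data %>%
  mutate(
    grade = case_when(
      score >= 80 ~ "A",
      score >= 60 ~ "B",
      .default = "F"
    )
  )

# Both spellings should give the same grades
identical(with_true$grade, with_default$grade)
```

TRUE ~ continues to work, so which spelling to use is largely a matter of which dplyr versions the code needs to support.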

Quick Reference

Usage                          Description
condition ~ value              Assign value if condition is TRUE
TRUE ~ default_value           Assign default if no conditions match
Use inside mutate()            Add or modify columns in a data frame
Order conditions carefully     From most specific to least specific
Avoid overlapping conditions   To prevent unexpected results

Key Takeaways

  • Use case_when() inside mutate() to create conditional columns in dplyr.
  • Always include a TRUE ~ default case to handle unmatched conditions.
  • Order your conditions from most specific to least specific to avoid conflicts.
  • case_when() returns NA if no condition matches and no default is set.
  • It simplifies complex if-else logic into clear, readable code.