How to Use dplyr in R: Syntax, Examples, and Tips

To use dplyr in R, first load the package with library(dplyr). Then use its functions like filter(), select(), mutate(), and summarise() to manipulate data frames in a clear and readable way.

Syntax

The dplyr package uses simple verbs to manipulate data frames:

  • filter(): Select rows based on conditions.
  • select(): Choose specific columns.
  • mutate(): Add or change columns.
  • arrange(): Sort rows.
  • summarise(): Create summary statistics.
  • %>%: The pipe operator, which passes the result on its left as the first argument of the function on its right.

Each verb takes a data frame as its first argument and returns a new data frame; the original is left unchanged.

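r
library(dplyr)
# Template only: replace each placeholder with real names and expressions
data %>%
  filter(condition) %>%                    # keep rows where condition is TRUE
  select(columns) %>%                      # keep only the listed columns
  mutate(new_column = expression) %>%      # add or change a column
  arrange(column) %>%                      # sort rows by column
  summarise(summary = summary_fn(column))  # summary_fn is a placeholder, e.g. mean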
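
To make the role of the pipe concrete, here is a minimal sketch comparing nested calls with the piped form. The data frame `people` and the variable names `nested` and `piped` are made up for this illustration and are not part of the template above.

```r
library(dplyr)

# Hypothetical sample data, used only for this illustration
people <- data.frame(
  name = c("Anna", "Ben", "Cara"),
  age  = c(23, 35, 29)
)

# Nested calls: read from the inside out
nested <- select(filter(people, age > 25), name)

# Piped calls: read from top to bottom
piped <- people %>%
  filter(age > 25) %>%
  select(name)

# Both forms should give the same data frame
print(identical(nested, piped))

# The input still has all of its rows; the verbs returned new data frames
print(nrow(people))
```

The piped version does the same work as the nested one, but the steps appear in the order they run.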

Example

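This example shows how to filter rows, select columns, add a new column, and summarise data using dplyr. Because summarise() collapses the result to a single row, the score_double column added by mutate() does not appear in the final output.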

r
library(dplyr)

# Sample data frame
data <- data.frame(
  name = c("Anna", "Ben", "Cara", "Dan"),
  age = c(23, 35, 29, 40),
  score = c(88, 92, 95, 70)
)

# Use dplyr to filter, select, mutate, and summarise
result <- data %>%
  filter(age > 25) %>%
  select(name, score) %>%
  mutate(score_double = score * 2) %>%
  summarise(average_score = mean(score), max_score = max(score))

print(result)
Output
  average_score max_score
1      85.66667        95
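
To see what mutate() contributed before summarise() collapsed the rows, the same pipeline can be stopped one step earlier. This is a small sketch that reuses the sample data from the example above; the name `steps` is introduced here only for illustration.

```r
library(dplyr)

# Same sample data as in the example above
data <- data.frame(
  name = c("Anna", "Ben", "Cara", "Dan"),
  age = c(23, 35, 29, 40),
  score = c(88, 92, 95, 70)
)

# Stop before summarise() to inspect the per-row result
steps <- data %>%
  filter(age > 25) %>%
  select(name, score) %>%
  mutate(score_double = score * 2)

print(steps)
```

The result should contain three rows (Ben, Cara, and Dan) with the columns name, score, and score_double, which makes it easier to check each step before summarising.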

Common Pitfalls

Common mistakes when using dplyr include:

  • Not loading the package with library(dplyr).
  • Forgetting to use the pipe %>% to chain commands, which can make code harder to read.
  • Carrying base R habits into dplyr verbs, such as quoting column names: filter(data, "age" > 25) compares the string "age" with 25 instead of using the age column.
  • Confusing summarise() with mutate(): summarise() reduces rows, mutate() keeps them.
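
The difference between summarise() and mutate() is easiest to see by counting rows. A minimal sketch, assuming a made-up data frame `scores` (not part of the original example):

```r
library(dplyr)

# Hypothetical scores, used only for this illustration
scores <- data.frame(
  name = c("Anna", "Ben", "Cara"),
  score = c(88, 92, 95)
)

# mutate() keeps every row and adds the mean as a new column
with_mean <- scores %>% mutate(average_score = mean(score))
print(nrow(with_mean))

# summarise() collapses the data to a single summary row
only_mean <- scores %>% summarise(average_score = mean(score))
print(nrow(only_mean))
```

Here `with_mean` should keep all three rows, while `only_mean` should have just one.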

Example of wrong and right usage:

r
# Wrong: forgetting to load dplyr
# filter(data, age > 25) # Without dplyr loaded, filter() resolves to stats::filter() and errors

# Right:
library(dplyr)
data %>% filter(age > 25)
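
An alternative to calling library(dplyr) is to name the package explicitly with the dplyr:: prefix, which avoids accidentally picking up a different filter(). This is a sketch that assumes dplyr is installed; the data frame `people` and the variable `older` are hypothetical names for this illustration.

```r
# No library(dplyr) call here: the dplyr:: prefix names the package explicitly.
# Assumption: dplyr is installed on this machine.
people <- data.frame(
  name = c("Anna", "Ben", "Cara"),
  age = c(23, 35, 29)
)

# Keep only the rows where age is above 25
older <- dplyr::filter(people, age > 25)
print(older$name)
```

This form is handy in short scripts, although loading the package once is usually more readable when many verbs are chained.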

Quick Reference

Function    | Purpose                  | Example
filter()    | Select rows by condition | data %>% filter(age > 30)
select()    | Choose columns           | data %>% select(name, age)
mutate()    | Add or change columns    | data %>% mutate(new_col = age * 2)
arrange()   | Sort rows                | data %>% arrange(desc(score))
summarise() | Create summary stats     | data %>% summarise(avg = mean(score))
%>%         | Pipe operator to chain   | data %>% filter(age > 25) %>% select(name)
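
Since arrange() does not appear in the worked example, here is a short sketch of both sort directions using the sample data from the Example section. The names `youngest_first` and `top_scores` are introduced only for this illustration.

```r
library(dplyr)

# Same sample data as in the Example section
data <- data.frame(
  name = c("Anna", "Ben", "Cara", "Dan"),
  age = c(23, 35, 29, 40),
  score = c(88, 92, 95, 70)
)

# Ascending order is the default
youngest_first <- data %>% arrange(age)
print(youngest_first$name)

# desc() reverses the order for that column
top_scores <- data %>% arrange(desc(score))
print(top_scores$name)
```

Sorting by age should put Anna first, and sorting by descending score should put Cara first.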

Key Takeaways

  • Load dplyr with library(dplyr) before using its functions.
  • Use the pipe operator %>% to write clear, readable data manipulation steps.
  • Remember that filter() selects rows, select() chooses columns, mutate() adds or changes columns, arrange() sorts rows, and summarise() creates summaries.
  • Take care when mixing base R and dplyr syntax: dplyr verbs expect bare column names, not quoted strings.
  • Practice chaining dplyr verbs to break complex data tasks into simple steps.