How to Use dplyr in R: Syntax, Examples, and Tips
To use dplyr in R, first load the package with library(dplyr). Then use its functions like filter(), select(), mutate(), and summarise() to manipulate data frames in a clear and readable way.
Syntax
The dplyr package uses simple verbs to manipulate data frames:
- filter(): Select rows based on conditions.
- select(): Choose specific columns.
- mutate(): Add or change columns.
- arrange(): Sort rows.
- summarise(): Create summary statistics.
- %>%: The pipe operator to chain commands clearly.

Each verb takes a data frame as its first argument and returns a new data frame, which is what allows the pipe to chain them one after another.
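```r
library(dplyr)

# Template only: condition, columns, expression, column, and summary_fn
# are placeholders to replace with your own names
data %>%
  filter(condition) %>%
  select(columns) %>%
  mutate(new_column = expression) %>%
  arrange(column) %>%
  summarise(summary = summary_fn(column))
```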
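To make the role of the pipe concrete, here is a minimal sketch. The data frame df and its columns id and value are hypothetical names chosen only for this illustration. It shows that piping a data frame into a verb is the same as passing it as the first argument, and that the verb leaves the original data frame untouched.

```r
library(dplyr)

# Hypothetical data frame used only for this illustration
df <- data.frame(id = 1:4, value = c(10, 20, 30, 40))

# The pipe passes its left-hand side as the first argument,
# so these two calls do the same thing
piped  <- df %>% filter(value > 15)
direct <- filter(df, value > 15)
same   <- identical(piped, direct)

# The verb returns a new data frame; df itself is not modified
rows_before <- nrow(df)     # all 4 original rows are still there
rows_after  <- nrow(piped)  # only the 3 rows with value > 15
```

Because nothing is modified in place, the result has to be assigned to a name (as with piped here) if it is needed later.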
Example
This example shows how to filter rows, select columns, add a new column, and summarise data using dplyr.
```r
library(dplyr)

# Sample data frame
data <- data.frame(
  name = c("Anna", "Ben", "Cara", "Dan"),
  age = c(23, 35, 29, 40),
  score = c(88, 92, 95, 70)
)

# Use dplyr to filter, select, mutate, and summarise
result <- data %>%
  filter(age > 25) %>%
  select(name, score) %>%
  mutate(score_double = score * 2) %>%
  summarise(average_score = mean(score), max_score = max(score))

print(result)
```
Output
```
  average_score max_score
1      85.66667        95
```
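To see where these numbers come from, it can help to stop the pipeline before summarise() and inspect the intermediate data frame. The sketch below reuses the sample data from the example; the name intermediate is introduced here only for illustration.

```r
library(dplyr)

data <- data.frame(
  name = c("Anna", "Ben", "Cara", "Dan"),
  age = c(23, 35, 29, 40),
  score = c(88, 92, 95, 70)
)

# Same pipeline as in the example, stopped before summarise()
intermediate <- data %>%
  filter(age > 25) %>%               # drops Anna (age 23)
  select(name, score) %>%            # keeps only name and score
  mutate(score_double = score * 2)   # adds a doubled score column

print(intermediate)
```

Three rows remain (Ben, Cara, and Dan), so the average is (92 + 95 + 70) / 3 and the maximum is 95. Note that score_double does not appear in the final output, because summarise() returns only the summary columns it is asked to compute.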
Common Pitfalls
Common mistakes when using dplyr include:
- Not loading the package with library(dplyr).
- Forgetting to use the pipe %>% to chain commands, which can make code harder to read.
- Using base R functions inside dplyr verbs without proper syntax.
- Confusing summarise() with mutate(): summarise() reduces rows, while mutate() keeps them.
Example of wrong and right usage:
```r
# Wrong: forgetting to load dplyr
# filter(data, age > 25)  # Error if dplyr not loaded (R finds stats::filter() instead)

# Right:
library(dplyr)
data %>% filter(age > 25)
```
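The difference between summarise() and mutate() is easiest to see by counting rows. The following is a small sketch; the data frame scores and the result names with_mean and only_mean are hypothetical and exist only for this comparison.

```r
library(dplyr)

# Hypothetical data frame used only for this comparison
scores <- data.frame(
  name = c("Anna", "Ben", "Cara", "Dan"),
  score = c(88, 92, 95, 70)
)

# mutate() keeps every row and adds the mean as a new column
with_mean <- scores %>% mutate(avg = mean(score))
nrow(with_mean)

# summarise() collapses the data frame to a single summary row
only_mean <- scores %>% summarise(avg = mean(score))
nrow(only_mean)
```

Here with_mean still has four rows, each carrying the same avg value, while only_mean has one row and one column.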
Quick Reference
| Function | Purpose | Example |
|---|---|---|
| filter() | Select rows by condition | data %>% filter(age > 30) |
| select() | Choose columns | data %>% select(name, age) |
| mutate() | Add or change columns | data %>% mutate(new_col = age * 2) |
| arrange() | Sort rows | data %>% arrange(desc(score)) |
| summarise() | Create summary stats | data %>% summarise(avg = mean(score)) |
| %>% | Pipe operator to chain | data %>% filter(age > 25) %>% select(name) |
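Since arrange() appears in the table but not in the worked example, here is a brief sketch of it in use, assuming the same sample data as in the Example section. The name ranked is introduced only for illustration.

```r
library(dplyr)

data <- data.frame(
  name = c("Anna", "Ben", "Cara", "Dan"),
  age = c(23, 35, 29, 40),
  score = c(88, 92, 95, 70)
)

# Sort by score, highest first, then keep only name and score
ranked <- data %>%
  arrange(desc(score)) %>%
  select(name, score)

print(ranked)
```

Wrapping a column in desc() reverses the sort order; without it, arrange() sorts in ascending order.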
Key Takeaways
Load dplyr with library(dplyr) before using its functions.
Use the pipe operator %>% to write clear, readable data manipulation steps.
Remember filter() selects rows, select() chooses columns, mutate() adds or changes columns, and summarise() creates summaries.
Take care when mixing base R and dplyr syntax, since doing so carelessly can cause errors.
Practice chaining dplyr verbs to perform complex data tasks simply.