How to Use pivot_wider in tidyr for Data Reshaping
Use pivot_wider() from the tidyr package to convert data from long to wide format: names_from identifies the column whose values become the new column names, and values_from identifies the column whose values fill those columns. The function spreads key-value pairs across multiple columns, so each observation sits on one row with one column per key.
Syntax
The basic syntax of pivot_wider() includes:
- data: Your input data frame or tibble.
- names_from: The column whose values will become new column names.
- values_from: The column whose values will fill the new columns.
- values_fill (optional): A value to replace missing entries after widening.
```r
pivot_wider(
  data,
  names_from = column_to_use_for_names,
  values_from = column_to_use_for_values,
  values_fill = list(column_to_use_for_values = fill_value)
)
```
Example
This example shows how to convert a long data frame of fruit sales into a wide format where each fruit becomes a column with its sales values.
```r
library(tidyr)
library(dplyr)

# Sample long data frame
sales <- tibble(
  store = c("A", "A", "B", "B"),
  fruit = c("apple", "banana", "apple", "banana"),
  sales = c(10, 5, 8, 7)
)

# Use pivot_wider to spread fruit types into columns
sales_wide <- sales %>%
  pivot_wider(names_from = fruit, values_from = sales)

print(sales_wide)
```
Output
```
# A tibble: 2 × 3
  store apple banana
  <chr> <dbl>  <dbl>
1 A        10      5
2 B         8      7
```
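The optional values_fill argument only matters when some combination is absent from the long data. The following is a minimal sketch under that assumption: sales_gaps is a hypothetical variant of the data above in which store B has no banana row, and the choice of 0 as the fill value is purely illustrative.

```r
library(tidyr)

# Hypothetical data: store B has no row for banana
sales_gaps <- tibble(
  store = c("A", "A", "B"),
  fruit = c("apple", "banana", "apple"),
  sales = c(10, 5, 8)
)

# Default behavior: the missing store/fruit combination becomes NA
wide_na <- pivot_wider(sales_gaps, names_from = fruit, values_from = sales)

# With values_fill, the missing combination is filled with 0 instead
wide_zero <- pivot_wider(
  sales_gaps,
  names_from = fruit,
  values_from = sales,
  values_fill = 0
)
```

Whether 0 is an appropriate fill depends on the data: it is reasonable when an absent row means "no sales", but NA is usually safer when an absent row means "not recorded".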
Common Pitfalls
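Common mistakes when using pivot_wider() include:
- Omitting names_from or values_from. They default to columns named name and value, so the call fails with an error when the data has no such columns.
- Having duplicate combinations of the id_cols and names_from columns, which leaves multiple values for one cell. pivot_wider() then issues a warning and returns list-columns instead of single values.
- Missing values after widening, which can be handled with values_fill.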
```r
library(tidyr)
library(dplyr)

# Example of duplicate keys: id = 1 and key = "A" appear twice
long_data <- tibble(
  id = c(1, 1, 1),
  key = c("A", "A", "B"),
  value = c(10, 20, 30)
)

# Pivoting directly does not fail, but it warns that the values are not
# uniquely identified and returns list-columns
# pivot_wider(long_data, names_from = key, values_from = value)

# One fix: summarize (or choose one value) before pivoting
long_data_unique <- long_data %>%
  group_by(id, key) %>%
  summarize(value = mean(value), .groups = "drop")

pivot_wider(long_data_unique, names_from = key, values_from = value)
```
Output
```
# A tibble: 1 × 3
     id     A     B
  <dbl> <dbl> <dbl>
1     1    15    30
```
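As an alternative to summarizing beforehand, pivot_wider() also accepts a values_fn argument, a function applied to the values that land in each output cell. The sketch below reuses the duplicate-key data from above and assumes that averaging is the right way to combine duplicates, which may not hold for every dataset.

```r
library(tidyr)

# Same duplicate-key data as above: id = 1 and key = "A" appear twice
long_data <- tibble(
  id = c(1, 1, 1),
  key = c("A", "A", "B"),
  value = c(10, 20, 30)
)

# values_fn collapses the values that fall into each cell during the pivot
wide_mean <- pivot_wider(
  long_data,
  names_from = key,
  values_from = value,
  values_fn = mean
)
```

This keeps the aggregation and the reshape in one call. A separate group_by() and summarize() step can still be the clearer choice when the aggregation logic is more involved.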
Quick Reference
| Argument | Description |
|---|---|
| data | Input data frame or tibble |
| names_from | Column to use for new column names |
| values_from | Column to use for filling new columns |
| values_fill | Value to replace missing cells after widening (optional) |
| id_cols | Columns to keep as identifiers (optional, usually inferred) |
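The effect of id_cols is easiest to see when the data carries an extra column. In this sketch, clerk is a hypothetical column added only for illustration. By default every column not named in names_from or values_from acts as an identifier, so clerk splits each store across two rows; setting id_cols = store keeps only store as the identifier and drops clerk from the result.

```r
library(tidyr)

# Hypothetical data: clerk is an extra column not needed in the wide result
sales_clerk <- tibble(
  store = c("A", "A", "B", "B"),
  clerk = c("x", "y", "x", "y"),
  fruit = c("apple", "banana", "apple", "banana"),
  sales = c(10, 5, 8, 7)
)

# Inferred identifiers (store and clerk): one row per store/clerk pair,
# with NA wherever that pair has no value for a fruit
wide_default <- pivot_wider(
  sales_clerk,
  names_from = fruit,
  values_from = sales
)

# Explicit identifier: one row per store, and clerk is dropped
wide_by_store <- pivot_wider(
  sales_clerk,
  id_cols = store,
  names_from = fruit,
  values_from = sales
)
```

Narrowing id_cols can introduce duplicate store/fruit combinations in other datasets, so it is worth checking that each combination is still unique afterwards.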
Key Takeaways
- Use pivot_wider() to reshape data from long to wide by specifying names_from and values_from.
- Ensure unique combinations of the id columns and names_from columns; duplicates trigger a warning and produce list-columns.
- Use values_fill to handle missing values after widening.
- Summarize or clean duplicates before pivoting to prevent conflicts.
- pivot_wider() is part of the tidyr package and works well with dplyr pipelines.