How to Use separate() in tidyr to Split Columns in R
Use separate() from the tidyr package to split one column into multiple columns. You provide the column to split, the names of the new columns, and optionally a separator (a character or regular expression) that marks where to split.
Syntax
The basic syntax of separate() is:
```r
separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE)
```
- `data`: Your data frame or tibble.
- `col`: The name of the column to split.
- `into`: A character vector of new column names.
- `sep`: The separator to split on, interpreted as a regular expression (the default matches any run of non-alphanumeric characters).
- `remove`: Whether to remove the original column (default `TRUE`).
- `convert`: Whether to run type conversion on the new columns, for example turning "1" into an integer (default `FALSE`).
Example
This example shows how to split a column with names separated by a space into two columns: first and last names.
```r
library(tidyr)
library(dplyr)

# Sample data frame
people <- tibble(name = c("John Doe", "Jane Smith", "Alice Johnson"))

# Use separate() to split 'name' into 'first' and 'last'
people_separated <- people %>%
  separate(name, into = c("first", "last"), sep = " ")

print(people_separated)
```
Output
```
# A tibble: 3 × 2
  first last
  <chr> <chr>
1 John  Doe
2 Jane  Smith
3 Alice Johnson
```
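The `remove` and `convert` arguments from the syntax section are easiest to understand side by side. The following is a minimal sketch, assuming a hypothetical `readings` data frame whose column name and values are invented for illustration. It keeps the original column and lets separate() convert the numeric piece.

```r
library(tidyr)
library(dplyr)

# Hypothetical sample data: a label and a count joined by an underscore
readings <- tibble(reading = c("a_1", "b_2", "c_3"))

# remove = FALSE keeps the original 'reading' column;
# convert = TRUE turns the "1", "2", "3" pieces into integers
kept <- readings %>%
  separate(reading, into = c("label", "count"), sep = "_",
           remove = FALSE, convert = TRUE)

print(kept)
```

With these settings the result should hold three columns (`reading`, `label`, `count`), and `count` should be an integer column rather than character.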
Common Pitfalls
Common mistakes when using separate() include:
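- Providing fewer names in `into` than there are pieces. By default (`extra = "warn"`) separate() issues a warning and drops the extra pieces; it does not stop with an error. Rows with too few pieces are likewise filled with `NA` and a warning.
- Using the wrong separator in `sep`, so the column does not split as expected. Because `sep` is a regular expression, characters such as `.` or `|` must be escaped.
- Forgetting that `separate()` removes the original column by default, which might be unwanted.
- Not setting `convert = TRUE` when the new columns hold numbers or logical values. Without it every new column stays character. Conversion does not create factors.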
Example of a wrong and right way:
```r
library(tidyr)
library(dplyr)

people <- tibble(name = c("John Doe", "Jane Smith"))

# Wrong: not enough names in 'into'
# With one name for two pieces, separate() warns and silently drops
# the second piece ("Doe", "Smith") instead of raising an error
# people %>% separate(name, into = c("first"), sep = " ")

# Right: provide two names
people %>%
  separate(name, into = c("first", "last"), sep = " ")
```
Output
```
# A tibble: 2 × 2
  first last
  <chr> <chr>
1 John  Doe
2 Jane  Smith
```
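When the number of pieces varies from row to row, the `extra` and `fill` arguments of separate() control what happens to the mismatch. The sketch below uses made-up names (the `people` and `singles` data frames are illustrative only) to compare the default behavior with `extra = "merge"` and `fill = "right"`.

```r
library(tidyr)
library(dplyr)

# Illustrative data: the second name has three space-separated pieces
people <- tibble(name = c("John Doe", "Mary Ann Smith"))

# Default (extra = "warn"): a warning is raised and "Smith" is dropped.
# suppressWarnings() is used here only to keep the output quiet.
dropped <- suppressWarnings(
  people %>% separate(name, into = c("first", "last"), sep = " ")
)

# extra = "merge": split at most length(into) times and keep the
# remainder in the last column
merged <- people %>%
  separate(name, into = c("first", "last"), sep = " ", extra = "merge")

# Illustrative data: the second name has only one piece
singles <- tibble(name = c("John Doe", "Cher"))

# fill = "right": pad missing pieces with NA on the right, without a warning
filled <- singles %>%
  separate(name, into = c("first", "last"), sep = " ", fill = "right")

print(dropped)
print(merged)
print(filled)
```

Choosing `extra = "merge"` is usually safer than relying on the default when the last piece may itself contain the separator.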
Quick Reference
| Argument | Description |
|---|---|
| data | Data frame or tibble to modify |
| col | Column name to split |
| into | Vector of new column names |
| sep | Separator character or regex (default splits on non-alphanumeric) |
| remove | Remove original column? (default TRUE) |
| convert | Convert new columns to appropriate types? (default FALSE) |
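Since `sep` is treated as a regular expression, a literal dot has to be escaped. The following sketch, which assumes a hypothetical `versions` data frame, splits on an escaped dot and then shows that the default separator handles the same input because a dot is non-alphanumeric.

```r
library(tidyr)
library(dplyr)

# Hypothetical version strings used only for illustration
versions <- tibble(version = c("1.4", "2.10"))

# "\\." matches a literal dot; an unescaped "." would match any character
escaped <- versions %>%
  separate(version, into = c("major", "minor"), sep = "\\.")

# The default sep ("[^[:alnum:]]+") also splits on the dot
default_sep <- versions %>%
  separate(version, into = c("major", "minor"))

print(escaped)
print(default_sep)
```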
Key Takeaways
Use separate() to split one column into multiple columns by specifying the column and new names.
Provide as many names in 'into' as there are pieces; by default, extra pieces are dropped with a warning.
Specify the correct separator with 'sep' to split data as expected.
Set 'remove = FALSE' if you want to keep the original column after splitting.
Use 'convert = TRUE' to convert new columns to integer, numeric, or logical types where the values allow it.