How to Use Regex in R: Syntax, Examples, and Tips
In R, you use regex with functions like grepl(), grep(), sub(), and gsub() to find or replace patterns in text. Regex patterns are strings that describe the text you want to match, and R applies them to character vectors for searching or modifying text.
Syntax
Here are common R functions using regex patterns:
- grepl(pattern, x): Returns a logical vector, TRUE for each element of x that contains a match for pattern and FALSE otherwise.
- grep(pattern, x): Returns the indices of the elements of x that match pattern.
- sub(pattern, replacement, x): Replaces the first match of pattern in each element of x with replacement.
- gsub(pattern, replacement, x): Replaces all matches of pattern in each element of x with replacement.
The pattern is a regex string describing what to find.
```r
grepl(pattern, x)
grep(pattern, x)
sub(pattern, replacement, x)
gsub(pattern, replacement, x)
```
Example
This example shows how to find the strings in a vector that contain 'cat' and how to replace 'cat' with 'dog' in them.
```r
texts <- c("The cat sat on the mat", "A caterpillar is not a cat", "Dogs are friendly")

# Find which texts contain 'cat'
matches <- grepl("cat", texts)
print(matches)

# Get indices of texts with 'cat'
indices <- grep("cat", texts)
print(indices)

# Replace first 'cat' with 'dog'
replaced_once <- sub("cat", "dog", texts)
print(replaced_once)

# Replace all 'cat' with 'dog'
replaced_all <- gsub("cat", "dog", texts)
print(replaced_all)
```
Output
[1] TRUE TRUE FALSE
[1] 1 2
[1] "The dog sat on the mat" "A dogerpillar is not a cat" "Dogs are friendly"
[1] "The dog sat on the mat" "A dogerpillar is not a dog" "Dogs are friendly"
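The output above also shows a side effect: "cat" matches inside "caterpillar", which produces "dogerpillar". One possible way to avoid this, assuming only the whole word should be replaced, is to wrap the pattern in word boundaries (written \\b in an R string). The variable name whole_word is only illustrative.

```r
texts <- c("The cat sat on the mat", "A caterpillar is not a cat", "Dogs are friendly")

# \\b matches the empty string at the edge of a word, so the "cat"
# inside "caterpillar" is left untouched
whole_word <- gsub("\\bcat\\b", "dog", texts)
print(whole_word)
```

With this pattern, the second string keeps "caterpillar" and only the standalone "cat" becomes "dog".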
Common Pitfalls
Common mistakes when using regex in R include:
- Not escaping special characters like ., ?, or \ properly.
- Confusing sub() (replaces first match) with gsub() (replaces all matches).
- Using regex patterns without quotes or with wrong quotes.
- Forgetting that regex is case sensitive by default.
Example of wrong and right usage:
```r
# Wrong: unescaped dot matches any character
texts <- c("cat", "cot", "cut")
grepl("c.t", texts)  # TRUE for all

# Right: escape dot to match literal dot
# (the backslash itself must be doubled inside an R string)
texts2 <- c("file.txt", "file1txt")
grepl("file\\.txt", texts2)  # TRUE only for exact 'file.txt'
```
Output
[1] TRUE TRUE TRUE
[1] TRUE FALSE
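The case-sensitivity pitfall from the list above can be sketched the same way. The snippet below reuses the strings from the earlier example and also shows fixed = TRUE, an argument of grepl() that treats the pattern as a literal string, as an alternative to escaping. The variable names strict, relaxed, and literal are only illustrative.

```r
texts <- c("The cat sat on the mat", "A caterpillar is not a cat", "Dogs are friendly")

# Case sensitive by default: "dog" does not match "Dogs"
strict <- grepl("dog", texts)
print(strict)

# ignore.case = TRUE makes the match case-insensitive
relaxed <- grepl("dog", texts, ignore.case = TRUE)
print(relaxed)

# fixed = TRUE treats the pattern as plain text, so "." needs no escaping
literal <- grepl(".txt", c("file.txt", "file1txt"), fixed = TRUE)
print(literal)
```

Here strict is FALSE for every element, relaxed is TRUE only for "Dogs are friendly", and literal is TRUE only for "file.txt".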
Quick Reference
Basic regex symbols in R:
| Symbol | Meaning | Example |
|---|---|---|
| . | Any single character | "c.t" matches "cat", "cot" |
| ^ | Start of string | "^cat" matches "cat" in "cat dog" |
| $ | End of string | "dog$" matches "big dog" |
| \d | Digit (0-9); written "\\d" in an R string | "\\d" matches "1" in "a1b" |
| \w | Word character (letter, digit, _); written "\\w" in an R string | "\\w+" matches words |
| * | Zero or more repetitions | "ca*t" matches "ct", "cat", "caaat" |
| + | One or more repetitions | "ca+t" matches "cat", "caaat" but not "ct" |
| ? | Zero or one repetition | "ca?t" matches "cat" or "ct" |
| [] | Character class | "[cb]at" matches "cat" or "bat" |
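As a rough check of the table, the sketch below runs several of these symbols through grepl() on a small made-up vector. The vector words and the result names are assumptions chosen for illustration.

```r
words <- c("ct", "cat", "caaat", "bat", "a1b")

star     <- grepl("ca*t", words)    # zero or more "a" between "c" and "t"
plus     <- grepl("ca+t", words)    # at least one "a"
optional <- grepl("ca?t", words)    # at most one "a"
class    <- grepl("[cb]at", words)  # "c" or "b" followed by "at"
digit    <- grepl("\\d", words)     # any digit; note the doubled backslash

# Anchors tie the match to the start or end of the string
starts_with_cat <- grepl("^cat", "cat dog")
ends_with_dog   <- grepl("dog$", "big dog")

print(star)
print(plus)
print(optional)
print(class)
print(digit)
```

For example, star is TRUE for "ct", "cat", and "caaat", while optional is TRUE only for "ct" and "cat".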
Key Takeaways
Use grepl() to check if a regex pattern exists in text and grep() to find matching indices.
Use sub() to replace the first match and gsub() to replace all matches of a regex pattern.
Always escape special regex characters like dot (.) with double backslash (\\.) in R strings.
Regex patterns are case sensitive by default; use ignore.case=TRUE to ignore case.
Test your regex patterns carefully to avoid unexpected matches or replacements.
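One lightweight way to test a pattern, sketched here as a suggestion rather than a fixed recipe, is to keep a few strings that should match and a few that should not, and check both with stopifnot(). The pattern and the two example vectors are hypothetical.

```r
pattern <- "file\\.txt"  # hypothetical pattern under test

should_match     <- c("file.txt", "my_file.txt")
should_not_match <- c("file1txt", "filetxt")

# stopifnot() raises an error if any condition is not TRUE
stopifnot(all(grepl(pattern, should_match)))
stopifnot(!any(grepl(pattern, should_not_match)))
```

If the pattern is later changed, rerunning these checks shows right away whether it still behaves as intended.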