
How to Use Regex in R: Syntax, Examples, and Tips

In R, you use regex with functions like grepl(), grep(), sub(), and gsub() to find or replace patterns in text. Regex patterns are strings that describe the text you want to match, and R applies them to character vectors for searching or modifying text.

Syntax

Here are common R functions using regex patterns:

  • grepl(pattern, x): Returns a logical vector, TRUE for each element of x that contains a match for pattern and FALSE otherwise.
  • grep(pattern, x): Returns the indices of the elements of x that contain a match for pattern.
  • sub(pattern, replacement, x): Replaces the first match of pattern in each element of x with replacement.
  • gsub(pattern, replacement, x): Replaces every match of pattern in each element of x with replacement.

The pattern is a regex string describing what to find.

r
grepl(pattern, x)
grep(pattern, x)
sub(pattern, replacement, x)
gsub(pattern, replacement, x)

Example

This example finds which strings in a vector contain 'cat', then replaces 'cat' with 'dog'. The pattern matches anywhere inside a string, so the 'cat' in 'caterpillar' counts as a match too.

r
texts <- c("The cat sat on the mat", "A caterpillar is not a cat", "Dogs are friendly")

# Find which texts contain 'cat'
matches <- grepl("cat", texts)
print(matches)

# Get indices of texts with 'cat'
indices <- grep("cat", texts)
print(indices)

# Replace first 'cat' with 'dog'
replaced_once <- sub("cat", "dog", texts)
print(replaced_once)

# Replace all 'cat' with 'dog'
replaced_all <- gsub("cat", "dog", texts)
print(replaced_all)
Output
[1]  TRUE  TRUE FALSE
[1] 1 2
[1] "The dog sat on the mat" "A dogerpillar is not a cat" "Dogs are friendly"
[1] "The dog sat on the mat" "A dogerpillar is not a dog" "Dogs are friendly"
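
The output above turns 'caterpillar' into 'dogerpillar' because the pattern matches 'cat' anywhere in the string. One possible way to avoid this, sketched here as an illustration rather than as part of the original example, is to wrap the pattern in word boundaries (\\b). The variable name whole_word is made up for this sketch.

```r
texts <- c("The cat sat on the mat", "A caterpillar is not a cat", "Dogs are friendly")

# \\b matches the empty string at the edge of a word, so only the
# standalone word "cat" is replaced and "caterpillar" is left alone
whole_word <- gsub("\\bcat\\b", "dog", texts)
print(whole_word)
```

With this pattern the second string should become "A caterpillar is not a dog", while the first and third behave as before.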

Common Pitfalls

Common mistakes when using regex in R include:

  • Not escaping special characters like ., ?, or \ properly.
  • Confusing sub() (replaces first match) with gsub() (replaces all matches).
  • Using regex patterns without quotes or with wrong quotes.
  • Forgetting that regex is case sensitive by default.

Example of wrong and right usage:

r
# Wrong: unescaped dot matches any character
texts <- c("cat", "cot", "cut")
grepl("c.t", texts) # TRUE for all

# Right: escape the dot to match a literal dot.
# Inside an R string the backslash must itself be doubled: "\\."
texts2 <- c("file.txt", "file1txt")
grepl("file\\.txt", texts2) # TRUE only for 'file.txt'
Output
[1] TRUE TRUE TRUE
[1]  TRUE FALSE
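
The case-sensitivity pitfall from the list above can be sketched the same way. The ignore.case and fixed arguments are standard arguments of grepl(); the sample vector and variable names are made up for illustration. fixed = TRUE is shown as an alternative to escaping when you only need a literal match.

```r
texts <- c("Cat", "cat", "file.txt", "file1txt")

# Case sensitive by default: only the lowercase "cat" matches
default_match <- grepl("cat", texts)

# ignore.case = TRUE also matches "Cat"
any_case <- grepl("cat", texts, ignore.case = TRUE)

# fixed = TRUE treats the pattern as a plain string, not a regex,
# so the dot matches only a literal dot and needs no escaping
literal_dot <- grepl("file.txt", texts, fixed = TRUE)

print(default_match)
print(any_case)
print(literal_dot)
```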

Quick Reference

Basic regex symbols in R:

Symbol | Meaning | Example
------ | ------- | -------
. | Any single character | "c.t" matches "cat", "cot"
^ | Start of string | "^cat" matches "cat" in "cat dog"
$ | End of string | "dog$" matches "dog" in "big dog"
\d | Digit (0-9) | "\\d" matches "1" in "a1b"
\w | Word character (letter, digit, _) | "\\w+" matches words
* | Zero or more repetitions | "ca*t" matches "ct", "cat", "caaat"
+ | One or more repetitions | "ca+t" matches "cat", "caaat" but not "ct"
? | Zero or one repetition | "ca?t" matches "cat" or "ct"
[] | Character class | "[cb]at" matches "cat" or "bat"
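
As a quick check of the symbols in this table, the sketch below runs several of them through grepl() on a small sample vector. The vector x and the variable names are made up for illustration. Note that \d is written "\\d" inside an R string.

```r
x <- c("cat dog", "big dog", "a1b", "ct", "caaat", "bat")

starts_with_cat <- grepl("^cat", x)    # anchored at the start of the string
ends_with_dog   <- grepl("dog$", x)    # anchored at the end of the string
has_digit       <- grepl("\\d", x)     # contains at least one digit
zero_or_more_a  <- grepl("ca*t", x)    # "c", any number of "a", then "t"
one_or_more_a   <- grepl("ca+t", x)    # "c", at least one "a", then "t"
c_or_b_then_at  <- grepl("[cb]at", x)  # "c" or "b" followed by "at"

# Show which elements each pattern matched
print(x[starts_with_cat])  # "cat dog"
print(x[ends_with_dog])    # "cat dog" "big dog"
print(x[has_digit])        # "a1b"
print(x[zero_or_more_a])   # "cat dog" "ct" "caaat"
print(x[one_or_more_a])    # "cat dog" "caaat"
print(x[c_or_b_then_at])   # "cat dog" "bat"
```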

Key Takeaways

Use grepl() to check if a regex pattern exists in text and grep() to find matching indices.
Use sub() to replace the first match and gsub() to replace all matches of a regex pattern.
Always escape special regex characters like dot (.) with double backslash (\\.) in R strings.
Regex patterns are case sensitive by default; use ignore.case=TRUE to ignore case.
Test your regex patterns carefully to avoid unexpected matches or replacements.