How to Use str_extract in R: Extract Text with Patterns

In R, use str_extract() from the stringr package to extract the first substring that matches a pattern from a string. Provide the string and a regular expression pattern as arguments, and it returns the matched text or NA if no match is found.

Syntax

The basic syntax of str_extract() is:

r
str_extract(string, pattern)

  • string: The input character vector where you want to find matches.
  • pattern: A regular expression pattern describing the text to extract.

The function returns the first match found in each string or NA if no match exists.

Example

This example shows how to extract the first number from each string in a vector using str_extract().

r
library(stringr)

texts <- c("Order 1234", "No number here", "ID: 5678 and 91011")

numbers <- str_extract(texts, "\\d+")
print(numbers)
Output
[1] "1234" NA "5678"
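
str_extract() always returns text, even when the match consists of digits. If the values are needed as numbers, one option is to wrap the call in base R's as.numeric(). This is a minimal sketch that reuses the texts vector from the example; the variable name order_ids is only illustrative.

```r
library(stringr)

texts <- c("Order 1234", "No number here", "ID: 5678 and 91011")

# str_extract() returns a character vector; NA marks strings with no match
order_ids <- as.numeric(str_extract(texts, "\\d+"))
print(order_ids)

# Keep only the strings that actually contained a number
print(texts[!is.na(order_ids)])
```

The NA produced for "No number here" carries through the conversion, so it can be filtered out afterwards with is.na().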

Common Pitfalls

Common mistakes include:

  • Not loading the stringr package before using str_extract().
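  • Using incorrect regular expression syntax, such as writing a single backslash ("\d") where an R string needs a doubled one ("\\d"), or leaving regex metacharacters like $ or . unescaped when they should match literally.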
  • Expecting str_extract() to return all matches instead of only the first match (use str_extract_all() for all matches).
r
library(stringr)

# Wrong: a single backslash is not a valid escape in an R string literal,
# so this line fails to parse ('\d' is an unrecognized escape)
# str_extract("Price: $100", "\d+")

# Correct: double the backslash so the regex engine receives \d+
str_extract("Price: $100", "\\d+")
Output
[1] "100"
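
The other two pitfalls can be illustrated the same way. The sketch below contrasts str_extract() with str_extract_all() on a string containing two numbers, and shows a literal dollar sign being escaped. The variable names are only illustrative.

```r
library(stringr)

text <- "ID: 5678 and 91011"

# First match only
first_match <- str_extract(text, "\\d+")

# All matches: returns a list with one character vector per input string
all_matches <- str_extract_all(text, "\\d+")

# A bare $ anchors the end of the string, so a literal dollar sign needs \\$
price <- str_extract("Price: $100", "\\$\\d+")

print(first_match)
print(all_matches[[1]])
print(price)
```

Because str_extract_all() returns a list, the matches for a single input string are accessed with [[1]].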

Quick Reference

Function | Description
str_extract(string, pattern) | Extracts first match of pattern from string
str_extract_all(string, pattern) | Extracts all matches of pattern from string
stringr::str_detect(string, pattern) | Checks if pattern exists in string (TRUE/FALSE)
stringr::str_replace(string, pattern, replacement) | Replaces first match of pattern with replacement
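
To compare these functions, the following sketch applies all four to the same input with the same pattern. The input string and the "#" replacement are arbitrary choices for illustration.

```r
library(stringr)

x <- "ID: 5678 and 91011"

first_number <- str_extract(x, "\\d+")      # first match as text
all_numbers  <- str_extract_all(x, "\\d+")  # list of all matches
has_number   <- str_detect(x, "\\d+")       # TRUE if the pattern occurs
masked       <- str_replace(x, "\\d+", "#") # replaces the first match only

print(first_number)
print(all_numbers[[1]])
print(has_number)
print(masked)
```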

Key Takeaways

  • Use str_extract() to get the first matching text from a string using a regex pattern.
  • Always load the stringr package with library(stringr) before using str_extract().
  • Escape special characters in regex patterns properly with double backslashes (\\).
  • str_extract() returns NA if no match is found in the string.
  • For all matches, use str_extract_all() instead of str_extract().