How to Use str_extract in R: Extract Text with Patterns
In R, use str_extract() from the stringr package to extract the first substring of a string that matches a pattern. Pass the string and a regular expression pattern as arguments; it returns the matched text, or NA if no match is found.
Syntax
The basic syntax of str_extract() is:
```r
str_extract(string, pattern)
```
- string: the input character vector to search.
- pattern: a regular expression describing the text to extract.

The function returns the first match in each string, or NA if that string has no match.
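Both string and pattern are vectorised, so you can supply one pattern per input string. A missing input string produces NA in the result rather than an error. The sample strings below are made up for illustration.

```r
library(stringr)

# One pattern per string: "p+" is applied to the first element, "an" to the second
paired <- str_extract(c("apple pie", "banana split"), c("p+", "an"))
paired

# A missing input string yields NA instead of an error
with_missing <- str_extract(c("abc123", NA), "\\d+")
with_missing
```

The first call returns "pp" and "an". The second returns "123" and NA.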
Example
This example shows how to extract the first number from each string in a vector using str_extract().
```r
library(stringr)

texts <- c("Order 1234", "No number here", "ID: 5678 and 91011")
# "\\d+" matches one or more digits; only the first run in each string is kept
numbers <- str_extract(texts, "\\d+")
print(numbers)
```
Output
[1] "1234" NA "5678"
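The result above is a character vector, so convert it if you need numbers. To change matching options such as case sensitivity, wrap the pattern in stringr's regex() helper. The following is a small sketch; the fruit strings are illustrative only.

```r
library(stringr)

texts <- c("Order 1234", "No number here", "ID: 5678 and 91011")

# str_extract() returns character; as.numeric() keeps NA where nothing matched
numbers <- as.numeric(str_extract(texts, "\\d+"))
numbers

# regex(..., ignore_case = TRUE) matches regardless of letter case,
# but the extracted text keeps its original case
fruit <- str_extract(c("Apple", "APPLE pie", "cherry"), regex("apple", ignore_case = TRUE))
fruit
```

Here numbers is 1234, NA, 5678 (numeric). fruit is "Apple", "APPLE", NA.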
Common Pitfalls
Common mistakes include:
- Not loading the stringr package before using str_extract().
- Using incorrect regular expression syntax, such as forgetting to escape special characters or to double the backslash in escapes like \\d inside an R string.
- Expecting str_extract() to return all matches instead of only the first one (use str_extract_all() for all matches).
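If you prefer not to attach the package, you can call its functions through the stringr:: namespace, although the package still has to be installed. The sketch below uses a made-up input string and contrasts the first match with all matches.

```r
# Namespace-qualified calls work without library(stringr)
x <- "ID: 5678 and 91011"

# Only the first run of digits
first_match <- stringr::str_extract(x, "\\d+")
first_match

# Every run of digits; the result is a list with one character vector per input string
all_matches <- stringr::str_extract_all(x, "\\d+")
all_matches[[1]]
```

first_match is "5678". all_matches[[1]] holds both "5678" and "91011".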
```r
library(stringr)

# Wrong: "\d" is not a valid escape in an R string literal, so R stops
# with an "unrecognized escape" error before the regex is ever run
# str_extract("Price: $100", "\d+")

# Correct: double the backslash so the regex engine receives \d+
str_extract("Price: $100", "\\d+")
```
Output
[1] "100"
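Backslashes are not the only characters that need care. The $ sign is itself a regex metacharacter (an end-of-string anchor), which is why the pattern above extracts only the digits. The sketch below shows one way to include the dollar sign by escaping it. It also shows stringr's fixed() helper, which matches plain text instead of a regex. The "a.b" input is just an illustration.

```r
library(stringr)

price <- "Price: $100"

# Escape "$" so it matches a literal dollar sign
with_dollar <- str_extract(price, "\\$\\d+")
with_dollar

# Unescaped, "$" anchors to the end of the string, so no digits can follow it
unescaped <- str_extract(price, "$\\d+")
unescaped

# fixed() treats the pattern as literal text: "." matches only a real dot
literal_dot <- str_extract("a.b", fixed("."))
literal_dot
```

The escaped pattern returns "$100" and the unescaped one returns NA. The fixed() call returns ".".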
Quick Reference
| Function | Description |
|---|---|
| str_extract(string, pattern) | Extracts first match of pattern from string |
| str_extract_all(string, pattern) | Extracts all matches of pattern from string |
| str_detect(string, pattern) | Checks whether pattern occurs in string (TRUE/FALSE) |
| str_replace(string, pattern, replacement) | Replaces the first match of pattern with replacement |
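The sketch below briefly exercises the two companion functions from the table. It reuses the example vector from earlier, redefined here so the snippet runs on its own.

```r
library(stringr)

texts <- c("Order 1234", "No number here", "ID: 5678 and 91011")

# TRUE/FALSE per string: does the pattern occur at all?
has_digits <- str_detect(texts, "\\d+")
has_digits

# Replace only the first run of digits; strings without a match are returned unchanged
masked <- str_replace(texts, "\\d+", "#")
masked
```

has_digits is TRUE, FALSE, TRUE. masked is "Order #", "No number here", "ID: # and 91011"; the second number in the last string is untouched because str_replace() changes only the first match.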
Key Takeaways
Use str_extract() to get the first matching text from a string using a regex pattern.
Always load the stringr package with library(stringr) before using str_extract().
Escape special characters in regex patterns properly with double backslashes (\\).
str_extract() returns NA if no match is found in the string.
For all matches, use str_extract_all() instead of str_extract().