How to Extract Substring in R: Simple Syntax and Examples
In R, you can extract a substring with the substring() or substr() function by specifying the start and end positions. Both functions return the characters between these positions, inclusive, as a new string.
Syntax
The two main functions to extract substrings in R are substring(text, first, last) and substr(text, start, stop).
- text: The original string or vector of strings.
- first/start: The starting position of the substring (1-based index).
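- last/stop: The ending position of the substring (inclusive). In substring(), last is optional and defaults to 1000000L, which in practice means the end of the string.
Both functions return the characters from the start position through the end position.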
```r
substring(text, first, last)
substr(text, start, stop)
```
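As a small sketch of how the two signatures differ in practice, the snippet below extracts everything from position 8 to the end of a string. The sample string and the variable names tail_part and tail_part2 are illustrative assumptions, not part of either function.

```r
text <- "Hello, world!"

# substring() has a default for `last`, so omitting it reads to the end of the string.
tail_part <- substring(text, 8)

# substr() has no default for `stop`; nchar() supplies the final position.
tail_part2 <- substr(text, 8, nchar(text))

print(tail_part)   # [1] "world!"
print(tail_part2)  # [1] "world!"
```

Either form works; the explicit nchar() version simply makes the end position visible to the reader.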
Example
This example shows how to extract a substring from a string using both substring() and substr(). It extracts characters from position 3 to 7.
```r
text <- "Hello, world!"

sub1 <- substring(text, 3, 7)
sub2 <- substr(text, 3, 7)

print(sub1)
print(sub2)
```
Output
[1] "llo, "
[1] "llo, "
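Positions do not have to be literal numbers. Since these functions do not count negative positions from the end of the string, one possible way to take the last few characters is to compute the positions from nchar(). This is a minimal sketch; the value of n and the names len and last_n are illustrative assumptions.

```r
text <- "Hello, world!"
n <- 6  # number of trailing characters to keep (illustrative value)

# Compute the start position from the string length.
len <- nchar(text)
last_n <- substr(text, len - n + 1, len)

print(last_n)  # [1] "world!"
```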
Common Pitfalls
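A common misconception is that zero or negative positions cause an error or count from the end of the string. In R, a start position below 1 is simply treated as 1, so such calls run without complaint but may not return what you intended. Another mistake is swapping the start and end positions: when the start is greater than the end, the result is an empty string.
Also, both functions accept vectors of strings, but they differ in recycling: substring() recycles all of its arguments to the length of the longest one, while substr() always returns a result with the same length as text.
```r
text <- "Example"

# Misleading: a start of 0 is treated as 1, so this also returns "Exa"
# substring(text, 0, 3)

# Swapped positions: a start greater than the end returns ""
# substring(text, 5, 3)

# Correct: positions are 1-based
substring(text, 1, 3)  # returns "Exa"
```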
Output
[1] "Exa"
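The difference in how the two functions treat vector arguments can be sketched as follows. The sample strings and variable names are illustrative assumptions.

```r
text <- "abcdef"

# substring() recycles `text` to match the longest argument,
# so three start/end pairs give three substrings.
windows <- substring(text, 1:3, 3:5)
print(windows)  # [1] "abc" "bcd" "cde"

# substr() returns one result per element of `text`,
# so only the first start/end pair is used here.
single <- substr(text, 1:3, 3:5)
print(single)   # [1] "abc"

# Both functions are vectorized over the strings themselves.
words <- c("apple", "banana", "cherry")
prefixes <- substr(words, 1, 3)
print(prefixes) # [1] "app" "ban" "che"
```

When you need several slices of the same string, substring() is usually the more convenient choice; for one slice per string, substr() is enough.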
Quick Reference
| Function | Usage | Notes |
|---|---|---|
| substring(text, first, last) | Extract substring from first to last position | Recycles all arguments to the longest one; last is optional |
| substr(text, start, stop) | Extract substring from start to stop position | Works on single strings or vectors; result has the same length as text |
Key Takeaways
Use substring() or substr() with start and end positions to extract substrings in R.
Positions start at 1; a start of zero or below is treated as 1, and negative positions do not count from the end of the string.
substring() recycles its arguments to the longest one, while substr() returns one result per input string.
Always check that start is less than or equal to end to avoid empty strings.
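One way to enforce the last point is a small wrapper that raises an error instead of silently returning an empty string. checked_substr() is a hypothetical helper written for this illustration, not a base R function, and it assumes non-missing numeric positions.

```r
# Hypothetical helper (not part of base R): fails loudly when positions are swapped.
checked_substr <- function(text, first, last) {
  if (any(first > last)) {
    stop("`first` must be less than or equal to `last`")
  }
  substr(text, first, last)
}

ok <- checked_substr("Example", 1, 3)
print(ok)  # [1] "Exa"

# Swapped positions now produce an error; capture its message for inspection.
msg <- tryCatch(
  checked_substr("Example", 5, 3),
  error = function(e) conditionMessage(e)
)
print(msg)
```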