sub and gsub in R Programming - Time & Space Complexity
We want to understand how the time it takes to replace text grows when using sub and gsub in R.
How does the work change as the text gets longer or has more matches?
Analyze the time complexity of the following code snippet.
```r
text <- c("apple apple", "banana apple", "apple banana apple")
result_sub  <- sub("apple", "orange", text)
result_gsub <- gsub("apple", "orange", text)
```
This code replaces the first occurrence of "apple" with "orange" using sub, and all occurrences using gsub, in a vector of strings.
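To make the difference concrete, here is the snippet again with the values each call returns:

```r
text <- c("apple apple", "banana apple", "apple banana apple")

# sub replaces only the first match in each element
sub("apple", "orange", text)
#> [1] "orange apple"        "banana orange"       "orange banana apple"

# gsub replaces every match in each element
gsub("apple", "orange", text)
#> [1] "orange orange"        "banana orange"        "orange banana orange"
```

Note that both functions are vectorized: they process each element of the character vector independently.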
Identify the loops, recursion, or repeated traversals involved.
- Primary operation: Scanning each string to find matches of the pattern.
- How many times: For sub, scanning stops after the first match per string; for gsub, scanning continues through the entire string to find all matches.
As the length of each string or the number of strings grows, the work increases.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 (short strings) | About 10 scans for sub, 10 full scans for gsub |
| 100 (longer strings) | About 100 scans for sub, 100 full scans for gsub |
| 1000 (many long strings) | About 1000 scans for sub, 1000 full scans for gsub |
Pattern observation: The time grows roughly in direct proportion to the total length of all strings. gsub does more work per string because it finds all matches, not just the first.
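A rough way to check this linear pattern empirically is to time gsub on inputs of doubling size; the elapsed time should roughly double as well. This is a minimal sketch, not a rigorous benchmark, and the input-building helper make_input is an illustrative assumption; absolute timings will vary by machine:

```r
# Build n strings, each containing repeated matches of the pattern
make_input <- function(n) rep(strrep("apple banana ", 100), n)

for (n in c(1000, 2000, 4000)) {
  x <- make_input(n)
  elapsed <- system.time(gsub("apple", "orange", x))["elapsed"]
  cat(sprintf("n = %4d strings: %.3f s\n", n, elapsed))
}
```

For more careful measurements, packages such as microbenchmark or bench repeat each call many times and report stable summaries.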
Time Complexity: O(n), where n is the total number of characters across all input strings.
This means the time to replace text grows linearly with the total size of the input strings (assuming a simple, fixed pattern).
[X] Wrong: "sub and gsub always take the same time because they do similar replacements."
[OK] Correct: sub stops after the first match in each string, so it often does less work than gsub, which finds and replaces all matches.
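One way to see this difference is to compare the two functions on a single long string with many matches; gsub must locate and replace every occurrence, while sub stops after the first. A sketch (the repeat counts are arbitrary assumptions chosen to make the timing visible):

```r
# A single long string containing many matches of "apple"
x <- strrep("apple ", 100000)

# sub replaces only the first "apple"; gsub replaces all of them,
# so gsub typically takes noticeably longer here
system.time(replicate(20, sub("apple", "orange", x)))
system.time(replicate(20, gsub("apple", "orange", x)))
```

Both calls still read the whole string (sub must copy the unchanged remainder into its result), so each remains linear; the difference is in the constant factor per string.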
Understanding how string replacement scales helps you write efficient code when working with text data, a common task in many projects.
"What if we changed the pattern to a more complex regular expression? How would the time complexity change?"
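One way to start exploring that question: for a literal pattern, passing fixed = TRUE to sub or gsub bypasses the regular-expression engine entirely and does plain substring matching, which is often faster. A hedged sketch (the input sizes are illustrative assumptions):

```r
x <- rep(strrep("apple banana ", 100), 2000)

# Literal pattern routed through the regex engine (the default)
system.time(gsub("apple", "orange", x))

# fixed = TRUE: plain substring search, often faster for literal patterns
system.time(gsub("apple", "orange", x, fixed = TRUE))
```

More complex patterns can change the picture further: with perl = TRUE, R uses the PCRE engine, and patterns with nested quantifiers (e.g. "(a+)+b") can trigger heavy backtracking and run far slower than linear time on adversarial inputs.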