mutate() for new columns in R Programming - Time & Space Complexity
We want to understand how the time needed to add new columns with mutate() changes as the data grows.
How does the work grow when the number of rows increases?
Analyze the time complexity of the following code snippet.
```r
library(dplyr)

data <- tibble(x = 1:1000)
data <- data %>% mutate(y = x * 2, z = y + 3)
```
This code creates two new columns: y is computed from the existing column x, and z is computed from the newly created y.
Identify the loops, recursion, and array traversals that repeat.
- Primary operation: A vectorized computation over every row for each new column (x * 2, then y + 3).
- How many times: Each expression visits every row once, so two new columns mean two passes of n elements each.
As the number of rows grows, the work to add new columns grows too.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 20 (2 columns x 10 rows) |
| 100 | About 200 (2 columns x 100 rows) |
| 1000 | About 2000 (2 columns x 1000 rows) |
Pattern observation: The work grows in proportion to the number of rows multiplied by the number of columns added.
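The growth pattern can be checked empirically with a rough timing sketch like the one below. The helper `time_mutate()` and the chosen sizes are illustrative assumptions, not part of the original example. Absolute timings depend on the machine and on dplyr's fixed per-call overhead, so small inputs may not look linear.

```r
library(dplyr)

# Hypothetical helper: time a single mutate() call on a tibble with n rows.
time_mutate <- function(n) {
  data <- tibble(x = seq_len(n))
  elapsed <- system.time(
    result <- data %>% mutate(y = x * 2, z = y + 3)
  )[["elapsed"]]
  list(n = n, rows = nrow(result), cols = ncol(result), elapsed = elapsed)
}

# Assumed input sizes; each step is 10x larger than the previous one.
sizes <- c(1e4, 1e5, 1e6)
timings <- lapply(sizes, time_mutate)

for (run in timings) {
  cat(sprintf("n = %.0f: %.4f seconds\n", run$n, run$elapsed))
}
```

If the cost is linear, each tenfold increase in n should raise the elapsed time by roughly a factor of ten once n is large enough for the per-call overhead to stop dominating.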
Time Complexity: O(n)
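This means that, for a fixed number of new columns, the time to add them grows in a straight line with the number of rows. With k new columns the total work is about n x k, which is still linear in n.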
[X] Wrong: "Adding multiple columns with mutate() is dramatically slower than adding one column, because the repeated passes over the data change how the time grows."
[OK] Correct: mutate() evaluates each new column as one vectorized operation over the rows, so k new columns cost about k times the work of one. For a fixed number of columns, the time still grows linearly with the number of rows.
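As a small illustration, the sketch below adds the same two columns in one mutate() call and in two chained calls. It uses a five-row tibble chosen here only for readability. Because mutate() evaluates its expressions in order, z can refer to y in the single call, and both versions produce the same values.

```r
library(dplyr)

data <- tibble(x = 1:5)

# Both columns in one call; z can use y because the expressions
# are evaluated from left to right.
one_call <- data %>% mutate(y = x * 2, z = y + 3)

# The same columns added in two separate calls.
two_calls <- data %>%
  mutate(y = x * 2) %>%
  mutate(z = y + 3)

# Compare the resulting columns value by value.
same_y <- identical(one_call$y, two_calls$y)
same_z <- identical(one_call$z, two_calls$z)

print(one_call$z)  # 5 7 9 11 13
```

Either way, each expression is applied once to the whole column, so the number of elements touched is the number of rows times the number of new columns.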
Knowing how mutate() scales helps you write efficient data transformations and explain your choices clearly in real projects or interviews.
"What if we used mutate() inside a loop that runs for each row? How would the time complexity change?"
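One way to explore this question is the sketch below, which fills y one row at a time by calling mutate() inside a loop. The row-by-row approach, the placeholder column, and the size n = 200 are assumptions made here for illustration. Each mutate() call still processes the whole column, so n calls on n rows touch about n x n elements, which suggests quadratic growth.

```r
library(dplyr)

n <- 200  # assumed small size so the loop finishes quickly

# Loop version: one mutate() call per row, each scanning all n rows.
looped <- tibble(x = 1:n, y = NA_real_)
for (i in seq_len(n)) {
  looped <- looped %>%
    mutate(y = if_else(row_number() == i, x * 2, y))
}

# Vectorized version: a single mutate() call over all rows.
vectorized <- tibble(x = 1:n) %>% mutate(y = x * 2)

# Both approaches should produce the same y values.
results_match <- isTRUE(all.equal(looped$y, vectorized$y))
print(results_match)
```

The two versions give the same column, but the loop repeats the full-column work once per row, while the vectorized call does it once.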