Parameterized reports in R Programming - Time & Space Complexity
When creating parameterized reports in R, it's important to understand how the report's run time changes as the input data grows. The question to answer: how does the time to generate the report scale as the size of the data or the parameter values change?
Analyze the time complexity of the following code snippet.
```r
library(dplyr)

generate_report <- function(data, filter_value) {
  # Keep only the rows whose category matches the parameter
  filtered_data <- data %>% filter(category == filter_value)
  # Summarise the matching rows: row count and mean of the value column
  summary <- filtered_data %>% summarise(count = n(), avg = mean(value))
  return(summary)
}

# Example call:
# generate_report(large_data_frame, "A")
```
This code filters a data frame by a parameter and then summarizes the filtered results.
Identify the operations that repeat: loops, recursion, or row-by-row traversals.
- Primary operation: Filtering the data frame rows based on the parameter.
- How many times: Each row is checked once during filtering.
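To make the implicit traversal visible, the dplyr filter can be rewritten as a base-R subset. Here `demo_df` is a toy data frame invented for illustration; the comparison produces one TRUE/FALSE per row, which is exactly the "n row checks" counted above.

```r
# A toy data frame standing in for the report's input
demo_df <- data.frame(category = c("A", "B", "A"), value = c(10, 20, 30))
filter_value <- "A"

# data$category == filter_value evaluates once per row: 3 comparisons here
filtered_data <- demo_df[demo_df$category == filter_value, ]
```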
As the data size grows, the filtering step checks more rows, so the time grows roughly in proportion to the number of rows.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 row checks |
| 100 | About 100 row checks |
| 1000 | About 1000 row checks |
Pattern observation: The work grows directly with the number of rows in the data.
Time Complexity: O(n)
This means the time to generate the report grows linearly as the data size increases.
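One way to check the linear growth empirically is to time the report at increasing data sizes with `system.time()`. This is a rough sketch (repeating the `generate_report` definition from the snippet above so it runs standalone); with O(n) behavior, each 10x jump in rows should produce roughly a 10x jump in elapsed time.

```r
library(dplyr)

# generate_report as defined in the snippet above
generate_report <- function(data, filter_value) {
  filtered_data <- data %>% filter(category == filter_value)
  filtered_data %>% summarise(count = n(), avg = mean(value))
}

# Time the report at increasing sizes; elapsed time should grow
# roughly in proportion to n
for (n in c(1e4, 1e5, 1e6)) {
  d <- data.frame(category = sample(c("A", "B", "C"), n, replace = TRUE),
                  value = runif(n))
  print(system.time(generate_report(d, "A"))["elapsed"])
}
```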
[X] Wrong: "Filtering by a parameter is instant and does not depend on data size."
[OK] Correct: Filtering must check each row to see if it matches, so it takes longer with more rows.
Understanding how filtering and summarizing scale with data size helps you explain report performance clearly and confidently.
"What if we indexed the data by category before filtering? How would the time complexity change?"