Summary statistics in R Programming - Time & Space Complexity
We want to know how the time to calculate summary statistics changes as the data grows: how does the number of operations scale when we summarize more data?
Analyze the time complexity of the following code snippet.
```r
# Sample numeric vector
data <- c(4, 8, 15, 16, 23, 42)

mean_val   <- mean(data)    # arithmetic mean
median_val <- median(data)  # middle value
min_val    <- min(data)     # smallest value
max_val    <- max(data)     # largest value
sd_val     <- sd(data)      # standard deviation
```
This code calculates basic summary statistics like mean, median, min, max, and standard deviation for a numeric vector.
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: each summary function scans the entire data vector once. (median also performs a partial sort internally, which is still roughly linear on average.)
- How many times: there are 5 separate passes over the data, one for each statistic.
Each function looks at every number once, so the work grows directly with data size.
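To make the single scan concrete, here is a hand-rolled mean (manual_mean is our own illustrative name, not a base R function): the loop body runs exactly once per element, so the operation count equals n.

```r
# Hypothetical manual mean: visits each of the n elements exactly once
manual_mean <- function(x) {
  total <- 0
  for (v in x) {        # n iterations, one per data point
    total <- total + v
  }
  total / length(x)
}

data <- c(4, 8, 15, 16, 23, 42)
manual_mean(data)  # same result as mean(data): 18
```

Built-in functions like mean() do this scan in compiled C code, so they are much faster in practice, but the amount of work still grows with n in the same way.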
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 50 (5 × 10) |
| 100 | About 500 (5 × 100) |
| 1000 | About 5000 (5 × 1000) |
Pattern observation: The total work grows linearly as the data size grows.
Time Complexity: O(n)
This means the time to compute summary statistics grows in direct proportion to the number of data points. (Space is O(n) for the vector itself; each statistic needs only constant extra memory, though median's internal partial sort may allocate a working copy of the data.)
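One way to check the linear pattern empirically is a rough timing sketch. summarize_all is a helper defined here for illustration; absolute timings depend on your machine, but the elapsed time should grow by roughly 10x as n grows by 10x.

```r
# Compute all five statistics for a vector of length n
summarize_all <- function(x) {
  c(mean(x), median(x), min(x), max(x), sd(x))
}

# Time the same work at three input sizes
for (n in c(1e4, 1e5, 1e6)) {
  x <- runif(n)                      # n random values
  cat("n =", n, "\n")
  print(system.time(summarize_all(x)))
}
```

Small inputs may all finish in under a millisecond, so the linear trend is easiest to see at larger sizes.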
[X] Wrong: "Calculating all summary statistics takes constant time regardless of data size."
[OK] Correct: Each statistic needs to look at every data point, so more data means more work.
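What you can reduce is the constant factor, not the complexity class. For example, range() returns both the minimum and the maximum from one call, which can cut the number of separate passes, but every value still has to be visited, so the growth remains O(n).

```r
x <- c(4, 8, 15, 16, 23, 42)
r <- range(x)  # c(minimum, maximum) from a single call
r              # 4 42
```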
Understanding how summary statistics scale helps you explain data processing speed clearly and confidently in real projects.
"What if we calculate summary statistics only once but for multiple columns in a data frame? How would the time complexity change?"
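One possible answer, sketched under the assumption of a data frame with m numeric columns of n rows each: every column needs its own set of scans, so the total work grows as O(n × m).

```r
# Hypothetical data frame: m = 3 columns, n = 1000 rows
df <- data.frame(a = runif(1000), b = runif(1000), c = runif(1000))

# One set of five statistics per column: roughly 5 * n * m operations
stats_per_column <- sapply(df, function(col) {
  c(mean = mean(col), median = median(col),
    min = min(col), max = max(col), sd = sd(col))
})
stats_per_column  # a 5 x 3 matrix: one column of statistics per variable
```

If the number of columns is fixed, this is still linear in the number of rows; it is only when both n and m grow that the total work grows as their product.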