# Descriptive Statistics in R Programming: Time and Space Complexity
When we calculate descriptive statistics, we want to know how the running time changes as the data grows. The question is: how does the time to compute summaries such as the mean or median scale with the number of data points?
Analyze the time complexity of the following code snippet.

```r
data <- c(4, 7, 1, 8, 5, 9, 2, 6, 3, 10)
mean_value <- mean(data)
median_value <- median(data)
summary_stats <- summary(data)
```
This code calculates basic descriptive statistics: mean, median, and a summary of the data.
Identify the loops, recursion, and vector traversals that repeat.
- Primary operation: Traversing the data vector to compute statistics.
- How many times: each function (`mean`, `median`, `summary`) scans the data at least once; `median` also sorts the data internally.
As the number of data points grows, the time to compute these statistics grows roughly in direct proportion.
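To make that single pass explicit, here is a hand-rolled mean written as a plain loop. This is a sketch for illustration only; base R's `mean()` is implemented in C, but it performs the same linear scan.

```r
# Sketch: a manual mean that makes the O(n) pass visible.
my_mean <- function(x) {
  total <- 0
  for (v in x) {          # one visit per element: n iterations
    total <- total + v
  }
  total / length(x)
}

my_mean(c(4, 7, 1, 8, 5, 9, 2, 6, 3, 10))  # 5.5, matching mean()
```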
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 operations per statistic |
| 100 | About 100 operations per statistic |
| 1000 | About 1000 operations per statistic |
Pattern observation: The operations increase linearly as data size increases.
Time Complexity: O(n)

This means the time to calculate descriptive statistics grows roughly in a straight line with the amount of data. (Strictly speaking, `median` sorts the data, which costs O(n log n); a selection algorithm could do it in O(n), and in practice the growth still looks close to linear for typical data sizes.)
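A quick way to see this growth empirically is to time `mean()` on increasingly large vectors. Timings are machine-dependent, so treat the numbers as a rough sketch; the point is that each tenfold increase in `n` should multiply the elapsed time by roughly ten.

```r
# Sketch: observe roughly linear growth of mean() with input size.
set.seed(42)
for (n in c(1e5, 1e6, 1e7)) {
  data <- runif(n)
  elapsed <- system.time(mean(data))["elapsed"]
  cat(sprintf("n = %.0e: %.4f seconds\n", n, elapsed))
}
```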
[X] Wrong: "Calculating mean or median takes the same time no matter how much data there is."
[OK] Correct: These calculations must look at each data point, so more data means more work and more time.
Understanding how data size affects calculation time helps you explain your code's efficiency clearly and confidently.
"What if we used a sorted data structure to keep data sorted as we add points? How would the time complexity of median calculation change?"