ANOVA in R Programming - Time & Space Complexity
We want to understand how the time it takes to run ANOVA changes as the data size grows. Specifically, how do the number of data points and the number of groups affect the amount of work done?
Analyze the time complexity of the following R code for one-way ANOVA.
```r
# Sample data
values <- c(5, 6, 7, 8, 9, 10, 11, 12)
groups <- factor(c('A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'))

# Perform one-way ANOVA
result <- aov(values ~ groups)
summary(result)
```
This code compares means of values across groups using ANOVA.
Look at what repeats as data grows:
- Primary operation: Calculating group means and variances by scanning all data points.
- How many times: Each data point is visited once to compute sums and sums of squares.
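The single pass described above can be sketched directly: a minimal, illustrative computation of the per-group summaries (sums, counts, means) that ANOVA is built from, using the sample data from the code. Each value is touched once.

```r
# Sketch: ANOVA's core summaries come from one scan over the data.
values <- c(5, 6, 7, 8, 9, 10, 11, 12)
groups <- factor(c('A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'))

sums   <- tapply(values, groups, sum)     # per-group sums, one visit per point
counts <- tapply(values, groups, length)  # per-group counts
means  <- sums / counts                   # group means from the summaries
means  # A = 5.5, B = 7.5, C = 9.5, D = 11.5
```

Nothing here compares one data point with another; the work is one visit per point plus a little bookkeeping per group.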
As the number of data points (n) increases, the work grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 visits to data points |
| 100 | About 100 visits to data points |
| 1000 | About 1000 visits to data points |
Pattern observation: Doubling data roughly doubles the work.
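You can check the doubling pattern empirically. The sketch below (sizes and the 4-group setup are arbitrary choices; exact times depend on your machine) times `aov()` at increasing `n` so you can eyeball the roughly linear growth:

```r
# Rough timing sketch: doubling n should roughly double aov()'s elapsed time.
set.seed(42)
time_anova <- function(n, k = 4) {
  v <- rnorm(n)
  g <- factor(sample(letters[1:k], n, replace = TRUE))
  system.time(aov(v ~ g))["elapsed"]
}
sapply(c(1e5, 2e5, 4e5), time_anova)
```

Timings are noisy for small inputs, so use sizes large enough that elapsed time is well above measurement jitter.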
Time Complexity: O(n), for a fixed number of groups.
This means that, with the number of groups held constant, the time to run ANOVA grows linearly with the number of data points.
[X] Wrong: "ANOVA time grows with the square of data size because it compares all pairs of points."
[OK] Correct: ANOVA uses group summaries, not pairwise comparisons, so it only scans data once.
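To make the "group summaries, not pairwise comparisons" point concrete, here is a hedged sketch that computes the one-way F statistic from nothing but group means and counts plus a single residual pass, for the sample data above:

```r
# Sketch: the F statistic needs only group summaries and one residual pass.
values <- c(5, 6, 7, 8, 9, 10, 11, 12)
groups <- factor(c('A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'))

n <- length(values)
k <- nlevels(groups)
grand  <- mean(values)
means  <- tapply(values, groups, mean)
counts <- tapply(values, groups, length)

ss_between <- sum(counts * (means - grand)^2)  # k terms, one per group
ss_within  <- sum((values - means[groups])^2)  # one pass over the n points
f_stat <- (ss_between / (k - 1)) / (ss_within / (n - k))
f_stat  # about 26.67, matching summary(aov(values ~ groups))
```

At no point does any data point get compared against another data point, which is why no quadratic term appears.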
Understanding how ANOVA scales helps you explain performance when working with larger datasets in real projects.
"What if we added more groups but kept total data points the same? How would the time complexity change?"
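One way to explore that question is to time `aov()` at a fixed `n` while varying the number of groups `k`. This is an exploratory sketch (the sizes are arbitrary assumptions): the data scan stays O(n), but `aov()` builds a model matrix with one column per group, so more groups add per-point work even at constant `n`.

```r
# Sketch: fixed n, varying number of groups k.
set.seed(1)
time_by_groups <- function(n, ks) {
  sapply(ks, function(k) {
    g <- factor(sample(seq_len(k), n, replace = TRUE))
    v <- rnorm(n)
    system.time(aov(v ~ g))["elapsed"]
  })
}
time_by_groups(2e5, c(2, 8, 32))
```

For the small group counts typical in practice, this overhead is minor and the linear-in-n picture holds.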