t-test in R Programming - Time & Space Complexity
When running a t-test in R, it helps to know how the running time grows with the size of the data. In other words: how does the amount of work change as we add more numbers to compare?
Analyze the time complexity of the following code snippet:
```r
# Sample data vectors (n must be defined before use)
n <- 1000
x <- rnorm(n)
y <- rnorm(n)

# Perform Welch's two-sample t-test (R's default)
result <- t.test(x, y)
```
This code creates two groups of random numbers and compares their means using a t-test.
Identify the operations that repeat: loops, recursion, or traversals of the data.
- Primary operation: calculating means, variances, and sums over the data vectors.
- How many times: each element of both vectors is visited a constant number of times during these calculations.
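As a quick sketch of what that call returns: `t.test()` produces an `htest` object whose components (the t statistic, p-value, and confidence interval) can be inspected directly. The seed and sample size below are illustrative choices, not part of the original snippet.

```r
# Sketch: inspecting the object returned by t.test()
set.seed(123)                 # illustrative seed for reproducibility
x <- rnorm(1000)
y <- rnorm(1000)
result <- t.test(x, y)

result$statistic  # the t value
result$p.value    # the p-value
result$conf.int   # confidence interval for the difference in means
```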
As the number of data points increases, the time to calculate sums and variances grows proportionally.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 20 (10 in each vector) |
| 100 | About 200 |
| 1000 | About 2000 |
Pattern observation: The work roughly doubles when the input size doubles.
Time Complexity: O(n)
Space Complexity: O(n), since the two input vectors dominate memory use; the test itself stores only a handful of summary values.
This means the running time of the t-test grows linearly with the number of data points.
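You can check the linear pattern empirically with a small timing sketch. Absolute timings vary by machine, so treat the printed numbers as illustrative only; the sample sizes below are arbitrary choices for demonstration.

```r
# Sketch: time t.test() as n doubles and watch the cost roughly double too
set.seed(42)
for (n in c(1e5, 2e5, 4e5)) {
  x <- rnorm(n)
  y <- rnorm(n)
  elapsed <- system.time(t.test(x, y))["elapsed"]
  cat(sprintf("n = %6d: %.3f seconds\n", n, elapsed))
}
```

On most machines the ratio between consecutive rows hovers near 2, consistent with O(n).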
[X] Wrong: "The t-test time grows much faster than the data size because it compares every pair of points."
[OK] Correct: The t-test only calculates summary statistics like means and variances, which require looking at each data point once, not comparing all pairs.
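To make the "summary statistics only" point concrete, the Welch t statistic can be reproduced by hand from single-pass quantities: two means and two variances, each a linear scan over its vector. The seed and sample sizes here are illustrative.

```r
# Sketch: rebuild the Welch t statistic from O(n) summaries
set.seed(1)
x <- rnorm(50)
y <- rnorm(50)

# Each of mean() and var() makes a single pass over its vector
t_manual <- (mean(x) - mean(y)) /
  sqrt(var(x) / length(x) + var(y) / length(y))

t_builtin <- unname(t.test(x, y)$statistic)
all.equal(t_manual, t_builtin)  # the two values agree
```

No pairwise comparison of points ever happens, which is why the cost stays linear rather than quadratic.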
Understanding how statistical tests scale helps you write efficient code and explain your choices clearly in real projects.
"What if we used a bootstrap method with 1000 resamples instead of a single t-test? How would the time complexity change?"