t-test (ttest_ind, ttest_rel) in SciPy - Time & Space Complexity
We want to understand how the time it takes to run a t-test grows as the amount of data increases.
Specifically, how does the test behave when we have more numbers to compare?
Analyze the time complexity of the following code snippet.
```python
from scipy.stats import ttest_ind

# Two independent samples
sample1 = [1, 2, 3, 4, 5]
sample2 = [2, 3, 4, 5, 6]

result = ttest_ind(sample1, sample2)
print(result)
```
This code runs an independent t-test to compare the means of two groups of numbers.
Identify the loops, recursion, or array traversals that do the repeated work.
- Primary operation: Calculating sums and variances by going through each number in both samples.
- How many times: Each sample is scanned once to compute statistics.
As the number of data points in each sample grows, the time to calculate sums and variances grows roughly in direct proportion.
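To make the single-pass arithmetic concrete, here is a minimal sketch of the equal-variance (Student's) t-statistic computed by hand with NumPy. The variable names are illustrative; each mean and variance is one O(n) scan of its sample, and the result matches what `ttest_ind` reports:

```python
import numpy as np
from scipy.stats import ttest_ind

sample1 = np.array([1, 2, 3, 4, 5], dtype=float)
sample2 = np.array([2, 3, 4, 5, 6], dtype=float)

# Each mean and variance is a single O(n) pass over its sample.
n1, n2 = len(sample1), len(sample2)
mean1, mean2 = sample1.mean(), sample2.mean()
var1, var2 = sample1.var(ddof=1), sample2.var(ddof=1)

# Pooled variance for the equal-variance (Student's) t-test.
pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
t_stat = (mean1 - mean2) / np.sqrt(pooled * (1 / n1 + 1 / n2))

print(t_stat)                                   # hand-computed statistic
print(ttest_ind(sample1, sample2).statistic)    # SciPy's statistic
```

No step here ever compares one data point against another; everything reduces to sums over each sample, which is why the cost is linear rather than quadratic.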
| Sample Size (n, per sample) | Approx. Operations |
|---|---|
| 10 | About 20 (two samples of 10 each) |
| 100 | About 200 |
| 1000 | About 2000 |
Pattern observation: Doubling the data roughly doubles the work needed.
Time Complexity: O(n)
This means the time to run the t-test grows linearly with the total number of data points.
Space Complexity: O(1) auxiliary — beyond holding the input samples (and converting lists to arrays), the test only keeps a handful of running statistics (counts, means, variances), so extra memory stays constant.
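One way to see the linear growth is a rough timing sketch. Absolute numbers will vary by machine, but doubling n should roughly double the elapsed time:

```python
import time

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

# Time ttest_ind at doubling sample sizes; expect roughly linear growth.
for n in (100_000, 200_000, 400_000):
    a = rng.normal(size=n)
    b = rng.normal(size=n)
    start = time.perf_counter()
    ttest_ind(a, b)
    elapsed = time.perf_counter() - start
    print(f"n={n:>7}: {elapsed * 1000:.2f} ms")
```

At small n the times are dominated by fixed overhead, so the linear pattern is clearest once the samples are reasonably large.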
[X] Wrong: "The t-test time grows with the square of the data size because it compares every pair of points."
[OK] Correct: The t-test only needs to calculate sums and variances, which requires looking at each data point once, not comparing all pairs.
Knowing how the t-test scales helps you explain performance when working with bigger datasets, showing you understand both statistics and efficiency.
"What if we used a paired t-test (ttest_rel) on samples of the same size? How would the time complexity change?"