Regression testing in Software Engineering - Time & Space Complexity
When studying regression testing, it is important to understand how the time to run a test suite grows as the codebase and number of tests increase.
We want to know how test execution scales and why test selection strategies matter.
Analyze the time complexity of running a full regression test suite versus a selective suite.
Full regression:
for each test_case in all_tests:
    set up test environment
    execute test
    compare actual vs expected result
    record pass/fail
Selective regression:
affected_modules = analyze_dependencies(changed_files)
for each test_case in all_tests:
    if test_case.module in affected_modules:
        execute test
Full regression runs every test. Selective regression uses dependency analysis to run only relevant tests.
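The two loops above can be sketched as runnable Python. This is a minimal illustration, not a real test runner: the `Case` class, the sample tests, and the module names `math_utils` and `text_utils` are hypothetical, and the environment setup step is omitted because there is nothing to set up here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    """Hypothetical test record: a name, the module it covers, and a check."""
    name: str
    module: str
    run: Callable[[], object]  # produces the actual result
    expected: object


def full_regression(all_tests):
    """Execute every test and record pass/fail by test name."""
    results = {}
    for test in all_tests:
        results[test.name] = test.run() == test.expected
    return results


def selective_regression(all_tests, affected_modules):
    """Scan every test, but execute only those mapped to affected modules."""
    results = {}
    for test in all_tests:
        if test.module in affected_modules:
            results[test.name] = test.run() == test.expected
    return results


# Made-up sample suite for illustration only.
tests = [
    Case("test_add", "math_utils", lambda: 1 + 2, 3),
    Case("test_upper", "text_utils", lambda: "ab".upper(), "AB"),
    Case("test_sum", "math_utils", lambda: sum([1, 2, 3]), 6),
]

full_results = full_regression(tests)
selective_results = selective_regression(tests, {"text_utils"})
print(len(full_results), "executed in full regression")
print(len(selective_results), "executed in selective regression")
```

Both functions walk the whole list, but only the full run pays the execution cost for every entry.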
Look at what repeats in each approach.
- Full regression: Every test case is executed regardless of the change. The loop runs n times for n total tests.
- Selective regression: Dependency analysis runs once, then only tests mapped to affected modules execute — typically a fraction of n.
As the project grows, the test suite grows with it.
| Total Tests (n) | Full Regression | Selective (20% affected) |
|---|---|---|
| 100 tests | 100 executed | ~20 executed |
| 1,000 tests | 1,000 executed | ~200 executed |
| 10,000 tests | 10,000 executed | ~2,000 executed |
Pattern observation: Full regression is O(n) because every test runs. Selective regression still scans all n tests in the filtering step, but it executes only k of them, where k is the number of tests mapped to affected modules and typically k << n.
Full Regression: O(n) where n is the total number of test cases
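Selective Regression: O(n + k), with O(n) to filter and O(k) to execute, where k is the number of tests for affected modules, plus the one-time cost of dependency analysis
Because k ≤ n, this is still O(n) in the worst case. In practice, selective regression saves significant time because checking a test's module is far cheaper than executing the test, and k is much smaller than n for localized changes.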
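A rough cost model makes the difference concrete. The sketch below uses the 20% affected ratio from the table. The 2-second execution time and the 1-millisecond filter check per test are assumptions chosen for illustration, and the one-time dependency analysis cost is ignored.

```python
def full_cost_ms(n, exec_ms):
    """Full regression: every one of the n tests is executed."""
    return n * exec_ms


def selective_cost_ms(n, k, exec_ms, filter_ms):
    """Selective regression: n cheap filter checks plus k executions."""
    return n * filter_ms + k * exec_ms


n = 10_000
k = n // 5        # 20% of tests affected, as in the table
exec_ms = 2_000   # assumed: 2 seconds per test execution
filter_ms = 1     # assumed: 1 millisecond per filter check

full_ms = full_cost_ms(n, exec_ms)
selective_ms = selective_cost_ms(n, k, exec_ms, filter_ms)

print(f"full:      {full_ms / 1000:.0f} s")
print(f"selective: {selective_ms / 1000:.0f} s")
```

Under these assumptions the filtering pass contributes 10 seconds out of roughly 4,010, so the total is dominated by the k executions.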
[X] Wrong: "We only need to test the code we changed, not anything else."
[OK] Correct: Changes in one module can break dependent modules. Regression testing must cover not just the changed code but all code that depends on it. This is why dependency analysis in selective regression includes affected modules, not just changed files.
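One way to find everything that depends on a change is to walk the dependency graph in reverse, starting from the changed modules. The sketch below is an assumption about how such an analysis could work, not a description of any particular tool: the function name `find_affected_modules`, the `depends_on` mapping, and the module names are all hypothetical, and it assumes changed files have already been mapped to modules.

```python
from collections import deque


def find_affected_modules(changed, depends_on):
    """Return the changed modules plus every module that depends on them,
    directly or transitively.

    changed:    set of changed module names
    depends_on: dict mapping each module to the set of modules it imports
    """
    # Invert the graph: for each module, which modules depend on it?
    dependents = {}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(module)

    # Breadth-first walk outward from the changed modules.
    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical project layout for illustration.
depends_on = {
    "checkout": {"cart", "payments"},
    "cart": {"pricing"},
    "payments": set(),
    "pricing": set(),
    "search": set(),
}

affected = find_affected_modules({"pricing"}, depends_on)
print(sorted(affected))  # ['cart', 'checkout', 'pricing']
```

A change to `pricing` pulls in `cart` and `checkout` even though neither file was edited, while `search` and `payments` are safely skipped.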
Understanding regression testing complexity helps explain CI/CD pipeline design decisions. Knowing when to use full vs selective regression is a common topic in QA and DevOps interviews.
If a project has 10,000 tests and average test execution takes 2 seconds, how long does a full regression run take? How would test parallelization across 10 machines change this?
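To check an answer, the arithmetic can be sketched as follows. It assumes tests run one at a time on each machine, are split evenly across machines, and incur no setup or coordination overhead, so real pipelines will be somewhat slower. The function names are illustrative.

```python
import math


def wall_clock_seconds(total_tests, seconds_per_test, machines=1):
    """Elapsed time, determined by the machine with the most tests."""
    tests_on_busiest_machine = math.ceil(total_tests / machines)
    return tests_on_busiest_machine * seconds_per_test


def format_duration(seconds):
    """Render a whole number of seconds as hours, minutes, seconds."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes}m {secs}s"


serial = wall_clock_seconds(10_000, 2)
parallel = wall_clock_seconds(10_000, 2, machines=10)

print(format_duration(serial))    # 5h 33m 20s
print(format_duration(parallel))  # 0h 33m 20s
```

Parallelization divides the wall-clock time by the number of machines in this idealized case, but the total work stays at 20,000 machine-seconds.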