PyTest · Testing · ~15 mins

Flaky test detection and retry in PyTest - Deep Dive

Overview - Flaky test detection and retry
What is it?
Flaky test detection and retry is a way to find and handle tests that sometimes pass and sometimes fail without any code changes. These tests are called flaky because their results are not stable. The retry part means running the test again automatically if it fails, to see if it was a temporary problem. This helps keep test results trustworthy and reduces false alarms.
Why it matters
Without flaky test detection, developers waste time chasing bugs that aren't real. Flaky tests hide real problems and make teams lose trust in automated testing. Detecting and retrying flaky tests helps keep the test suite reliable and saves time by avoiding unnecessary debugging. It improves confidence in software quality and speeds up development.
Where it fits
Before learning flaky test detection, you should understand basic automated testing and how to write tests in pytest. After this, you can learn advanced test stability techniques, continuous integration setups, and test result analysis tools.
Mental Model
Core Idea
Flaky test detection and retry means catching tests that fail unpredictably and rerunning them to confirm if the failure is real or just a temporary glitch.
Think of it like...
It's like a smoke alarm that sometimes goes off because of steam, not fire. Instead of calling the fire department immediately, you check again to see if the alarm is still sounding before taking action.
┌───────────────┐
│ Run Test      │
└──────┬────────┘
       │
       ▼
┌───────────────┐  Yes  ┌───────────────┐
│ Test Passes?  ├──────►│ Report Pass   │
└──────┬────────┘       └───────────────┘
       │No
       ▼
┌───────────────┐
│ Retry Test    │
└──────┬────────┘
       │
       ▼
┌───────────────┐  Yes  ┌───────────────┐
│ Pass on Retry?├──────►│ Mark Flaky    │
└──────┬────────┘       │ Report Pass   │
       │No              └───────────────┘
       ▼
┌───────────────┐
│ Report Fail   │
└───────────────┘
Build-Up - 6 Steps
1
Foundation: Understanding Flaky Tests
🤔
Concept: Introduce what flaky tests are and why they cause problems.
A flaky test is a test that sometimes passes and sometimes fails without any changes in the code. This can happen because of timing issues, network delays, or shared resources. For example, a test that depends on the current time or external service might fail randomly.
Result
You can identify that some tests are unreliable and cause confusion in test results.
Understanding flaky tests is key to knowing why test results might not always be trustworthy.
2
Foundation: Basic Pytest Test Writing
🤔
Concept: Learn how to write simple tests in pytest to prepare for flaky test handling.
In pytest, you write tests as functions whose names start with 'test_'. For example:

def test_addition():
    assert 1 + 1 == 2

This test always passes because the result is predictable.
Result
You can write and run basic tests using pytest.
Knowing how to write tests is necessary before adding retry logic for flaky tests.
3
Intermediate: Detecting Flaky Tests with pytest-rerunfailures
🤔Before reading on: Do you think retrying a test automatically can hide real bugs or help find flaky tests? Commit to your answer.
Concept: Introduce the pytest plugin 'pytest-rerunfailures' that retries failed tests automatically.
Install the plugin with 'pip install pytest-rerunfailures', then run tests with:

pytest --reruns 3

This command reruns failed tests up to 3 times before marking them as failed. If a test passes on a retry, it is considered flaky.
Result
Tests that fail once but pass on retry are detected as flaky and do not cause a full test suite failure.
Retrying tests helps separate real failures from temporary glitches, improving test reliability.
4
Intermediate: Marking Flaky Tests Explicitly
🤔Before reading on: Should flaky tests be fixed immediately or marked to track their instability? Commit to your answer.
Concept: Learn how to mark tests as flaky using decorators to handle known flaky tests better.
Use the pytest-rerunfailures marker (the separate 'flaky' plugin offers a similar decorator) to mark known flaky tests:

import pytest

@pytest.mark.flaky(reruns=3)
def test_flaky_feature():
    ...  # test code

This reruns the test up to 3 times if it fails, helping isolate flaky behavior.
Result
Known flaky tests are retried automatically, reducing noise in test reports.
Explicitly marking flaky tests helps teams focus on fixing them later without blocking development.
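A sketch of marker usage from pytest-rerunfailures: reruns_delay adds a pause between attempts, and recent plugin versions accept a condition argument to retry only in certain environments (the test names and bodies here are hypothetical placeholders).

```python
import sys

import pytest

# Retry up to 3 times, pausing 2 seconds between attempts.
@pytest.mark.flaky(reruns=3, reruns_delay=2)
def test_network_dependent():
    ...

# Only retry on a platform where the flakiness is known to occur
# (condition is supported in recent pytest-rerunfailures versions).
@pytest.mark.flaky(reruns=3, condition=sys.platform.startswith("win"))
def test_windows_only_flake():
    ...
```

Marking per test keeps retries scoped to the known offenders instead of blanketing the whole suite with a global --reruns flag.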
5
Advanced: Analyzing Flaky Test Causes
🤔Before reading on: Do you think flaky tests are mostly caused by test code or the application code? Commit to your answer.
Concept: Explore common reasons for flaky tests and how to investigate them.
Flaky tests often result from:
- Timing issues (e.g., race conditions)
- External dependencies (network, databases)
- Shared state between tests
- Resource limits (memory, CPU)
To analyze, add logging, isolate tests, and run them repeatedly to find patterns.
Result
You can identify root causes of flakiness and plan fixes.
Knowing why tests are flaky is essential to fix them and improve test suite health.
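One low-tech way to run a suspect check repeatedly, as suggested above, is to measure its pass rate in a loop (a sketch with hypothetical helper names): a rate strictly between 0 and 1 confirms flakiness, while 0 or 1 suggests a stable failure or a stable pass.

```python
def measure_pass_rate(check, runs=50):
    """Run a zero-argument check repeatedly; return the fraction of passes."""
    passes = 0
    for _ in range(runs):
        try:
            check()
            passes += 1
        except AssertionError:
            pass
    return passes / runs

# Deterministic stand-in for a flaky check: passes on every second call,
# mimicking shared-state flakiness.
state = {"n": 0}
def alternating_check():
    state["n"] += 1
    assert state["n"] % 2 == 0

rate = measure_pass_rate(alternating_check, runs=50)
# A rate strictly between 0 and 1 signals flakiness.
```

Logging which attempts fail (and under what conditions) on top of this loop is often enough to spot the pattern behind the flakiness.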
6
Expert: Integrating Flaky Test Detection in CI Pipelines
🤔Before reading on: Should flaky test retries be part of local testing or only in continuous integration? Commit to your answer.
Concept: Learn how to configure CI systems to detect flaky tests and report them properly.
In CI pipelines, configure pytest with retry options and collect flaky test reports separately. For example, use:

pytest --reruns 2 --reruns-delay 5

This retries failed tests twice with a 5-second delay between attempts. CI tools can parse reports to highlight flaky tests for developers.
Result
Flaky tests are detected early in CI, preventing unstable builds and alerting teams to fix issues.
Integrating flaky test detection in CI ensures consistent quality and prevents flaky tests from blocking releases.
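One way to share the retry policy between CI and local runs (a sketch; assumes pytest-rerunfailures is installed) is to put the flags in the project's pytest configuration instead of each CI script:

```ini
; pytest.ini -- sketch; requires the pytest-rerunfailures plugin
[pytest]
addopts = --reruns 2 --reruns-delay 5
```

With this in place, a bare 'pytest' invocation anywhere picks up the same retry behavior, so CI and developers see consistent results.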
Under the Hood
When pytest runs a test with retry enabled, it catches test failures and reruns the test function up to the specified number of times. If the test passes on any retry, pytest marks it as passed but notes it as flaky. This works by wrapping the test call in a loop and tracking results internally. The retry delay option adds a pause between attempts to reduce timing-related flakiness.
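The loop described above can be sketched in plain Python (illustrative only; this is not pytest-rerunfailures' actual internals, and the names are hypothetical):

```python
import time

def run_with_retries(test_func, max_reruns=3, delay=0.0):
    """Sketch of the retry loop a rerun plugin wraps around a test call."""
    attempts = 0
    while True:
        attempts += 1
        try:
            test_func()
            # Passed: flagged flaky if it needed more than one attempt.
            return {"outcome": "passed", "flaky": attempts > 1, "attempts": attempts}
        except AssertionError:
            if attempts > max_reruns:
                return {"outcome": "failed", "flaky": False, "attempts": attempts}
            time.sleep(delay)  # optional pause between attempts

# Demo: a check that fails on its first call and passes on the second.
calls = {"n": 0}
def passes_on_second_attempt():
    calls["n"] += 1
    assert calls["n"] >= 2

result = run_with_retries(passes_on_second_attempt, max_reruns=3)
# result -> {'outcome': 'passed', 'flaky': True, 'attempts': 2}
```

The key design point is visible in the loop: a pass on any attempt ends the retries, while the attempt count distinguishes a clean pass from a flaky one.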
Why designed this way?
Flaky test retry was designed to reduce noise in test results caused by transient failures. Instead of failing the whole suite on a single temporary glitch, retrying helps confirm if the failure is consistent. This approach balances test reliability with developer productivity. Alternatives like ignoring failures were rejected because they hide real problems.
┌───────────────┐
│ Start Test    │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ Run Test Func │◄───────────────────────────┐
└──────┬────────┘                            │
       │                                     │
       ▼                                     │
┌───────────────┐  Yes  ┌───────────────┐    │
│ Test Pass?    ├──────►│ Mark Passed   │    │
└──────┬────────┘       └───────────────┘    │
       │No                                   │
       ▼                                     │
┌─────────────────┐  Yes  ┌────────────────┐ │
│ Retry Count < N?├──────►│ Wait (optional)├─┘
└──────┬──────────┘       └────────────────┘
       │No
       ▼
┌───────────────┐
│ Mark Failed   │
└───────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does retrying flaky tests always hide real bugs? Commit to yes or no.
Common Belief:Retrying flaky tests just hides real bugs and should be avoided.
Reality:Retrying helps distinguish between real bugs and temporary failures, improving test reliability.
Why it matters:Without retries, teams waste time chasing false failures or ignore flaky tests that mask real issues.
Quick: Are flaky tests always caused by bad test code? Commit to yes or no.
Common Belief:Flaky tests are always due to poorly written test code.
Reality:Flakiness can come from application code, environment issues, or external dependencies, not just test code.
Why it matters:Blaming only test code can lead to ignoring real application problems causing flakiness.
Quick: Should flaky test retries be used in local development? Commit to yes or no.
Common Belief:Retrying flaky tests is only useful in continuous integration, not locally.
Reality:Retrying locally helps developers catch flaky tests early and fix them before pushing code.
Why it matters:Skipping retries locally can delay flaky test detection and cause unstable builds later.
Quick: Does marking a test as flaky mean you can ignore fixing it? Commit to yes or no.
Common Belief:Marking a test as flaky means it's okay to leave it broken indefinitely.
Reality:Marking flaky tests is a temporary measure; they should be fixed to maintain test suite health.
Why it matters:Ignoring flaky tests leads to growing instability and loss of trust in automated tests.
Expert Zone
1
Flaky test retries can mask timing-related bugs if retry counts are too high, so balance is key.
2
Some flaky tests only appear under specific environment conditions, requiring environment-aware retry strategies.
3
Integrating flaky test detection with test analytics tools helps prioritize which flaky tests to fix first.
When NOT to use
Retrying tests is not suitable when failures indicate critical bugs that must be fixed immediately. Instead, use test isolation, mocks, or fix the root cause. Also, avoid retries for tests with side effects or those that modify shared state, as retries can cause inconsistent results.
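For the "use mocks" route, replacing the unstable dependency removes the flakiness outright, so no retry is needed. A sketch with hypothetical names, using the standard library's unittest.mock:

```python
from unittest import mock

class StatusClient:
    def fetch(self):
        # Stand-in for a real network call that fails intermittently
        # (hypothetical -- the usual source of flakiness).
        raise ConnectionError("network is unreliable")

def test_status_without_network():
    client = StatusClient()
    # Patch the unstable dependency so the test is deterministic:
    # the source of flakiness is removed rather than papered over.
    with mock.patch.object(StatusClient, "fetch", return_value="ok"):
        assert client.fetch() == "ok"
```

Fixing the root cause this way is preferable to retries whenever the dependency, not the behavior under test, is what flakes.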
Production Patterns
In production, flaky test detection is integrated into CI pipelines with limited retries and reporting. Teams track flaky tests separately and assign them for fixes. Some use quarantine environments to isolate flaky tests. Advanced setups combine retries with test parallelization and resource monitoring to reduce flakiness.
Connections
Circuit Breaker Pattern (Software Architecture)
Both detect unstable conditions and retry or fallback to maintain system stability.
Understanding flaky test retries helps grasp how systems handle temporary failures gracefully, like circuit breakers prevent cascading failures.
Statistical Sampling (Mathematics)
Retrying tests multiple times is like sampling to confirm if a failure is consistent or random.
Knowing statistical sampling clarifies why multiple test runs reduce false positives in flaky test detection.
Quality Control in Manufacturing
Both involve detecting inconsistent product quality and deciding when to accept or reject based on repeated checks.
Seeing flaky tests as quality control helps understand the importance of repeatability and reliability in software testing.
Common Pitfalls
#1Ignoring flaky tests and letting them fail silently.
Wrong approach:
def test_feature():
    assert some_function() == expected_result  # flaky but no retry or marking
Correct approach:
import pytest

@pytest.mark.flaky(reruns=3)
def test_feature():
    assert some_function() == expected_result
Root cause:Not recognizing flaky tests leads to ignoring retries or markings, causing unstable test suites.
#2Setting retry count too high, hiding real bugs.
Wrong approach:pytest --reruns 10 # retries too many times, masking failures
Correct approach:pytest --reruns 2 # limited retries to balance detection and bug visibility
Root cause:Overusing retries can hide real issues, reducing test effectiveness.
#3Retrying tests with side effects causing inconsistent state.
Wrong approach:
def test_modify_db():
    db.insert(data)
    assert db.count() == expected  # retried without cleanup
Correct approach:
def test_modify_db():
    db.setup_clean_state()
    db.insert(data)
    assert db.count() == expected
Root cause:Not isolating test state causes retries to produce unreliable results.
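In pytest, the clean-state setup from pitfall #3 is usually expressed as a fixture so it cannot be forgotten; each attempt then starts from a fresh state. A sketch in which FakeDb is a hypothetical in-memory stand-in for a real database:

```python
import pytest

class FakeDb:
    """Hypothetical in-memory stand-in for a real database."""
    def __init__(self):
        self.rows = []

    def insert(self, row):
        self.rows.append(row)

    def count(self):
        return len(self.rows)

@pytest.fixture
def db():
    # A fresh database per test: retries always start from a clean state.
    return FakeDb()

def test_modify_db(db):
    db.insert({"id": 1})
    assert db.count() == 1
```

Because the fixture rebuilds the state for every run, rerunning this test cannot accumulate rows the way the un-isolated version does.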
Key Takeaways
Flaky tests cause unpredictable test results and reduce trust in automated testing.
Retrying failed tests helps distinguish between real failures and temporary glitches.
Explicitly marking flaky tests allows teams to track and fix them without blocking progress.
Analyzing flaky test causes is essential to improve test reliability and software quality.
Integrating flaky test detection in CI pipelines ensures stable builds and early problem detection.