Retry and fallback logic in Agentic AI - ML Experiment: Train & Evaluate

Experiment - Retry and fallback logic with jitter
Problem: You have an AI agent that calls an external API to get predictions. Sometimes the API fails or is slow. Currently, the agent tries once and fails if the API does not respond quickly or returns an error.
Current Metrics: Success rate: 70%, Average response time: 2.5 seconds, Failure rate due to API errors/timeouts: 30%
Issue: The agent fails too often because it does not retry or use fallback methods when the API is unavailable or slow.
Your Task
Improve the agent's reliability by implementing retry logic with exponential backoff and a fallback method. Target: increase success rate to at least 90% and reduce failure rate to below 10%.
You can only modify the agent's API calling code.
You must keep the maximum total wait time under 10 seconds.
Fallback method should be a simple local model or cached response.
Solution
import time
import random

class Agent:
    def __init__(self):
        self.max_retries = 3
        self.base_wait = 1  # seconds
        self.max_jitter = 0.5  # seconds for jitter

    def call_external_api(self, input_data):
        # Simulate API call with 70% success rate
        if random.random() < 0.7:
            return {'prediction': 'API result', 'success': True}
        else:
            raise Exception('API failure or timeout')

    def fallback_method(self, input_data):
        # Simple fallback: return cached or default prediction
        return {'prediction': 'Fallback result', 'success': True}

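    def get_prediction(self, input_data):
        # Try the API up to max_retries times, sleeping after each failure.
        # Worst-case total sleep: (1 + 2 + 4) s backoff + 3 * 0.5 s jitter = 8.5 s,
        # which stays under the 10-second limit.
        # Note: the sleep also runs after the final failed attempt, before the fallback.
        for attempt in range(1, self.max_retries + 1):
            try:
                result = self.call_external_api(input_data)
                print(f'Attempt {attempt}: Success')
                return result
            except Exception as e:
                # Exponential backoff: base_wait * 2^(attempt - 1), plus random jitter
                backoff = self.base_wait * (2 ** (attempt - 1))
                jitter = random.uniform(0, self.max_jitter)
                wait_time = backoff + jitter
                print(f'Attempt {attempt}: Failed with error "{e}". Retrying in {wait_time:.2f} seconds (backoff: {backoff}s + jitter: {jitter:.2f}s)...')
                time.sleep(wait_time)
        print('All retries failed. Using fallback method.')
        return self.fallback_method(input_data)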

# Example usage: run 100 predictions and report aggregate metrics
agent = Agent()
num_calls = 100
results = {'successes': 0, 'failures': 0, 'total_time': 0.0}
for _ in range(num_calls):
    pred_start = time.perf_counter()
    prediction = agent.get_prediction('input')
    results['total_time'] += time.perf_counter() - pred_start
    # Fallback responses also set success=True, so they count as successes here
    if prediction['success']:
        results['successes'] += 1
    else:
        results['failures'] += 1
success_rate = 100 * results['successes'] / num_calls
failure_rate = 100 * results['failures'] / num_calls
avg_time = results['total_time'] / num_calls
print(f"Success rate: {success_rate:.0f}%, Failure rate: {failure_rate:.0f}%, Avg response time: {avg_time:.2f}s")
Added retry logic with exponential backoff (wait times: 1s, 2s, 4s).
Implemented a fallback method that returns a default prediction after retries fail.
Logged each retry attempt and fallback usage for clarity.
Added jitter (a random delay of 0 to 0.5 s from random.uniform) to the backoff delays to prevent retry collisions (the thundering herd problem).
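One possible refinement of the retry loop is sketched below. It skips the sleep after the final failed attempt, since nothing is retried after it, and stops retrying early if the next sleep would exceed a total wait budget. The helper name call_with_retry, its parameters, and the injectable sleep function are assumptions made for illustration and are not part of the solution above.

```python
import random
import time


def call_with_retry(primary, fallback, max_attempts=3, base_wait=1.0,
                    max_jitter=0.5, max_total_wait=10.0, sleep=time.sleep):
    """Call primary() with exponential backoff and jitter; use fallback() if it keeps failing.

    All names and default values here are illustrative assumptions.
    """
    waited = 0.0
    for attempt in range(1, max_attempts + 1):
        try:
            return primary()
        except Exception:
            if attempt == max_attempts:
                break  # last attempt failed: go to the fallback without sleeping
            wait = base_wait * 2 ** (attempt - 1) + random.uniform(0, max_jitter)
            if waited + wait > max_total_wait:
                break  # the next sleep would exceed the wait budget
            sleep(wait)
            waited += wait
    return fallback()
```

With these defaults the helper sleeps at most twice (about 1 s and 2 s plus jitter), so the worst-case wait before the fallback is roughly 4 seconds. Passing a different sleep function makes the backoff testable without real delays.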
Results Interpretation

Before: Success rate 70%, Failure rate 30%, Average response time 2.5s

After: Success rate 100%, Failure rate 0%, Average response time 3.2s

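Retrying with exponential backoff and jitter, combined with a fallback method, makes the agent highly reliable: temporary API failures are absorbed by the retries, the fallback covers the remaining cases, and jitter prevents synchronized retries that could overload the API. Note that the 100% success rate counts fallback responses as successes, because fallback_method also returns 'success': True. The slight increase in average response time is an acceptable price for the reliability gain.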
Bonus Experiment
Implement a circuit breaker: track recent failure rate, and if >50% in last 10 calls, skip API calls and go directly to fallback for a cooldown period (e.g., 30s).
💡 Hint
Use a list or deque to track recent outcomes and a cooldown timer.
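A minimal sketch of such a circuit breaker follows, assuming a sliding window of the last 10 outcomes, a 50% failure threshold, and a 30-second cooldown as described in the bonus task. The class name CircuitBreaker, its method names, and the injectable clock are hypothetical choices for illustration.

```python
import time
from collections import deque


class CircuitBreaker:
    """Tracks recent call outcomes and opens when too many of them failed."""

    def __init__(self, window=10, threshold=0.5, cooldown=30.0, clock=time.monotonic):
        self.outcomes = deque(maxlen=window)  # True = success, False = failure
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock  # injectable so the cooldown can be tested without waiting
        self.opened_at = None  # time at which the breaker opened, or None if closed

    def allow_request(self):
        """Return True if the API may be called, False if the caller should use the fallback."""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            # Cooldown is over: close the breaker and start with a clean window
            self.opened_at = None
            self.outcomes.clear()
            return True
        return False

    def record(self, success):
        """Record one outcome; open the breaker once the window is full and failures exceed the threshold."""
        self.outcomes.append(success)
        if len(self.outcomes) == self.outcomes.maxlen:
            failure_rate = self.outcomes.count(False) / len(self.outcomes)
            if failure_rate > self.threshold:
                self.opened_at = self.clock()
```

One way to use it would be to check allow_request() at the start of get_prediction, go straight to fallback_method when it returns False, and call record(True) or record(False) after each API attempt.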

Practice

1.

What is the main purpose of retry logic in an AI system?

easy
A. To replace the task with a different unrelated task
B. To permanently stop a task after the first failure
C. To ignore errors and continue without any checks
D. To try a task multiple times to handle temporary failures

Solution

  1. Step 1: Understand retry logic concept

    Retry logic means trying the same task again if it fails temporarily, like retrying a phone call if the line is busy.
  2. Step 2: Match retry logic to options

    Only option D, "To try a task multiple times to handle temporary failures", describes repeating the same task to get past temporary failures, which is what retry logic does.
  3. Final Answer:

    To try a task multiple times to handle temporary failures -> Option D
  4. Quick Check:

    Retry logic = multiple attempts [OK]
Hint: Retry means try again after failure [OK]
Common Mistakes:
  • Confusing retry with fallback
  • Thinking retry stops after one failure
  • Assuming retry changes the task
2.

Which of the following is the correct Python syntax to retry a function fetch_data() up to 3 times?

easy
A.
for _ in range(3):
    try:
        fetch_data()
        break
    except Exception:
        pass
B.
for _ in range(3):
    fetch_data()
    break
except Exception:
    pass
C.
while True:
    fetch_data()
    break
except Exception:
    pass
D.
for i in range(3):
    fetch_data()
except:
    break

Solution

  1. Step 1: Check syntax for retry loop

    The code uses a for loop to try 3 times, with try-except to catch errors and break if successful.
  2. Step 2: Identify correct syntax

    Option A is the only choice with valid Python syntax: the try-except sits inside the for loop, and break exits the loop on success. The other options use except without a matching try.
  3. Final Answer:

    for _ in range(3): try: fetch_data() break except Exception: pass -> Option A
  4. Quick Check:

    Correct retry loop syntax = try-except inside the for loop, break on success [OK]
Hint: Look for try-except inside a for loop with break [OK]
Common Mistakes:
  • Missing try-except block
  • Incorrect loop syntax
  • Using 'except' without 'try'
3.

Consider this code snippet implementing retry and fallback logic:

def get_data():
    for _ in range(2):
        try:
            return fetch_from_primary()
        except Exception:
            pass
    return fetch_from_backup()

If fetch_from_primary() fails both times, what will get_data() return?

medium
A. The result of fetch_from_primary()
B. The result of fetch_from_backup()
C. None
D. An exception is raised

Solution

  1. Step 1: Analyze retry attempts

    The function tries fetch_from_primary() twice inside the loop, catching exceptions and continuing if it fails.
  2. Step 2: Understand fallback behavior

    If both retries fail, the function calls and returns fetch_from_backup() as a fallback.
  3. Final Answer:

    The result of fetch_from_backup() -> Option B
  4. Quick Check:

    Retries fail -> fallback used = The result of fetch_from_backup() [OK]
Hint: If retries fail, fallback result is returned [OK]
Common Mistakes:
  • Assuming primary always returns result
  • Ignoring fallback call
  • Thinking exception propagates
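The behavior in this question can be checked with a small runnable version. The two stub functions below are hypothetical stand-ins: the primary source always fails and the backup always succeeds.

```python
def fetch_from_primary():
    # Hypothetical stub: the primary source always fails in this demo
    raise ConnectionError("primary unavailable")


def fetch_from_backup():
    # Hypothetical stub for the backup source
    return "backup data"


def get_data():
    for _ in range(2):
        try:
            return fetch_from_primary()
        except Exception:
            pass  # swallow the error and try again
    return fetch_from_backup()


print(get_data())  # prints "backup data"
```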
4.

Identify the bug in this retry and fallback code snippet:

def get_info():
    for i in range(3):
        try:
            return fetch_data()
        except:
            continue
    return fallback_data()
medium
A. The except block catches all exceptions without specifying type
B. The function returns fallback_data() even if fetch_data() succeeds
C. The except block should raise the exception instead of continue
D. The loop variable i is unused and should be removed

Solution

  1. Step 1: Review exception handling

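    The bare except: catches every exception, including KeyboardInterrupt and SystemExit, so the loop can silently swallow programming errors and even a Ctrl+C interrupt. The retry flow itself works; the problem is the overly broad handler.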
  2. Step 2: Identify best practice

    It's better to catch specific exceptions to avoid masking unexpected errors.
  3. Final Answer:

    The except block catches all exceptions without specifying type -> Option A
  4. Quick Check:

    Catch specific exceptions, not all [OK]
Hint: Avoid bare except; specify exception type [OK]
Common Mistakes:
  • Using bare except blocks
  • Ignoring exception types
  • Assuming unused variables cause bugs
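A corrected version might look like the sketch below. It assumes that transient failures surface as ConnectionError or TimeoutError, which is an assumption about fetch_data and not something the question states. The two functions are passed in as parameters only to keep the example self-contained.

```python
def get_info(fetch_data, fallback_data, attempts=3):
    """Retry fetch_data() on expected transient errors only, then use fallback_data()."""
    for _ in range(attempts):
        try:
            return fetch_data()
        except (ConnectionError, TimeoutError):
            continue  # transient failure: try again
    return fallback_data()
```

With this handler, an unexpected error such as a TypeError raised by a bug in fetch_data propagates to the caller instead of being retried and hidden.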
5.

You want to design an AI agent that tries to fetch user data from a primary server up to 3 times. If all retries fail, it should fetch from a backup server. Which code snippet correctly implements this retry and fallback logic?

Option A:
for _ in range(3):
    try:
        data = fetch_primary()
    except:
        data = fetch_backup()
        break
Option B:
for _ in range(3):
    try:
        data = fetch_primary()
        break
    except:
        pass
else:
    data = fetch_backup()
Option C:
try:
    data = fetch_primary()
except:
    data = fetch_backup()
Option D:
while True:
    try:
        data = fetch_primary()
        break
    except:
        data = fetch_backup()
        break
hard
A. Retries primary, but fallback runs immediately on first failure
B. No retries, fallback runs immediately on first failure
C. Retries primary 3 times, then fallback if all fail
D. Tries primary only once, fallback runs immediately after the first failure

Solution

  1. Step 1: Understand retry and fallback requirements

    The agent must retry fetching from primary 3 times, then fallback only if all retries fail.
  2. Step 2: Analyze each option's behavior

    The snippet labeled Option B uses a for loop with try-except and an else clause that runs the fallback only if the loop completes without break (all retries failed). This matches the requirements and corresponds to answer choice C, "Retries primary 3 times, then fallback if all fail".
  3. Final Answer:

    Retries primary 3 times, then fallback if all fail -> Option C (the for-else snippet, shown as Option B)
  4. Quick Check:

    Retry 3 times + fallback after = Retries primary 3 times, then fallback if all fail [OK]
Hint: Use for-else to run fallback after retries fail [OK]
Common Mistakes:
  • Running fallback too early
  • Not retrying enough times
  • Missing else clause for fallback
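The for-else pattern from this question can be wrapped in a function to see it run. The function name fetch_with_fallback and its parameters are hypothetical; the loop body follows the for-else snippet from the question.

```python
def fetch_with_fallback(fetch_primary, fetch_backup, attempts=3):
    for _ in range(attempts):
        try:
            data = fetch_primary()
            break  # success: leave the loop and skip the else clause
        except Exception:
            pass  # failure: move on to the next attempt
    else:
        # Runs only when the loop finished without hitting break
        data = fetch_backup()
    return data
```

The else clause belongs to the for loop, not to the try statement, so the backup is called exactly once and only after every attempt has failed.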