Retry and fallback logic in Agentic AI - ML Experiment: Train & Evaluate

Experiment - Retry and fallback logic with jitter

Problem: You have an AI agent that calls an external API to get predictions. Sometimes the API fails or is slow. Currently, the agent tries once and fails if the API does not respond quickly or returns an error.

Current Metrics: Success rate: 70%, Average response time: 2.5 seconds, Failure rate due to API errors/timeouts: 30%

Issue: The agent fails too often because it does not retry or use fallback methods when the API is unavailable or slow.

Your Task

Improve the agent's reliability by implementing retry logic with exponential backoff and a fallback method. Target: increase success rate to at least 90% and reduce failure rate to below 10%.

You can only modify the agent's API calling code.

You must keep the maximum total wait time under 10 seconds.

The fallback method should be a simple local model or a cached response.
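The 10-second limit determines how many backoff waits can fit, and adding up the worst-case delays is a quick way to check. The sketch below assumes a base delay of 1 second that doubles on each wait, plus at most 0.5 seconds of jitter per wait. These values and the function names are illustrative assumptions, not part of the task statement.

```python
def worst_case_wait(base_wait, max_jitter, num_waits):
    """Upper bound on total sleep time for exponential backoff with additive jitter.

    Wait i (0-based) lasts base_wait * 2**i plus at most max_jitter seconds.
    """
    return sum(base_wait * (2 ** i) + max_jitter for i in range(num_waits))


def max_waits_within_budget(base_wait, max_jitter, budget):
    """Largest number of waits whose worst-case total stays strictly under budget."""
    n = 0
    while worst_case_wait(base_wait, max_jitter, n + 1) < budget:
        n += 1
    return n


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, "wait(s) ->", worst_case_wait(1, 0.5, n), "seconds worst case")
    print("Max waits under 10 s:", max_waits_within_budget(1, 0.5, 10))
```

Under these assumptions, three waits cost at most 8.5 seconds, while a fourth would push the worst case to 17 seconds, well over the budget.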

Solution

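import random
import time

class Agent:
    def __init__(self):
        self.max_retries = 3   # total number of API attempts
        self.base_wait = 1     # base backoff delay in seconds
        self.max_jitter = 0.5  # upper bound on random jitter in seconds

    def call_external_api(self, input_data):
        # Simulate an API call that succeeds 70% of the time
        if random.random() < 0.7:
            return {'prediction': 'API result', 'success': True}
        raise Exception('API failure or timeout')

    def fallback_method(self, input_data):
        # Simple fallback: return a cached or default prediction
        return {'prediction': 'Fallback result', 'success': True}

    def get_prediction(self, input_data):
        for attempt in range(1, self.max_retries + 1):
            try:
                result = self.call_external_api(input_data)
                print(f'Attempt {attempt}: Success')
                return result
            except Exception as e:
                # Real code should catch only transient errors (timeouts, 5xx responses)
                if attempt == self.max_retries:
                    # No attempts left: skip the sleep and go straight to the fallback
                    print(f'Attempt {attempt}: Failed with error "{e}".')
                    break
                # Exponential backoff: 1s, 2s, 4s, ... plus a small random jitter
                backoff = self.base_wait * (2 ** (attempt - 1))
                jitter = random.uniform(0, self.max_jitter)
                wait_time = backoff + jitter
                print(f'Attempt {attempt}: Failed with error "{e}". Retrying in {wait_time:.2f} seconds (backoff: {backoff}s + jitter: {jitter:.2f}s)...')
                time.sleep(wait_time)
        print('All retries failed. Using fallback method.')
        return self.fallback_method(input_data)

# Example usage
agent = Agent()
num_runs = 100
results = {'successes': 0, 'failures': 0, 'fallbacks': 0, 'total_time': 0.0}
for i in range(num_runs):
    pred_start = time.perf_counter()
    prediction = agent.get_prediction('input')
    results['total_time'] += time.perf_counter() - pred_start
    if prediction['success']:
        results['successes'] += 1
    else:
        results['failures'] += 1
    # Count fallback answers separately so they are not mistaken for API successes
    if prediction['prediction'] == 'Fallback result':
        results['fallbacks'] += 1
avg_time = results['total_time'] / num_runs
success_rate = 100 * results['successes'] / num_runs
failure_rate = 100 * results['failures'] / num_runs
fallback_rate = 100 * results['fallbacks'] / num_runs
print(f"Success rate: {success_rate:.0f}%, Failure rate: {failure_rate:.0f}%, Fallback used: {fallback_rate:.0f}%, Avg response time: {avg_time:.2f}s")

Added retry logic with exponential backoff: up to three attempts, waiting 1s after the first failure and 2s after the second (the delay keeps doubling, so a further attempt would wait 4s). The agent does not sleep after the final failed attempt, so the worst-case total wait stays well under 10 seconds.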

Implemented a fallback method that returns a default prediction after retries fail.

Logged each retry attempt and fallback usage for clarity.

Added jitter (a random extra delay of random.uniform(0, 0.5) seconds) to each backoff delay so that many clients do not retry at the same instant (the thundering herd problem).
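To see why jitter matters, it can help to compare the retry schedules of several clients that all fail at the same moment. The sketch below is illustrative only: `retry_times` is a hypothetical helper, and the 1-second base delay and 0.5-second jitter bound are taken from the solution above.

```python
import random


def retry_times(base_wait, attempts, max_jitter=0.0, rng=random):
    """Return the cumulative times (seconds after the first failure) at which
    one client would retry, using exponential backoff plus additive jitter."""
    times, elapsed = [], 0.0
    for attempt in range(attempts):
        elapsed += base_wait * (2 ** attempt) + rng.uniform(0, max_jitter)
        times.append(round(elapsed, 2))
    return times


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only so the demo is repeatable
    print("Without jitter:")
    for client in range(3):
        print(f"  client {client}: {retry_times(1, 3)}")
    print("With jitter:")
    for client in range(3):
        print(f"  client {client}: {retry_times(1, 3, max_jitter=0.5, rng=rng)}")
```

Without jitter, every client retries at exactly 1, 3, and 7 seconds, so the API sees its load arrive in synchronized bursts. With jitter, the retry times drift apart a little more on each attempt.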

Results Interpretation

Before: Success rate 70%, Failure rate 30%, Average response time 2.5s

After: Success rate 100%, Failure rate 0%, Average response time 3.2s

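Retrying with exponential backoff and jitter, combined with a fallback method, makes the agent highly reliable: every request now returns a prediction. Note that the 100% success rate counts fallback responses as successes, so it measures availability rather than how often the API itself answered. Retries absorb temporary API failures, and jitter prevents synchronized retries that could overload the API. The slight increase in average response time is an acceptable cost for the reliability gain.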
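Because the fallback always reports success, it is worth separating how often the API itself answers from how often the fallback steps in. The sketch below does this analytically, assuming each attempt fails independently with the 30% failure rate from the problem statement. Real outages are often correlated, so treat the numbers as an optimistic estimate; the function name is hypothetical.

```python
def api_success_probability(p_fail, attempts):
    """Probability that at least one of `attempts` independent calls succeeds."""
    return 1 - p_fail ** attempts


if __name__ == "__main__":
    p_fail = 0.3  # per-attempt failure rate from the problem statement
    for attempts in range(1, 5):
        p_api = api_success_probability(p_fail, attempts)
        print(f"{attempts} attempt(s): API answers {p_api:.1%}, "
              f"fallback answers {1 - p_api:.1%}")
```

With three attempts, the API is expected to answer about 97.3% of requests under this assumption, which already clears the 90% target. The fallback covers the remaining 2.7% or so.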

Bonus Experiment

Implement a circuit breaker: track the outcomes of recent API calls, and if more than 50% of the last 10 calls failed, skip the API and go directly to the fallback for a cooldown period (e.g., 30s).

💡 Hint

Use a list or deque to track recent outcomes and a cooldown timer.
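One possible shape for such a breaker is sketched below. It is a minimal illustration, not a reference solution: the class and function names are hypothetical, the breaker only trips once the 10-call window is full, and it resets the window when the cooldown ends. The clock is injectable so the behavior can be checked without actually waiting 30 seconds.

```python
import time
from collections import deque


class CircuitBreaker:
    """Minimal failure-rate circuit breaker (illustrative sketch)."""

    def __init__(self, window=10, threshold=0.5, cooldown=30.0, clock=time.monotonic):
        self.outcomes = deque(maxlen=window)  # True = failure, False = success
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock        # injectable so tests do not need to sleep
        self.opened_at = None     # time the breaker opened, or None if closed

    def allow_request(self):
        """Return True if the API may be called, False to go straight to fallback."""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            # Cooldown over: close the breaker and start with a clean window.
            self.opened_at = None
            self.outcomes.clear()
            return True
        return False

    def record(self, failed):
        """Record one API outcome; open the breaker if the failure rate is too high."""
        self.outcomes.append(failed)
        window_full = len(self.outcomes) == self.outcomes.maxlen
        if window_full and sum(self.outcomes) / len(self.outcomes) > self.threshold:
            self.opened_at = self.clock()


def call_with_breaker(breaker, api_call, fallback, input_data):
    """Call the API unless the breaker is open; fall back on failure or when open."""
    if not breaker.allow_request():
        return fallback(input_data)
    try:
        result = api_call(input_data)
    except Exception:
        breaker.record(failed=True)
        return fallback(input_data)
    breaker.record(failed=False)
    return result
```

In the agent, `call_with_breaker` could wrap the retry loop, so that a sustained outage stops costing every request the full backoff delay.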

Practice

Question 1 of 5 (easy)

What is the main purpose of retry logic in an AI system?

A. To replace the task with a different unrelated task

B. To permanently stop a task after the first failure

C. To ignore errors and continue without any checks

D. To try a task multiple times to handle temporary failures

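Solution

Step 1: Understand retry logic concept. Retry logic re-runs a failed operation a limited number of times so that temporary problems, such as timeouts or brief outages, do not cause an immediate failure.

Step 2: Match retry logic to options. Only option D describes repeating a task to handle temporary failures. Options A, B, and C describe replacing the task, abandoning it, or ignoring errors.

Final Answer: D. To try a task multiple times to handle temporary failures.

Quick Check: Retry logic means repeating the attempt when the failure is likely to be temporary.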
