
Error handling in tool calls in Agentic AI - Model Metrics & Evaluation

Metrics & Evaluation - Error handling in tool calls
Which metric matters for this concept and WHY

When handling errors in tool calls, the key property to measure is robustness: how well the system keeps working correctly when some tools fail or return wrong results. Two concrete metrics capture it: error rate (the fraction of tool calls that fail) and recovery rate (the fraction of failures the system fixes or handles successfully). These metrics matter because they show whether the AI can keep helping users without crashing or giving wrong answers.
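As a rough illustration of the two rates above, the sketch below computes them from a small, made-up log of tool calls. The `ToolCallRecord` type and its fields are hypothetical names chosen for this example and do not come from any agent framework.

```python
from dataclasses import dataclass


@dataclass
class ToolCallRecord:
    """One logged tool call (hypothetical structure for illustration)."""
    tool: str
    failed: bool      # the tool raised an error or returned a bad result
    recovered: bool   # the agent handled the failure (retry, fallback, ...)


def error_rate(records):
    """Fraction of all tool calls that failed."""
    if not records:
        return 0.0
    return sum(r.failed for r in records) / len(records)


def recovery_rate(records):
    """Fraction of failed tool calls that the agent recovered from."""
    failures = [r for r in records if r.failed]
    if not failures:
        return 1.0  # nothing to recover from (a convention chosen for this sketch)
    return sum(r.recovered for r in failures) / len(failures)


# Made-up log: 5 calls, 2 failures, 1 of them recovered.
log = [
    ToolCallRecord("search", failed=False, recovered=False),
    ToolCallRecord("search", failed=True, recovered=True),
    ToolCallRecord("calculator", failed=False, recovered=False),
    ToolCallRecord("weather", failed=True, recovered=False),
    ToolCallRecord("calculator", failed=False, recovered=False),
]

print(f"error rate:    {error_rate(log):.2f}")     # 2 failures / 5 calls
print(f"recovery rate: {recovery_rate(log):.2f}")  # 1 recovered / 2 failures
```

A low error rate with a low recovery rate can still mean a poor user experience, which is why the two are reported together.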

Confusion matrix or equivalent visualization (ASCII)
    Tool Call Outcome Confusion Matrix:

                           | Tool Success | Tool Failure |
    -----------------------|--------------|--------------|
    Handling Triggered     |      FP      |      TP      |
    Handling Not Triggered |      TN      |      FN      |

    Explanation:
    - TP: Tool failed and the system triggered error handling (failure caught).
    - FP: Tool worked but the system treated the result as an error (false alarm).
    - FN: Tool failed and the system did not trigger error handling (missed failure).
    - TN: Tool worked and the system triggered no error handling (correctly proceeded).
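The four cells can be tallied mechanically. The sketch below is a minimal example that assumes each tool call has been logged as a pair `(tool_failed, handler_triggered)`. Both that representation and the sample data are made up for illustration.

```python
from collections import Counter


def classify(tool_failed, handler_triggered):
    """Map one tool call to a confusion-matrix cell."""
    if tool_failed and handler_triggered:
        return "TP"  # failure caught
    if not tool_failed and handler_triggered:
        return "FP"  # false alarm
    if tool_failed and not handler_triggered:
        return "FN"  # missed failure
    return "TN"      # correctly proceeded


# Made-up observations: (tool_failed, handler_triggered)
observations = [
    (False, False), (False, False), (False, False),
    (True, True), (True, True),
    (True, False),
    (False, True),
]

counts = Counter(classify(failed, handled) for failed, handled in observations)
for cell in ("TP", "FP", "FN", "TN"):
    print(cell, counts[cell])
```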
    
Precision vs Recall tradeoff with concrete examples

Precision here means: of all the times the system triggered error handling, how often had a tool actually failed? Precision = TP / (TP + FP).

Recall means: of all actual tool failures, how many did the system handle? Recall = TP / (TP + FN).

Example: If the system tries to fix every tool failure (high recall) but sometimes thinks there is an error when there is none (low precision), it may waste time fixing non-errors.

On the other hand, if it only fixes errors it is very sure about (high precision) but misses many real errors (low recall), users may see failures.

Good error handling balances precision and recall to fix most real errors without false alarms.
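To make the tradeoff concrete, the sketch below computes precision and recall for two hypothetical handlers. The counts are invented for illustration: an "aggressive" handler that fires often, and a "cautious" one that fires only when it is very sure.

```python
def precision(tp, fp):
    """Of all times error handling fired, the share where a tool really failed."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Of all real tool failures, the share that error handling caught."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Invented counts for 20 real tool failures in each scenario.
handlers = {
    "aggressive": {"tp": 19, "fp": 19, "fn": 1},   # catches nearly all, many false alarms
    "cautious":   {"tp": 8,  "fp": 0,  "fn": 12},  # no false alarms, misses most failures
}

for name, c in handlers.items():
    p = precision(c["tp"], c["fp"])
    r = recall(c["tp"], c["fn"])
    print(f"{name:10s} precision={p:.2f} recall={r:.2f}")
```

The aggressive handler reaches 0.95 recall at 0.50 precision. The cautious one reaches 1.00 precision at 0.40 recall. Neither matches the balance described above.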

What "good" vs "bad" metric values look like for this use case
  • Good: Precision and recall both above 90%. The system catches most errors and rarely raises false alarms.
  • Bad: Precision below 50% means many false error fixes, confusing users. Recall below 50% means many errors go unhandled, causing failures.
  • Error rate: Should be low, but some errors are normal. The key is how well the system recovers.
  • Recovery rate: High recovery rate (above 85%) means the system fixes most errors it detects.
Metrics pitfalls
  • Ignoring error types: Not all errors are equal. Some cause big failures, others minor delays. Metrics should reflect impact.
  • Overfitting to test errors: If the system only learns to handle known errors, it may fail on new ones.
  • Data leakage: Testing error handling on data the system already saw can give false high scores.
  • Accuracy paradox: High overall accuracy can hide poor error handling if errors are rare.
Self-check question

Your system has 98% accuracy but only 12% recall on tool failures. Is it good for production? Why or why not?

Answer: No, it is not good. The high accuracy is misleading because tool failures are rare. The low recall means the system misses 88% of real errors, so many failures go unhandled, hurting user experience.
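One set of counts that produces exactly these figures is sketched below. The numbers are made up: 1,100 tool calls, of which 25 fail, with 3 failures handled and no false alarms.

```python
# Made-up counts chosen so that accuracy is 98% and recall is 12%.
tp = 3      # failures the system handled
fn = 22     # failures the system missed
fp = 0      # false alarms
tn = 1075   # successful calls left alone

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
recall = tp / (tp + fn)

print(f"total calls: {total}")        # 1100
print(f"accuracy:    {accuracy:.0%}")  # 98%
print(f"recall:      {recall:.0%}")    # 12%
```

Because only 25 of 1,100 calls fail, the 1,075 correctly ignored successes dominate accuracy, while 22 of the 25 real failures still reach the user.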

Key Result
Robust error handling balances high precision and recall to catch and fix most tool failures without false alarms.

Practice

1. What is the main purpose of using try-except blocks when calling external tools in an AI agent?
easy
A. To make the tool run in parallel
B. To speed up the tool's execution
C. To increase the tool's accuracy
D. To catch errors and prevent the program from crashing

Solution

  1. Step 1: Understand the role of try-except blocks

    Try-except blocks are used to catch errors that happen during code execution, especially when calling external tools that might fail.
  2. Step 2: Identify the benefit in AI agent tool calls

    By catching errors, the program avoids crashing and can handle failures gracefully, improving reliability.
  3. Final Answer:

    To catch errors and prevent the program from crashing -> Option D
  4. Quick Check:

    Error catching = Prevent crash [OK]
Hint: Try-except blocks catch errors to keep programs running [OK]
Common Mistakes:
  • Thinking try-except speeds up code
  • Confusing error handling with improving accuracy
  • Assuming try-except runs code in parallel
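A minimal sketch of the idea: the wrapper below catches a failure from one tool call so that the remaining calls still run. `flaky_tool` and `safe_call` are hypothetical names invented for this example.

```python
def flaky_tool(query):
    """Hypothetical stand-in for an external tool; fails on empty input."""
    if not query:
        raise ValueError("empty query")
    return f"results for {query!r}"


def safe_call(tool, *args):
    """Call a tool and return (ok, value) instead of letting the error propagate."""
    try:
        return True, tool(*args)
    except Exception as exc:
        return False, f"tool failed: {exc}"


# The second call fails, but the loop keeps going and the third call still runs.
outcomes = [safe_call(flaky_tool, q) for q in ["weather", "", "news"]]
for ok, value in outcomes:
    print("OK " if ok else "ERR", value)
```

Note that the wrapper changes nothing about speed, accuracy, or parallelism. It only decides what happens after a failure.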
2. Which of the following is the correct syntax to catch a general error when calling a tool in Python?
easy
A.
try:
  tool_call()
error:
  handle_error()
B.
try:
  tool_call()
catch:
  handle_error()
C.
try:
  tool_call()
except:
  handle_error()
D.
try:
  tool_call()
fail:
  handle_error()

Solution

  1. Step 1: Recall Python error handling syntax

    Python uses try and except blocks to catch errors.
  2. Step 2: Match the correct keywords

    The correct keywords are try and except, not catch, error, or fail.
  3. Final Answer:

    try: tool_call() except: handle_error() -> Option C
  4. Quick Check:

    Python uses except, not catch [OK]
Hint: Remember Python uses except, not catch, for errors [OK]
Common Mistakes:
  • Using catch instead of except
  • Using error or fail as keywords
  • Missing indentation in try-except blocks
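A bare `except:` is valid syntax, but it also catches `KeyboardInterrupt` and `SystemExit`. In practice, `except Exception` is usually the safer way to catch a general error. The sketch below shows that form. `tool_call` and `handle_error` are hypothetical stubs written for this example.

```python
def tool_call():
    """Hypothetical tool that always fails, for demonstration."""
    raise TimeoutError("tool did not respond")


def handle_error(exc):
    """Hypothetical handler: turn the exception into a short message."""
    return f"handled {type(exc).__name__}: {exc}"


try:
    outcome = tool_call()
except Exception as exc:  # does not swallow KeyboardInterrupt or SystemExit
    outcome = handle_error(exc)

print(outcome)
```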
3. Consider this code snippet:
try:
  result = tool_call('data')
except Exception:
  result = 'Fallback result'
print(result)
If tool_call raises an error, what will be printed?
medium
A. 'Fallback result'
B. None
C. Nothing, program crashes
D. The error message from tool_call

Solution

  1. Step 1: Analyze the try-except behavior

    If tool_call raises an error, the except block runs and sets result to 'Fallback result'.
  2. Step 2: Understand the print output

    After the except block, print(result) prints the fallback string.
  3. Final Answer:

    'Fallback result' -> Option A
  4. Quick Check:

    Error caught = fallback printed [OK]
Hint: If error caught, except block runs fallback code [OK]
Common Mistakes:
  • Assuming error message prints automatically
  • Thinking program crashes despite except
  • Expecting None instead of fallback
4. This code tries to call a tool and handle errors:
try:
  output = tool_call()
except Exception as e
  print('Error:', e)
  output = None
print(output)
What is the error in this code?
medium
A. Using print inside except block is not allowed
B. Missing colon after except Exception as e
C. output should be set before try block
D. tool_call() must be inside a function

Solution

  1. Step 1: Check except syntax

    The except line is missing a colon at the end, which is required in Python syntax.
  2. Step 2: Verify other parts

    Printing inside except is allowed, output can be set there, and tool_call can be called anywhere.
  3. Final Answer:

    Missing colon after except Exception as e -> Option B
  4. Quick Check:

    Except line needs colon [OK]
Hint: Except lines always end with a colon ':' [OK]
Common Mistakes:
  • Forgetting colon after except
  • Thinking print is disallowed in except
  • Assuming output must be pre-set
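For reference, here is the snippet with the colon restored. The `tool_call` stub is an assumption added so the example runs on its own. It always raises, so the except branch is exercised.

```python
def tool_call():
    """Hypothetical tool that fails, so the except branch runs."""
    raise RuntimeError("service unavailable")


try:
    output = tool_call()
except Exception as e:  # colon added; the only change to the original snippet
    print('Error:', e)
    output = None
print(output)
```

With this stub, the program prints the error message and then `None` instead of stopping with a `SyntaxError`.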
5. You want your AI agent to call two tools in sequence. If the first tool fails, it should use a fallback result and still call the second tool. Which code correctly handles this?
hard
A.
try:
  result1 = tool1()
except Exception:
  result1 = 'fallback'
result2 = tool2()
print(result1, result2)
B.
result1 = tool1()
result2 = tool2()
if not result1:
  result1 = 'fallback'
print(result1, result2)
C.
try:
  result1 = tool1()
  result2 = tool2()
except Exception:
  result1 = 'fallback'
  result2 = 'fallback2'
print(result1, result2)
D.
try:
  result1 = tool1()
except Exception:
  result1 = 'fallback'
  result2 = tool2()
print(result1, result2)

Solution

  1. Step 1: Understand the requirement

    If the first tool fails, use fallback for result1 but still call tool2 normally.
  2. Step 2: Analyze each option

    Option A tries tool1, catches any error to set the fallback, then calls tool2 after the try-except, so tool2 always runs.
    Option B does not catch exceptions, so a failure in tool1 crashes the program before tool2 is called.
    Option C puts both calls inside one try block, so if tool1 fails, tool2 is never called and both results fall back.
    Option D calls tool2 inside the except block, so tool2 runs only if tool1 fails.
  3. Final Answer:

    Option A correctly handles fallback and always calls second tool -> Option A
  4. Quick Check:

    Separate try-except for first tool, call second after [OK]
Hint: Put second tool call outside except to always run it [OK]
Common Mistakes:
  • Calling second tool only inside except block
  • Not catching exceptions for first tool
  • Putting both calls inside one try-except
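The difference between calling the second tool after the try-except and calling it inside the except block can be checked with stub tools. The sketch below is illustrative only. `make_tools`, `option_a`, and `option_d` are hypothetical names, and `option_d` initializes `result2` to `None` so that it can return a value when the first tool succeeds (without that line, the later use of `result2` would raise `NameError`).

```python
def make_tools(tool1_fails):
    """Build two hypothetical tools plus a log of which ones were called."""
    called = []

    def tool1():
        called.append("tool1")
        if tool1_fails:
            raise RuntimeError("tool1 is down")
        return "result1"

    def tool2():
        called.append("tool2")
        return "result2"

    return tool1, tool2, called


def option_a(tool1, tool2):
    """Second call sits after the try-except, so it always runs."""
    try:
        result1 = tool1()
    except Exception:
        result1 = "fallback"
    result2 = tool2()
    return result1, result2


def option_d(tool1, tool2):
    """Second call sits inside except, so it runs only when tool1 fails."""
    result2 = None  # added for this sketch; see the note above
    try:
        result1 = tool1()
    except Exception:
        result1 = "fallback"
        result2 = tool2()
    return result1, result2


for fails in (True, False):
    for option in (option_a, option_d):
        tool1, tool2, called = make_tools(fails)
        results = option(tool1, tool2)
        print(option.__name__, "| tool1 fails:", fails, "|", results, "| called:", called)
```

Only `option_a` calls `tool2` in both cases, which is the behavior the question asks for.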