When handling errors in tool calls, the key metric is robustness. This means how well the system continues to work correctly even when some tools fail or give wrong results. We also look at error rate (how often errors happen) and recovery rate (how often the system fixes or handles errors successfully). These metrics matter because they show if the AI can keep helping users without crashing or giving wrong answers.
Error handling in tool calls in Agentic AI - Model Metrics & Evaluation
Tool Call Outcome Confusion Matrix:
|              | Tool Success | Tool Failure |
|--------------|--------------|--------------|
| Handled Well | TN           | TP           |
| Mishandled   | FP           | FN           |
Explanation:
- TP: Tool failed but system handled it correctly (good error handling).
- FP: Tool worked but system handled it as an error (false alarm).
- FN: Tool failed but system failed to handle (bad error handling).
- TN: Tool worked and system did not treat it as an error (correctly proceeded).
Precision here means: When the system says it handled an error, how often was it correct?
Recall means: Out of all actual tool failures, how many did the system handle?
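The two definitions above can be written as a short calculation. This is a minimal sketch: the function name `precision_recall` and the counts are made up for illustration, and returning 0.0 for an empty denominator is an assumption, not something the matrix prescribes.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) using the TP/FP/FN definitions above."""
    # Precision: of all calls the system treated as errors,
    # the share that were real tool failures.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of all real tool failures, the share the system handled.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts: 45 failures handled, 5 false alarms, 15 failures missed.
precision, recall = precision_recall(tp=45, fp=5, fn=15)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.90 recall=0.75
```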
Example: If the system tries to fix every tool failure (high recall) but sometimes thinks there is an error when there is none (low precision), it may waste time fixing non-errors.
On the other hand, if it only fixes errors it is very sure about (high precision) but misses many real errors (low recall), users may see failures.
Good error handling balances precision and recall to fix most real errors without false alarms.
- Good: Precision and recall both above 90%. The system catches most errors and rarely raises false alarms.
- Bad: Precision below 50% means many false error fixes, confusing users. Recall below 50% means many errors go unhandled, causing failures.
- Error rate: Should be low, but some errors are normal. The key is how well the system recovers.
- Recovery rate: High recovery rate (above 85%) means the system fixes most errors it detects.
- Ignoring error types: Not all errors are equal. Some cause big failures, others minor delays. Metrics should reflect impact.
- Overfitting to test errors: If the system only learns to handle known errors, it may fail on new ones.
- Data leakage: Testing error handling on data the system already saw can give false high scores.
- Accuracy paradox: High overall accuracy can hide poor error handling if errors are rare.
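Error rate, recovery rate, and the point about error types can be sketched over a log of tool calls. Everything here is an assumption made for illustration: the `ToolCallRecord` fields, the `severity` weight, and the choice to report a recovery rate of 1.0 when nothing failed.

```python
from dataclasses import dataclass


@dataclass
class ToolCallRecord:
    failed: bool           # the tool raised or returned a bad result
    recovered: bool        # the agent handled that failure successfully
    severity: float = 1.0  # hypothetical impact weight; higher means worse


def error_rate(records):
    """Share of tool calls that failed."""
    if not records:
        return 0.0
    return sum(r.failed for r in records) / len(records)


def recovery_rate(records, weighted=False):
    """Share of failures recovered, optionally weighted by severity."""
    failures = [r for r in records if r.failed]
    if not failures:
        return 1.0  # assumption: no failures counts as full recovery
    total = sum(r.severity if weighted else 1.0 for r in failures)
    recovered = sum(r.severity if weighted else 1.0 for r in failures if r.recovered)
    return recovered / total


records = [
    ToolCallRecord(failed=False, recovered=False),
    ToolCallRecord(failed=False, recovered=False),
    ToolCallRecord(failed=True, recovered=True, severity=1.0),
    ToolCallRecord(failed=True, recovered=False, severity=3.0),
]
print(error_rate(records))                    # 0.5
print(recovery_rate(records))                 # 0.5
print(recovery_rate(records, weighted=True))  # 0.25
```

In this made-up log the unweighted recovery rate looks moderate, but the weighted one drops because the unrecovered failure is the severe one.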
Your system has 98% accuracy but only 12% recall on tool failures. Is it ready for production? Why or why not?
Answer: No, it is not good. The high accuracy is misleading because tool failures are rare. The low recall means the system misses 88% of real errors, so many failures go unhandled, hurting user experience.
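A quick calculation shows how both numbers can be true at once. The counts below are hypothetical, chosen only so that they reproduce the 98% and 12% figures from the question.

```python
# Hypothetical counts chosen to reproduce the figures in the question.
tp, fn = 6, 44      # 50 tool failures: 6 handled, 44 missed
fp, tn = 0, 2150    # 2150 successful calls, none mistaken for errors

total = tp + fn + fp + tn
accuracy = (tp + tn) / total   # dominated by the many successful calls
recall = tp / (tp + fn)        # looks only at the rare failures
print(f"accuracy={accuracy:.0%} recall={recall:.0%}")  # accuracy=98% recall=12%
```

Because failures make up only 50 of 2200 calls here, missing 44 of them barely moves accuracy.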
Practice
What is the purpose of try-except blocks when calling external tools in an AI agent?
Solution
Step 1: Understand the role of try-except blocks
Try-except blocks are used to catch errors that happen during code execution, especially when calling external tools that might fail.
Step 2: Identify the benefit in AI agent tool calls
By catching errors, the program avoids crashing and can handle failures gracefully, improving reliability.
Final Answer:
To catch errors and prevent the program from crashing -> Option D
Quick Check:
Error catching = Prevent crash [OK]
- Thinking try-except speeds up code
- Confusing error handling with improving accuracy
- Assuming try-except runs code in parallel
Solution
Step 1: Recall Python error handling syntax
Python uses try and except blocks to catch errors.
Step 2: Match the correct keywords
The correct keywords are try and except, not catch, error, or fail.
Final Answer:
try: tool_call() except: handle_error() -> Option C
Quick Check:
Python uses except, not catch [OK]
- Using catch instead of except
- Using error or fail as keywords
- Missing indentation in try-except blocks
try:
    result = tool_call('data')
except Exception:
    result = 'Fallback result'
print(result)
If tool_call raises an error, what will be printed?
Solution
Step 1: Analyze the try-except behavior
If tool_call raises an error, the except block runs and sets result to 'Fallback result'.
Step 2: Understand the print output
After the except block, print(result) prints the fallback string.
Final Answer:
'Fallback result' -> Option A
Quick Check:
Error caught = fallback printed [OK]
- Assuming error message prints automatically
- Thinking program crashes despite except
- Expecting None instead of fallback
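To run the snippet from this question, the tool needs a definition. The `tool_call` stub below is a hypothetical stand-in that always raises. It is not a real tool API.

```python
def tool_call(payload):
    # Hypothetical stand-in for an external tool; it always fails here.
    raise RuntimeError("tool unavailable")


try:
    result = tool_call('data')
except Exception:
    # The error is caught, so the program continues with a fallback value.
    result = 'Fallback result'
print(result)  # prints: Fallback result
```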
try:
    output = tool_call()
except Exception as e
    print('Error:', e)
    output = None
print(output)
What is the error in this code?
Solution
Step 1: Check except syntax
The except line is missing a colon at the end, which is required in Python syntax.
Step 2: Verify other parts
Printing inside except is allowed, output can be set there, and tool_call can be called anywhere.
Final Answer:
Missing colon after except Exception as e -> Option B
Quick Check:
Except line needs colon [OK]
- Forgetting colon after except
- Thinking print is disallowed in except
- Assuming output must be pre-set
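For comparison, here is a sketch of the same snippet with the colon restored so that it runs. The `tool_call` stub is hypothetical and simply simulates a failing tool.

```python
def tool_call():
    # Hypothetical stub that simulates a failing tool.
    raise TimeoutError("tool timed out")


try:
    output = tool_call()
except Exception as e:  # the colon the quiz snippet is missing
    print('Error:', e)
    output = None
print(output)
```

With this stub, the except block prints the error message and the final line prints `None`.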
Solution
Step 1: Understand the requirement
If the first tool fails, use fallback for result1 but still call tool2 normally.Step 2: Analyze each option
This option tries tool1, catches the error to set the fallback, then calls tool2 outside the except block, so tool2 always runs:
try:
    result1 = tool1()
except Exception:
    result1 = 'fallback'
result2 = tool2()
print(result1, result2)

This option does not catch exceptions, so a failure in tool1 crashes the program:
result1 = tool1()
result2 = tool2()
if not result1:
    result1 = 'fallback'
print(result1, result2)

This option calls both tools inside try, so a failure in either call replaces both results with fallbacks, and tool2 is skipped when tool1 fails:
try:
    result1 = tool1()
    result2 = tool2()
except Exception:
    result1 = 'fallback'
    result2 = 'fallback2'
print(result1, result2)

This option calls tool2 inside the except block, so tool2 runs only if tool1 fails:
try:
    result1 = tool1()
except Exception:
    result1 = 'fallback'
    result2 = tool2()
print(result1, result2)

Final Answer:
Option A correctly handles fallback and always calls second tool -> Option A
Quick Check:
Separate try-except for first tool, call second after [OK]
- Calling second tool only inside except block
- Not catching exceptions for first tool
- Putting both calls inside one try-except
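The pattern from the correct answer can be run with two stub tools. Both stubs are hypothetical: `tool1` always fails and `tool2` always succeeds, which is just one scenario for illustration.

```python
def tool1():
    # Hypothetical failing tool.
    raise ConnectionError("tool1 is down")


def tool2():
    # Hypothetical working tool.
    return "tool2 ok"


try:
    result1 = tool1()
except Exception:
    result1 = 'fallback'
result2 = tool2()  # outside the try-except, so it always runs
print(result1, result2)  # prints: fallback tool2 ok
```

Note that `tool2` is not protected here. If it can also fail, it would need its own try-except.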
