
Tool selection by the agent in Agentic AI - Model Metrics & Evaluation

Metrics & Evaluation - Tool selection by the agent
Which metric matters for tool selection by the agent and WHY

When an agent chooses tools to solve tasks, the key metric is the accuracy of the chosen tool's output: the fraction of tasks for which the tool returns the right answer.

Additionally, latency (speed) matters because slow tools can delay the agent's response.

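For some tasks, precision and recall matter more than plain accuracy, especially when the tool filters or detects specific items. For example, if the agent picks a spam detection tool, high precision means few legitimate messages are wrongly labeled as spam, while high recall means few spam messages slip through.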

Overall, the agent should select tools that balance accuracy, speed, and task-specific metrics to perform well.
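
One simple way to make this balance concrete is to drop every tool that exceeds a latency budget and then pick the most accurate tool that remains. The sketch below is only an illustration: the tool names, the metric values, and the max_latency_ms parameter are all hypothetical and do not come from a real benchmark.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToolStats:
    name: str
    accuracy: float    # fraction of correct outputs on a held-out test set
    latency_ms: float  # typical response time in milliseconds


def pick_tool(candidates: List[ToolStats], max_latency_ms: float) -> Optional[ToolStats]:
    """Return the most accurate tool within the latency budget, or None."""
    fast_enough = [t for t in candidates if t.latency_ms <= max_latency_ms]
    if not fast_enough:
        return None
    return max(fast_enough, key=lambda t: t.accuracy)


# Hypothetical numbers, for illustration only.
candidates = [
    ToolStats("slow_but_precise", accuracy=0.97, latency_ms=2500),
    ToolStats("balanced", accuracy=0.93, latency_ms=300),
    ToolStats("fast_but_rough", accuracy=0.81, latency_ms=40),
]

chosen = pick_tool(candidates, max_latency_ms=500)
print(chosen.name if chosen else "no tool fits the budget")
```

With a 500 ms budget the most accurate tool is excluded for being too slow, so the agent settles for the next best one. A weighted score over several metrics would be another reasonable design; a hard budget is used here only because it is easy to reason about.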

Confusion matrix example for tool output
                      | Predicted Positive  | Predicted Negative  |
      ----------------|---------------------|---------------------|
      Actual Positive | True Positive (TP)  | False Negative (FN) |
      Actual Negative | False Positive (FP) | True Negative (TN)  |

      Example:
      TP = 80, FP = 20, FN = 10, TN = 90
      Total samples = 200
    

From this, the agent can calculate:

  • Precision = TP / (TP + FP) = 80 / (80 + 20) = 0.8
  • Recall = TP / (TP + FN) = 80 / (80 + 10) = 0.89
  • Accuracy = (TP + TN) / Total = (80 + 90) / 200 = 0.85
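
The same arithmetic can be checked in a few lines of Python. This is a minimal sketch using the example counts above (TP = 80, FP = 20, FN = 10, TN = 90); the function name confusion_metrics is made up for this illustration.

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute precision, recall, and accuracy from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        # Guard against division by zero when a denominator is empty.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


metrics = confusion_metrics(tp=80, fp=20, fn=10, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```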
Precision vs Recall tradeoff in tool selection

Imagine the agent must pick a tool to detect fraud:

  • High precision means the tool rarely flags good transactions as fraud (few false alarms).
  • High recall means the tool catches most fraud cases (few missed frauds).

If the agent picks a tool with high precision but low recall, it misses many frauds.

If it picks a tool with high recall but low precision, many good transactions are wrongly flagged.

The agent must balance these based on the task's cost of errors.
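
One way to express "cost of errors" is to assign a price to each false alarm and each missed fraud, then compare tools by total error cost. The sketch below assumes two hypothetical tools evaluated on the same set of transactions; the error counts and the cost values are invented for illustration.

```python
def total_error_cost(fp: int, fn: int, cost_fp: float, cost_fn: float) -> float:
    """Total cost of a tool's mistakes: false alarms plus missed cases."""
    return fp * cost_fp + fn * cost_fn


# Hypothetical error counts on the same evaluation set.
tools = {
    "high_precision_tool": {"fp": 2, "fn": 30},   # few false alarms, many misses
    "high_recall_tool": {"fp": 80, "fn": 5},      # many false alarms, few misses
}


def cheapest_tool(tools: dict, cost_fp: float, cost_fn: float) -> str:
    """Return the name of the tool with the lowest total error cost."""
    return min(
        tools,
        key=lambda name: total_error_cost(
            tools[name]["fp"], tools[name]["fn"], cost_fp, cost_fn
        ),
    )


# Assumed costs: a missed fraud is far more expensive than a false alarm.
print(cheapest_tool(tools, cost_fp=5, cost_fn=500))
# Assumed costs reversed: false alarms are the expensive error.
print(cheapest_tool(tools, cost_fp=500, cost_fn=5))
```

When missed fraud is the expensive error, the high-recall tool wins; when false alarms are expensive, the high-precision tool wins. The choice follows from the costs, not from either metric alone.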

Good vs Bad metric values for tool selection

Good metrics (rough rules of thumb; the right thresholds depend on the task):

  • Accuracy above 90% for general tasks
  • Precision and recall above 85% for detection tasks
  • Low latency (fast response)

Bad metrics:

  • Accuracy below 70% means many wrong answers
  • Precision below 50% means most alerts are false alarms; recall below 50% means most real cases are missed
  • High latency causing slow agent responses
Common pitfalls in evaluating tool selection metrics
  • Accuracy paradox: High accuracy can be misleading if data is imbalanced (e.g., many negatives, few positives).
  • Data leakage: If the tool was tested on data it already saw, metrics are too optimistic.
  • Overfitting: Tool performs well on training data but poorly on new data.
  • Ignoring latency: A very accurate but slow tool may hurt overall agent performance.
Self-check question

Your agent selects a tool with 98% accuracy but only 12% recall on fraud detection. Is this good for production? Why or why not?

Answer: No, it is not good. Although accuracy is high, the tool misses 88% of fraud cases (low recall). This means many frauds go undetected, which is risky. The agent should pick a tool with higher recall to catch more fraud.
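
To see how 98% accuracy and 12% recall can coexist, it helps to plug in concrete counts. The numbers below are hypothetical and chosen only so that they reproduce the figures in the question: 4,400 transactions, of which 100 are fraud. The sketch also scores a do-nothing baseline that never flags anything.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)


# Hypothetical counts: 100 fraud cases among 4,400 transactions.
tp, fn, fp, tn = 12, 88, 0, 4300

tool_accuracy = accuracy(tp=tp, fp=fp, fn=fn, tn=tn)
tool_recall = recall(tp=tp, fn=fn)

# Baseline "tool" that labels every transaction as legitimate.
baseline_accuracy = accuracy(tp=0, fp=0, fn=100, tn=4300)
baseline_recall = recall(tp=0, fn=100)

print(f"tool:     accuracy={tool_accuracy:.1%}, recall={tool_recall:.1%}")
print(f"baseline: accuracy={baseline_accuracy:.1%}, recall={baseline_recall:.1%}")
```

Because fraud is rare in this made-up data, a baseline that catches no fraud at all reaches nearly the same accuracy as the tool. That is the accuracy paradox in practice, and it is why recall is the metric to watch here.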

Key Result
Tool selection metrics focus on accuracy, precision, recall, and latency to ensure the agent chooses effective and timely tools.

Practice

1. What is the main purpose of tool selection by an AI agent?
easy
A. To choose the best helper tool for a specific task
B. To train the AI model faster
C. To store data securely
D. To improve the hardware performance

Solution

  1. Step 1: Understand the role of tool selection

    Tool selection means picking the right tool or helper for the AI to complete a task well.
  2. Step 2: Match purpose with options

    Among the options, only choosing the best helper tool fits the idea of tool selection.
  3. Final Answer:

    To choose the best helper tool for a specific task -> Option A
  4. Quick Check:

    Tool selection = choosing best tool [OK]
Hint: Tool selection means picking the right tool for the job
Common Mistakes:
  • Confusing tool selection with training the model
  • Thinking it improves hardware
  • Mixing it with data storage
2. Which of the following is the correct way to check if a tool named calculator should be used for a task containing the word 'math'?
easy
A. if calculator in task_description: use math
B. if task_description == 'calculator': use math
C. if 'math' in task_description: use calculator
D. if 'calculator' == task_description: use math

Solution

  1. Step 1: Understand keyword check syntax

    To check if 'math' is in the task description, use 'math' in task_description.
  2. Step 2: Match correct syntax with options

    The option if 'math' in task_description: use calculator applies this containment check to the task description and then selects the calculator tool.
  3. Final Answer:

    if 'math' in task_description: use calculator -> Option C
  4. Quick Check:

    Keyword in string check = if 'math' in task_description: use calculator [OK]
Hint: Use 'keyword' in string to check presence
Common Mistakes:
  • Using equality instead of containment
  • Checking wrong variable names
  • Confusing tool and task names
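
The difference between containment and equality is easy to confirm in runnable Python. This is a small sketch; the helper name should_use_calculator and the lowercasing step are assumptions added for illustration and are not part of the quiz option.

```python
task_description = "Solve this math problem"

# Containment: True when 'math' appears anywhere in the string.
print('math' in task_description)

# Equality: True only when the whole string is exactly 'math'.
print(task_description == 'math')


def should_use_calculator(task_description: str) -> bool:
    """Return True when the task mentions 'math' (case-insensitive)."""
    return 'math' in task_description.lower()


print(should_use_calculator("MATH homework help"))
```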
3. Given the code below, what will be printed?
tools = ['calculator', 'translator', 'weather']
task = 'translate this sentence'
selected_tool = None
for tool in tools:
    if tool in task:
        selected_tool = tool
        break
print(selected_tool)
medium
A. calculator
B. translator
C. weather
D. None

Solution

  1. Step 1: Loop through tools and check if tool name is in task

    The loop checks if 'calculator', 'translator', or 'weather' is in the task string 'translate this sentence'.
  2. Step 2: Identify which tool matches the task

    None of the tool names appear as substrings in the task ('translator' is not in 'translate this sentence'), so selected_tool remains None.
  3. Final Answer:

    None -> Option D
  4. Quick Check:

    No substring match, prints None [OK]
Hint: Check substring presence carefully in loops
Common Mistakes:
  • Assuming exact word match needed
  • Ignoring break statement
  • Thinking 'translator' matches 'translate'
4. The following code is meant to select a tool based on keywords, but for this task it prints None. What is the error?
tools = {'calc': 'calculator', 'trans': 'translator'}
task = 'please compute this'
selected_tool = None
for key in tools:
    if key in task:
        selected_tool = tools[key]
print(selected_tool)
medium
A. The print statement is outside the loop
B. The keys 'calc' and 'trans' do not match any substring in the task
C. The loop should use tools.values() instead of keys
D. The dictionary keys should be full tool names, not abbreviations

Solution

  1. Step 1: Check if keys are substrings of the task

    The keys are 'calc' and 'trans'. The task is 'please compute this'. Neither 'calc' nor 'trans' is a substring.
  2. Step 2: Understand why selected_tool stays None

    Since no key matches, the if condition never runs, so selected_tool remains None.
  3. Final Answer:

    The keys 'calc' and 'trans' do not match any substring in the task -> Option B
  4. Quick Check:

    No key is a substring of the task, so the match fails [OK]
Hint: Check whether each key actually appears as a substring of the task
Common Mistakes:
  • Assuming an abbreviation like 'calc' matches a related word like 'compute'
  • Thinking print location causes error
  • Confusing keys with values
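
One possible fix is to make sure the keyword table contains a substring that really occurs in the tasks the agent will see. The sketch below adds a 'compute' keyword and wraps the loop in a function; the KEYWORD_TO_TOOL name, the extra keyword, and the lowercasing are assumptions for illustration, not the original author's solution.

```python
from typing import Optional

# Each keyword maps to the tool it should trigger.
# 'compute' is added so that tasks like 'please compute this' can match.
KEYWORD_TO_TOOL = {
    "calc": "calculator",
    "compute": "calculator",
    "trans": "translator",
}


def select_tool(task: str) -> Optional[str]:
    """Return the tool for the first keyword found in the task, else None."""
    text = task.lower()
    for keyword, tool in KEYWORD_TO_TOOL.items():
        if keyword in text:
            return tool
    return None


print(select_tool("please compute this"))
print(select_tool("translate this sentence"))
print(select_tool("tell me a joke"))
```

Note that the short key 'trans' does match 'translate this sentence', because 'trans' is a genuine substring of 'translate'. Abbreviations work only when they are literal prefixes or fragments of the words in the task.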
5. You want an AI agent to select tools based on multiple keywords in a task. Which approach best fits this need?
hard
A. Check each tool's keywords and select the tool with the most keyword matches
B. Always select the first tool in the list regardless of task
C. Select a tool randomly without checking the task
D. Use only one keyword per tool and ignore others

Solution

  1. Step 1: Understand the need for multiple keyword matching

    When tasks have multiple keywords, the agent should consider all keywords to pick the best tool.
  2. Step 2: Evaluate options for best fit

    Option A counts the keyword matches for every tool and picks the tool with the most matches, so it is the only option that uses all the available keywords.
  3. Final Answer:

    Check each tool's keywords and select the tool with the most keyword matches -> Option A
  4. Quick Check:

    Best tool = most keyword matches [OK]
Hint: Count keyword matches to pick best tool
Common Mistakes:
  • Ignoring multiple keywords
  • Picking tools randomly
  • Always choosing first tool
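
The counting approach from Option A can be sketched as follows. The keyword lists in TOOL_KEYWORDS are hypothetical, and plain substring matching is assumed; a real agent would likely need tokenization or a learned router instead.

```python
from typing import Optional

# Hypothetical keyword lists for each tool.
TOOL_KEYWORDS = {
    "calculator": ["calculate", "compute", "sum", "math"],
    "translator": ["translate", "language", "french"],
    "weather": ["weather", "forecast", "rain"],
}


def select_tool(task: str) -> Optional[str]:
    """Pick the tool whose keywords appear most often in the task."""
    text = task.lower()
    scores = {
        tool: sum(keyword in text for keyword in keywords)
        for tool, keywords in TOOL_KEYWORDS.items()
    }
    # On a tie, max() keeps the tool that comes first in the dictionary.
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(select_tool("Translate this sentence into French"))
print(select_tool("What is the weather forecast, will it rain?"))
print(select_tool("hello there"))
```

Returning None when no keyword matches lets the agent fall back to answering without a tool instead of guessing. Substring matching can also produce false hits (for example, 'sum' inside 'summary'), which is one reason to treat this as a starting point only.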