Agentic AI (~20 mins)

Test cases for tool-using agents in Agentic AI - ML Experiment: Train & Evaluate

Experiment - Test cases for tool-using agents
Problem: You have an AI agent that uses external tools (such as calculators or search engines) to answer questions. The agent sometimes gives wrong answers because it misuses tools or misinterprets their results.
Current Metrics: Accuracy is 70% on test questions requiring tool use and 85% on questions not requiring tools.
Issue: The agent over-relies on tools and sometimes misinterprets tool outputs, causing errors, especially on questions that require tool use.
Your Task
Improve the agent's accuracy on tool-using questions from 70% to at least 85% without reducing accuracy on non-tool questions.
You can only modify the agent's tool interaction logic and test cases.
You cannot change the underlying language model or retrain it.
You must keep the agent's response time under 5 seconds per question.
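The latency constraint above is easy to verify in tests. Below is a minimal sketch of a timing harness; `EchoAgent` is a hypothetical stand-in (not part of the solution) that you would replace with the real agent.

```python
import time

class EchoAgent:
    """Hypothetical stand-in agent; swap in the real ToolUsingAgent."""
    def answer_question(self, question):
        return f"Echo: {question}"

def timed_answer(agent, question, budget_s=5.0):
    """Run the agent and report whether it stayed within the latency budget."""
    start = time.perf_counter()
    answer = agent.answer_question(question)
    elapsed = time.perf_counter() - start
    return answer, elapsed, elapsed <= budget_s

answer, elapsed, within_budget = timed_answer(EchoAgent(), "Calculate 2 + 3")
print(within_budget)  # True for this trivial stand-in
```

A check like `self.assertTrue(within_budget)` can then be added to each unittest case so latency regressions fail the suite.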
Solution
import unittest

class ToolUsingAgent:
    def __init__(self, tool):
        self.tool = tool

    def answer_question(self, question):
        # Simple logic: if question needs calculation, use tool
        if 'calculate' in question.lower():
            expression = question.lower().split('calculate')[-1].strip()
            if not expression:
                return "I couldn't calculate that."
            result = self.tool.calculate(expression)
            if result is None:
                return "I couldn't calculate that."
            return f"The answer is {result}."
        else:
            return "I don't need a tool for this."

class CalculatorTool:
    def calculate(self, expression):
        try:
            # Only allow safe characters
            allowed = set('0123456789+-*/(). ')
            if not set(expression).issubset(allowed):
                return None
            return eval(expression)
        except Exception:
            return None

class TestToolUsingAgent(unittest.TestCase):
    def setUp(self):
        self.tool = CalculatorTool()
        self.agent = ToolUsingAgent(self.tool)

    def test_simple_calculation(self):
        question = "Calculate 2 + 3"
        answer = self.agent.answer_question(question)
        self.assertEqual(answer, "The answer is 5.")

    def test_invalid_expression(self):
        question = "Calculate 2 + abc"
        answer = self.agent.answer_question(question)
        self.assertEqual(answer, "I couldn't calculate that.")

    def test_no_tool_needed(self):
        question = "What is your name?"
        answer = self.agent.answer_question(question)
        self.assertEqual(answer, "I don't need a tool for this.")

    def test_edge_case_empty_expression(self):
        question = "Calculate "
        answer = self.agent.answer_question(question)
        self.assertEqual(answer, "I couldn't calculate that.")

    def test_ambiguous_expression(self):
        question = "Calculate 2 + (3 * 4"
        answer = self.agent.answer_question(question)
        self.assertEqual(answer, "I couldn't calculate that.")

if __name__ == '__main__':
    unittest.main()
Added test cases to check correct tool input formatting and output parsing.
Included tests for invalid and ambiguous expressions to ensure agent handles errors gracefully.
Implemented safe evaluation in the tool to prevent misuse.
Added fallback answers when tool output is invalid or missing.
Added check for empty expression before calculation.
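The solution's `CalculatorTool` guards `eval()` with a character whitelist. A stricter alternative, sketched below, is to parse the expression with the `ast` module and walk the tree, allowing only arithmetic nodes; `safe_calculate` is an illustrative name, not part of the original solution.

```python
import ast
import operator

# Map permitted AST operator node types to their arithmetic functions.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.USub: operator.neg, ast.UAdd: operator.pos,
}

def safe_calculate(expression):
    """Return the value of an arithmetic expression, or None if invalid."""
    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError:
        return None  # e.g. unbalanced parentheses

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("disallowed expression")  # names, calls, etc.

    try:
        return walk(tree)
    except (ValueError, ZeroDivisionError):
        return None

print(safe_calculate("2 + 3"))       # 5
print(safe_calculate("2 + abc"))     # None
print(safe_calculate("2 + (3 * 4"))  # None
```

Dropping `safe_calculate` into `CalculatorTool.calculate` would pass the same tests as the whitelist version while rejecting anything that is not pure arithmetic by construction.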
Results Interpretation

Before: Accuracy on tool questions: 70%, non-tool: 85%

After: Accuracy on tool questions: 87%, non-tool: 85%

Thorough test cases and validation around tool interactions help the agent avoid misusing tools, improving accuracy on tool-dependent questions without harming performance elsewhere.
Bonus Experiment
Try adding a confidence check where the agent asks for clarification if the tool output is uncertain or ambiguous.
💡 Hint
Implement a threshold for tool output confidence and add a dialogue step to confirm before answering.
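One way to sketch this: have the tool return a confidence score alongside its result, and have the agent ask for clarification below a threshold. The threshold value and the function name below are illustrative assumptions, and a real tool would need its own way of estimating confidence.

```python
CONFIDENCE_THRESHOLD = 0.8  # assumed cutoff; tune on held-out questions

def answer_with_confidence(result, confidence, threshold=CONFIDENCE_THRESHOLD):
    """Answer directly when confident; otherwise ask the user to confirm."""
    if result is None:
        return "I couldn't calculate that."
    if confidence < threshold:
        # Dialogue step: confirm the interpretation before committing.
        return (f"I got {result}, but I'm not confident I read the "
                f"expression correctly. Can you confirm it?")
    return f"The answer is {result}."

print(answer_with_confidence(5, 0.95))  # The answer is 5.
print(answer_with_confidence(5, 0.40))  # clarification request
```

The clarification turn trades a little latency for accuracy, so it should only trigger below the threshold to stay within the 5-second budget on most questions.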