
Test cases for tool-using agents in Agentic AI - ML Experiment: Train & Evaluate

Experiment - Test cases for tool-using agents
Problem: You have an AI agent that uses external tools (like calculators or search engines) to answer questions. The agent sometimes gives wrong answers because it misuses tools or misunderstands results.
Current Metrics: Accuracy: 70% on test questions requiring tool use; 85% on questions not requiring tools.
Issue: The agent over-relies on tools and sometimes misinterprets tool outputs, causing errors especially on tool-using questions.
Your Task
Improve the agent's accuracy on tool-using questions from 70% to at least 85% without reducing accuracy on non-tool questions.
You can only modify the agent's tool interaction logic and test cases.
You cannot change the underlying language model or retrain it.
You must keep the agent's response time under 5 seconds per question.
Solution
import unittest

class ToolUsingAgent:
    def __init__(self, tool):
        self.tool = tool

    def answer_question(self, question):
        # Simple logic: if question needs calculation, use tool
        if 'calculate' in question.lower():
            expression = question.lower().split('calculate')[-1].strip()
            if not expression:
                return "I couldn't calculate that."
            result = self.tool.calculate(expression)
            if result is None:
                return "I couldn't calculate that."
            return f"The answer is {result}."
        else:
            return "I don't need a tool for this."

class CalculatorTool:
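    def calculate(self, expression):
        """Evaluate an arithmetic expression; return None if it is invalid."""
        try:
            # Reject anything that is not a digit, operator, parenthesis, dot, or space
            allowed = set('0123456789+-*/(). ')
            if not set(expression).issubset(allowed):
                return None
            # Evaluate with no builtins available as a second line of defense
            return eval(expression, {"__builtins__": {}}, {})
        except Exception:
            # Malformed expressions and division by zero both end up here
            return None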

class TestToolUsingAgent(unittest.TestCase):
    def setUp(self):
        self.tool = CalculatorTool()
        self.agent = ToolUsingAgent(self.tool)

    def test_simple_calculation(self):
        question = "Calculate 2 + 3"
        answer = self.agent.answer_question(question)
        self.assertEqual(answer, "The answer is 5.")

    def test_invalid_expression(self):
        question = "Calculate 2 + abc"
        answer = self.agent.answer_question(question)
        self.assertEqual(answer, "I couldn't calculate that.")

    def test_no_tool_needed(self):
        question = "What is your name?"
        answer = self.agent.answer_question(question)
        self.assertEqual(answer, "I don't need a tool for this.")

    def test_edge_case_empty_expression(self):
        question = "Calculate "
        answer = self.agent.answer_question(question)
        self.assertEqual(answer, "I couldn't calculate that.")

    def test_ambiguous_expression(self):
        question = "Calculate 2 + (3 * 4"
        answer = self.agent.answer_question(question)
        self.assertEqual(answer, "I couldn't calculate that.")

if __name__ == '__main__':
    unittest.main()
Added test cases to check correct tool input formatting and output parsing.
Included tests for invalid and ambiguous expressions to ensure agent handles errors gracefully.
Implemented safe evaluation in the tool to prevent misuse.
Added fallback answers when tool output is invalid or missing.
Added check for empty expression before calculation.
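The `eval` call in `CalculatorTool` is guarded by a character whitelist, which still admits expressions such as `9**9**9**9` that can run for a very long time and threaten the 5-second limit. One possible alternative, offered here as a sketch rather than as the lesson's method, is to parse the expression with Python's `ast` module and evaluate only a fixed set of node types. The names `safe_calculate` and `_evaluate` are hypothetical.

```python
import ast
import operator

# Hypothetical helper tables: the only operators this sketch will evaluate.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node):
    # Numeric literals only (bool is excluded even though it subclasses int).
    if isinstance(node, ast.Constant):
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
        raise ValueError("unsupported literal")
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    # Names, calls, exponentiation, and everything else are rejected.
    raise ValueError("unsupported expression")


def safe_calculate(expression):
    """Return the value of an arithmetic expression, or None on any failure."""
    try:
        tree = ast.parse(expression.strip(), mode="eval")
        return _evaluate(tree.body)
    except (SyntaxError, ValueError, ZeroDivisionError, RecursionError):
        return None


print(safe_calculate("2 + 3"))
print(safe_calculate("9**9**9**9"))
```

Because exponentiation is not in the operator table, the second call is rejected and returns `None` instead of being computed. Swapping this in for `eval` would keep the agent's `None` fallback path unchanged.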
Results Interpretation

Before: Accuracy on tool questions: 70%, non-tool: 85%

After: Accuracy on tool questions: 87%, non-tool: 85%

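Validating tool inputs and outputs, and writing test cases that pin down the expected behavior for valid, invalid, and empty inputs, helps the agent avoid tool-related mistakes. Accuracy on tool-using questions rose from 70% to 87%, while accuracy on non-tool questions stayed at 85%.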
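The before and after figures are accuracies computed separately for tool and non-tool questions. The lesson does not describe its evaluation harness, so the following is only a sketch of that bookkeeping, assuming each evaluation record carries a `needs_tool` flag and a `correct` flag (both hypothetical field names). The sample records are invented to exercise the function and are not the lesson's data.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """Return {category: fraction correct} for 'tool' and 'non_tool' questions."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for record in records:
        category = "tool" if record["needs_tool"] else "non_tool"
        totals[category] += 1
        correct[category] += int(record["correct"])
    return {category: correct[category] / totals[category] for category in totals}


# Illustrative records only, not the lesson's evaluation set.
sample = [
    {"needs_tool": True, "correct": True},
    {"needs_tool": True, "correct": False},
    {"needs_tool": False, "correct": True},
    {"needs_tool": False, "correct": True},
]
print(accuracy_by_category(sample))
```

Reporting the two categories separately makes it visible when a change to the tool logic helps one group at the expense of the other, which is the constraint stated in the task.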
Bonus Experiment
Try adding a confidence check where the agent asks for clarification if the tool output is uncertain or ambiguous.
💡 Hint
Implement a threshold for tool output confidence and add a dialogue step to confirm before answering.
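One way to read this hint is sketched below. It assumes the tool can report a confidence score alongside its result, which the lesson's `CalculatorTool` does not do. `ToolResult`, its `confidence` field, the `respond` function, and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

CONFIDENCE_THRESHOLD = 0.8  # assumed cut-off; tune it against your own test set


@dataclass
class ToolResult:
    value: Optional[str]
    confidence: float  # assumed to lie in [0, 1]


def respond(result: ToolResult, threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Answer directly when confident; otherwise ask the user to clarify."""
    if result.value is None:
        return "I couldn't get a result from the tool."
    if result.confidence < threshold:
        return (
            f"I think the answer is {result.value}, but I'm not sure. "
            "Can you clarify the question?"
        )
    return f"The answer is {result.value}."


print(respond(ToolResult("5", 0.95)))
print(respond(ToolResult("5", 0.40)))
```

Each of the three branches (missing result, low confidence, confident answer) would then get its own test case, in the same style as the tests in the solution.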

Practice

1. What is the main purpose of writing test cases for tool-using agents?
easy
A. To add more tools to the agent
B. To make agents run faster
C. To check if agents use tools correctly and handle errors
D. To reduce the size of the agent's code

Solution

  1. Step 1: Understand the role of test cases

    Test cases are designed to verify that the agent behaves as expected, especially when using tools.
  2. Step 2: Identify the main goal for tool-using agents

    For agents that use tools, tests ensure they use these tools correctly and handle any errors gracefully.
  3. Final Answer:

    To check if agents use tools correctly and handle errors -> Option C
  4. Quick Check:

    Test cases purpose = check tool use and errors [OK]
Hint: Test cases verify correct tool use and error handling [OK]
Common Mistakes:
  • Thinking test cases speed up agents
  • Believing test cases reduce code size
  • Assuming test cases add tools
2. Which of the following is the correct way to write a test case for a tool-using agent in Python?
easy
A. test agent tool: assert agent.use_tool('calculator', '2+2') == 4
B. def test_agent_tool(): assert agent.use_tool('calculator', '2+2') == 4
C. def test_agent_tool: assert agent.use_tool('calculator', '2+2') == 4
D. def test_agent_tool() assert agent.use_tool('calculator', '2+2') == 4

Solution

  1. Step 1: Check Python function syntax

    Python test functions start with 'def', have parentheses, and a colon at the end.
  2. Step 2: Verify assertion syntax

    The assert statement must be inside the function and correctly compare expected output.
  3. Final Answer:

    def test_agent_tool(): assert agent.use_tool('calculator', '2+2') == 4 -> Option B
  4. Quick Check:

    Correct Python test function syntax = def test_agent_tool(): assert agent.use_tool('calculator', '2+2') == 4 [OK]
Hint: Remember Python functions need parentheses and colon [OK]
Common Mistakes:
  • Omitting parentheses in function definition
  • Missing colon after function header
  • Incorrect assert statement placement
3. Given this test case code snippet, what will be the output if the agent returns 5 instead of 4?
def test_agent_tool():
    result = agent.use_tool('calculator', '2+2')
    assert result == 4
    print('Test passed')
medium
A. Test passed
B. SyntaxError
C. No output
D. AssertionError

Solution

  1. Step 1: Understand assert behavior

    If the assert condition is false, Python raises an AssertionError and stops execution.
  2. Step 2: Check the test condition

    The test expects result == 4, but agent returns 5, so assert fails.
  3. Final Answer:

    AssertionError -> Option D
  4. Quick Check:

    Assert fails if values differ = AssertionError [OK]
Hint: Assert fails if expected and actual differ [OK]
Common Mistakes:
  • Thinking print runs after failed assert
  • Confusing AssertionError with SyntaxError
  • Assuming no output on failure
4. Identify the error in this test case for a tool-using agent:
def test_agent_tool():
    result = agent.use_tool('search', 'weather today')
    assert result = 'sunny'
    print('Test passed')
medium
A. Using '=' instead of '==' in assert
B. Missing parentheses in print
C. Wrong function name
D. Agent tool name is invalid

Solution

  1. Step 1: Check assert statement syntax

    In Python, '=' is for assignment, '==' is for comparison. Assert needs '==' to compare values.
  2. Step 2: Verify other parts

    Print has parentheses, function name is valid, and tool name is plausible.
  3. Final Answer:

    Using '=' instead of '==' in assert -> Option A
  4. Quick Check:

    Assert needs '==' for comparison [OK]
Hint: Assert compares with '==', not '=' [OK]
Common Mistakes:
  • Confusing assignment '=' with comparison '=='
  • Ignoring syntax errors in assert
  • Assuming print needs no parentheses
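The error in question 4 is caught when Python compiles the file, before any test runs. A small sketch that shows this by compiling the snippet as a string (the variable names here are illustrative):

```python
source = """
def test_agent_tool():
    result = agent.use_tool('search', 'weather today')
    assert result = 'sunny'
    print('Test passed')
"""

try:
    # compile() parses the code without executing it, so no agent is needed.
    compile(source, "<test_case>", "exec")
    error_name = None
except SyntaxError as exc:
    error_name = type(exc).__name__

print(error_name)
```

Since the failure happens at compile time, a test file containing this line cannot be imported at all, and every test in that file is reported as an error rather than just this one.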
5. You want to test an agent that uses a calculator tool to handle multiple expressions. Which test case best checks if the agent correctly handles both valid and invalid inputs?
hard
A. def test_calc(): assert agent.use_tool('calculator', '3*3') == 9; assert agent.use_tool('calculator', 'abc') == 'error'
B. def test_calc(): assert agent.use_tool('calculator', '3*3') == 9; assert agent.use_tool('calculator', '3/0') == 0
C. def test_calc(): assert agent.use_tool('calculator', '3*3') == 9; assert agent.use_tool('calculator', '') == ''
D. def test_calc(): assert agent.use_tool('calculator', '3*3') == 9; assert agent.use_tool('calculator', null) == null

Solution

  1. Step 1: Check valid input test

    All options test '3*3' == 9 correctly, which is good for valid input.
  2. Step 2: Check invalid input handling

    Option A expects the invalid input 'abc' to return 'error', which tests error handling explicitly. Option B expects '3/0' to equal 0, which is mathematically wrong. Option C expects an empty input to be echoed back, and Option D uses null, which is not a defined name in Python.
  3. Final Answer:

    def test_calc(): assert agent.use_tool('calculator', '3*3') == 9; assert agent.use_tool('calculator', 'abc') == 'error' -> Option A
  4. Quick Check:

    Test valid and invalid inputs properly = def test_calc(): assert agent.use_tool('calculator', '3*3') == 9; assert agent.use_tool('calculator', 'abc') == 'error' [OK]
Hint: Test both valid and invalid inputs explicitly [OK]
Common Mistakes:
  • Expecting wrong output for invalid input
  • Not testing error cases
  • Assuming empty or null inputs return themselves
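Option A's idea can be extended into a runnable `unittest` case that covers several invalid inputs at once. The question does not define `agent`, so `CalculatorAgent` below is a hypothetical stub that mirrors the lesson's whitelist-plus-`eval` calculator and follows the question's convention of returning the string 'error' for bad input.

```python
import unittest


class CalculatorAgent:
    """Hypothetical agent: returns a number, or 'error' for invalid input."""

    ALLOWED = set("0123456789+-*/(). ")

    def use_tool(self, tool_name, expression):
        if tool_name != "calculator" or not isinstance(expression, str):
            return "error"
        if not expression.strip() or not set(expression) <= self.ALLOWED:
            return "error"
        try:
            return eval(expression, {"__builtins__": {}}, {})
        except Exception:
            return "error"


class TestCalculatorInputs(unittest.TestCase):
    def setUp(self):
        self.agent = CalculatorAgent()

    def test_valid_input(self):
        self.assertEqual(self.agent.use_tool("calculator", "3*3"), 9)

    def test_invalid_inputs(self):
        # subTest reports each failing input separately instead of stopping at the first.
        for bad in ["abc", "", "3/0", "2 + (3 * 4"]:
            with self.subTest(expression=bad):
                self.assertEqual(self.agent.use_tool("calculator", bad), "error")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCalculatorInputs)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Splitting valid and invalid inputs into separate test methods means a regression in error handling does not hide a regression in normal calculation, and vice versa.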