Experiment - Test cases for tool-using agents
Problem: You have an AI agent that uses external tools (such as calculators or search engines) to answer questions. The agent sometimes gives wrong answers because it misuses a tool or misinterprets the tool's output.
Current Metrics: Accuracy is 70% on test questions that require tool use and 85% on questions that do not.
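One way to produce a split like this is to tag each graded question with whether it requires a tool and compute accuracy per group. The sketch below is illustrative only: the `Result` record, its field names, and the toy data are assumptions, not the experiment's actual format or results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Result:
    """One graded test question (hypothetical record format)."""
    question_id: str
    requires_tool: bool
    correct: bool


def accuracy_by_category(results):
    """Return accuracy separately for tool-requiring and non-tool questions."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in results:
        key = "tool" if r.requires_tool else "no_tool"
        totals[key] += 1
        hits[key] += int(r.correct)
    # Only categories that actually appear in the data are reported.
    return {key: hits[key] / totals[key] for key in totals}


# Toy data for illustration only; not the experiment's real results.
toy_results = [
    Result("q1", requires_tool=True, correct=True),
    Result("q2", requires_tool=True, correct=False),
    Result("q3", requires_tool=False, correct=True),
    Result("q4", requires_tool=False, correct=True),
]
scores = accuracy_by_category(toy_results)
print(scores)
```

Reporting the two groups separately keeps a gap between them from being hidden inside a single overall accuracy figure.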
Issue: The agent over-relies on tools and sometimes misinterprets tool outputs, which causes errors, especially on questions that require tool use.
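A test case for this kind of agent can record not just the expected answer but also whether a tool should have been used, so that a failure can be attributed to a specific cause. The following is a minimal sketch under stated assumptions: `ToolCall`, `AgentTrace`, `ToolUseCase`, and `diagnose` are hypothetical names, and the substring check for a misread tool output is a rough heuristic, not an established method.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """One tool invocation made by the agent (hypothetical format)."""
    name: str
    output: str


@dataclass
class AgentTrace:
    """Hypothetical record of a single agent run."""
    final_answer: str
    tool_calls: list = field(default_factory=list)


@dataclass
class ToolUseCase:
    """A test question plus what a correct run should look like."""
    question: str
    expected_answer: str
    tool_needed: bool


def diagnose(case, trace):
    """Return the failure modes a run exhibits; an empty list means it passed."""
    problems = []
    used_tool = bool(trace.tool_calls)
    if used_tool and not case.tool_needed:
        problems.append("unnecessary_tool_call")  # over-reliance on tools
    if case.tool_needed and not used_tool:
        problems.append("missing_tool_call")
    expected = case.expected_answer.strip()
    if trace.final_answer.strip() != expected:
        # Heuristic: the tool output contained the right value,
        # but the agent reported something else.
        if any(expected in call.output for call in trace.tool_calls):
            problems.append("misread_tool_output")
        else:
            problems.append("wrong_answer")
    return problems


# Example: the calculator returned 391, but the agent answered 319.
case = ToolUseCase("What is 17 * 23?", "391", tool_needed=True)
trace = AgentTrace("319", [ToolCall("calculator", "391")])
print(diagnose(case, trace))  # ['misread_tool_output']
```

Counting these labels across the test set would show how much of the error comes from unnecessary tool calls versus misread outputs.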