Experiment - Tool usage (function calling)
Problem: You have a language model that can call external functions to retrieve information or perform tasks. The model currently calls these functions but does not handle their responses well, which leads to incorrect or incomplete answers.
Current metrics:
- Function call success rate: 90%
- Correct answer rate after function call: 65%
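The two metrics measure different stages: whether the call itself works, and whether the final answer is right once it does. Below is a minimal sketch of computing both from logged episodes. It assumes that "correct answer rate after function call" counts only episodes whose call succeeded, because the note does not say whether the rate is conditional or taken over all queries. The `Episode` record and the function names are hypothetical. Under that conditional reading, roughly 0.90 × 0.65 ≈ 58.5% of all queries end with both a successful call and a correct answer.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One logged model interaction (hypothetical log schema)."""
    call_succeeded: bool   # function was called with a valid name/arguments and returned data
    answer_correct: bool   # final answer matched the reference answer


def function_call_success_rate(episodes: list[Episode]) -> float:
    """Fraction of all episodes in which the function call succeeded."""
    if not episodes:
        return 0.0
    return sum(e.call_succeeded for e in episodes) / len(episodes)


def correct_answer_rate_after_call(episodes: list[Episode]) -> float:
    """Fraction of correct final answers among episodes whose call succeeded.

    Assumes the metric is conditional on a successful call.
    """
    succeeded = [e for e in episodes if e.call_succeeded]
    if not succeeded:
        return 0.0
    return sum(e.answer_correct for e in succeeded) / len(succeeded)
```

Conditioning on successful calls keeps call failures from being counted twice, so this second metric isolates how well the model uses the data it actually received.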
Issue: The model calls functions correctly but often fails to use the returned data properly, which lowers the accuracy of its final answers.
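One common way to help the model use returned data is to pass each result back as a structured message tied to the call that produced it, to report failures explicitly instead of dropping them, and to check afterward whether the final answer actually draws on the returned values. The sketch below assumes an OpenAI-style chat message layout (`role`, `tool_call_id`, `content`). The `get_weather` tool, the `TOOLS` registry, and all helper names are hypothetical. The substring-based grounding check is a crude heuristic, not an accurate correctness measure, but it can help separate wrong answers that ignored the tool output from ones that used it and still went wrong.

```python
import json
from typing import Any, Callable

# Hypothetical tool registry: tool name -> callable returning JSON-serializable data.
TOOLS: dict[str, Callable[..., Any]] = {
    "get_weather": lambda city: {"city": city, "temp_c": 18, "condition": "cloudy"},
}


def run_tool_call(call: dict[str, Any]) -> dict[str, Any]:
    """Execute one tool call and wrap its result as a message for the model's next turn.

    `call` is assumed to look like {"id": ..., "name": ..., "arguments": "<JSON string>"}.
    """
    try:
        args = json.loads(call["arguments"])
        result = TOOLS[call["name"]](**args)
        payload = {"ok": True, "result": result}
    except Exception as exc:
        # Report the failure explicitly so the model does not invent an answer.
        payload = {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    return {
        "role": "tool",
        "tool_call_id": call["id"],  # ties the result to the call that produced it
        "content": json.dumps(payload),
    }


def scalar_values(data: Any) -> list[str]:
    """Flatten nested tool output into its leaf values, as strings."""
    if isinstance(data, dict):
        return [v for item in data.values() for v in scalar_values(item)]
    if isinstance(data, list):
        return [v for item in data for v in scalar_values(item)]
    return [str(data)]


def answer_uses_result(answer: str, tool_message: dict[str, Any]) -> bool:
    """Crude grounding check (heuristic): does the answer mention any returned value?"""
    payload = json.loads(tool_message["content"])
    if not payload["ok"]:
        # For failed calls, expect the answer to acknowledge the failure.
        return "fail" in answer.lower() or "error" in answer.lower()
    return any(value in answer for value in scalar_values(payload["result"]))
```

For example, with `msg = run_tool_call({"id": "call_1", "name": "get_weather", "arguments": '{"city": "Oslo"}'})`, the answer "It is 18 °C and cloudy in Oslo." passes the check, while "It is sunny and warm." does not.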