For computer-use agents, the key metrics are precision and recall. These agents choose actions based on user commands or environmental data, so both a wrong action and a missed action carry a cost.
Precision is the fraction of the actions the agent takes that are correct: correct actions taken divided by all actions taken. High precision means fewer wrong actions, which matters because a wrong action can be annoying or harmful.
Recall is the fraction of the actions the agent should have taken that it actually took: correct actions taken divided by all correct actions. High recall means the agent rarely misses an important task.
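As a rough illustration of these two definitions, the sketch below computes both metrics from a set of actions the agent performed and a set of actions that were correct for the task. The function name `precision_recall` and the action labels are hypothetical, chosen only for this example, and returning 0.0 when a set is empty is an assumption about how to handle the undefined case.

```python
def precision_recall(performed, correct):
    """Return (precision, recall) for performed actions vs. correct actions.

    Both arguments are iterables of hashable action labels (hypothetical
    names used only for illustration).
    """
    performed = set(performed)
    correct = set(correct)

    # Actions the agent took that were also correct.
    true_positives = len(performed & correct)

    # Assumption: report 0.0 when the denominator would be zero.
    precision = true_positives / len(performed) if performed else 0.0
    recall = true_positives / len(correct) if correct else 0.0
    return precision, recall


# The agent took three actions; four actions were actually correct.
performed = {"open_browser", "click_submit", "delete_file"}
correct = {"open_browser", "click_submit", "save_draft", "close_tab"}

precision, recall = precision_recall(performed, correct)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the three performed actions are correct, and two of the four correct actions were performed, so the wrong action (`delete_file`) lowers precision while the two missed actions lower recall.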
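The two trade off: an agent that acts only when it is certain gains precision but loses recall, while an agent that acts freely does the opposite. Balancing them helps ensure the agent acts correctly without missing important user needs.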
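One common way to express this balance as a single number is the F-beta score, a weighted harmonic mean of precision and recall. The text above does not name a specific balancing method, so this is only one possible choice. The function name `f_beta` and the sample values are assumptions for illustration.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta < 1 weights precision more heavily, beta > 1 weights recall more
    heavily, and beta == 1 gives the usual F1 score.
    """
    beta_sq = beta ** 2
    denominator = beta_sq * precision + recall
    # Assumption: return 0.0 when the score would otherwise be undefined.
    if denominator == 0:
        return 0.0
    return (1 + beta_sq) * precision * recall / denominator


# Hypothetical agent: precise, but misses a fair number of tasks.
p, r = 0.9, 0.6
print(f"F1   = {f_beta(p, r):.3f}")
print(f"F0.5 = {f_beta(p, r, beta=0.5):.3f}")  # favors precision
print(f"F2   = {f_beta(p, r, beta=2.0):.3f}")  # favors recall
```

Choosing a beta below 1 might suit an agent whose wrong actions are costly, for example one that can delete files, while a beta above 1 might suit an agent where missed tasks matter more.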
