Imagine you have a robot that cleans your house. What is the best way to describe its success criterion?
Think about what shows the robot did its main job well.
A success criterion measures whether the agent achieved its main goal. Cleaning every room within 30 minutes directly reflects that goal. Collision avoidance and battery life matter, but they are secondary constraints, not the goal itself. Making a sound is merely a feature.
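This can be sketched as a simple predicate. The `CleaningReport` fields below are hypothetical names chosen for illustration; the point is that only the goal-relevant fields enter the success check.

```python
from dataclasses import dataclass

@dataclass
class CleaningReport:
    rooms_total: int
    rooms_cleaned: int
    minutes_elapsed: float
    collisions: int          # secondary metric: tracked, but not part of success
    battery_used_pct: float  # secondary metric: tracked, but not part of success

def is_success(report: CleaningReport, time_limit_min: float = 30.0) -> bool:
    """Success = every room cleaned within the time limit."""
    return (report.rooms_cleaned == report.rooms_total
            and report.minutes_elapsed <= time_limit_min)

# All 5 rooms cleaned in 25 minutes: success, despite 2 bumps.
print(is_success(CleaningReport(5, 5, 25.0, collisions=2, battery_used_pct=60.0)))  # True
# Only 4 of 5 rooms cleaned: not success, even with a perfect drive.
print(is_success(CleaningReport(5, 4, 25.0, collisions=0, battery_used_pct=40.0)))  # False
```

Note that `collisions` and `battery_used_pct` are recorded but deliberately ignored by `is_success`: they belong in separate performance metrics, not the success criterion.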
A delivery drone must drop packages at correct locations on time. Which metric best shows if it succeeded?
Focus on the main goal: delivering packages correctly and on time.
The success metric must reflect the agent's main task. Since the goal is delivering packages correctly and on time, the percentage of on-time, correct deliveries is the best metric. Battery usage, flights per day, and distance flown are secondary operational factors.
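The metric reduces to one line of arithmetic. A minimal sketch, assuming each delivery is logged as a (correct location, on time) pair:

```python
def on_time_correct_rate(deliveries):
    """Percentage of deliveries that were both at the correct location AND on time.

    deliveries: list of (correct_location: bool, on_time: bool) tuples.
    """
    if not deliveries:
        return 0.0
    hits = sum(1 for correct, on_time in deliveries if correct and on_time)
    return 100.0 * hits / len(deliveries)

# 4 deliveries: 2 are both correct and on time -> 50%.
log = [(True, True), (True, False), (False, True), (True, True)]
print(on_time_correct_rate(log))  # 50.0
```

A delivery that is on time but at the wrong address counts as a failure, which is exactly why this metric tracks the goal while battery or distance figures do not.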
You train a reinforcement learning agent to play a game. Which success criterion is best to decide if training worked?
Think about what shows the agent plays the game well, not just training behavior.
Success means the agent performs well in the real task. Average score over many games measures actual performance. Training loss and action distributions relate to the learning process but don't guarantee good play. Model size is unrelated to success.
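Evaluating by average score rather than training loss can be sketched as follows. Here `play_game` is a hypothetical stand-in for running one full episode; in a real setup it would step the environment until the game ends.

```python
import random

def play_game(agent, rng):
    # Hypothetical stand-in: a real version would run one full episode
    # of the game and return the final score.
    return agent(rng)

def average_score(agent, n_games=1000, seed=42):
    """Success criterion: mean score over many games, with a fixed seed
    so evaluations are comparable across training checkpoints."""
    rng = random.Random(seed)
    return sum(play_game(agent, rng) for _ in range(n_games)) / n_games

# A toy "agent" whose per-game score is noisy but centred near 10.
toy_agent = lambda rng: 10 + rng.uniform(-1, 1)
print(average_score(toy_agent))  # close to 10
```

Averaging over many games matters because a single game's score is noisy; the mean is what reflects how well the agent actually plays.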
An agent's success criterion is: 'Agent completes task if total steps taken is less than 50'. The agent finishes tasks but often takes 60 steps. Why is this criterion problematic?
Think about what the criterion checks versus what the agent actually does.
The criterion only checks whether the agent is fast (fewer than 50 steps); it never checks whether the task was completed correctly. An agent that finishes correctly in 60 steps is marked a failure, so the criterion misses genuine successes. A sound criterion should test correctness first and treat step count as a constraint.
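The flaw and one possible fix can be shown side by side. The `step_budget` value below is an illustrative assumption, not something specified in the scenario:

```python
def flawed_criterion(steps, task_done):
    # Checks only speed; `task_done` is ignored entirely.
    return steps < 50

def better_criterion(steps, task_done, step_budget=100):
    # Correctness comes first; speed is a looser, explicit constraint.
    return task_done and steps <= step_budget

# The agent from the scenario: task completed correctly, but in 60 steps.
print(flawed_criterion(60, task_done=True))   # False -> genuine success missed
print(better_criterion(60, task_done=True))   # True  -> success recognised
```

Note the flawed version even returns True for a fast agent that never finished the task, which is the mirror image of the same problem.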
An agent's success criterion is reaching a score above a threshold. What happens if you set this threshold too high during training?
Consider what happens if the goal is impossible or too hard to reach.
If the success threshold is too high, the agent rarely or never achieves success, so the training signal becomes weak or absent (a sparse-reward problem). This can cause training to stall or fail. Harder goals don't automatically speed up learning, and model size is unrelated.
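A toy simulation makes the sparse-reward effect concrete. The uniform score distribution below is an assumption purely for illustration; real score distributions differ, but the thinning of the success signal is the same.

```python
import random

def success_rate(threshold, n_episodes=10_000, seed=0):
    """Fraction of episodes whose score clears the threshold.
    Toy assumption: episode scores are uniform in [0, 100]."""
    rng = random.Random(seed)
    wins = sum(rng.uniform(0, 100) > threshold for _ in range(n_episodes))
    return wins / n_episodes

print(success_rate(threshold=50))    # roughly 0.5: frequent success signal
print(success_rate(threshold=99.9))  # near 0: almost no signal to learn from
```

With the threshold near the top of the score range, almost every episode returns "failure", so there is nothing to distinguish good behaviour from bad; a common remedy is to start with an achievable threshold and raise it gradually.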