Model Pipeline - Regression testing for agent changes
This pipeline tests whether changes to an AI agent affect its performance. It reruns the agent on past tasks, compares the new results with the earlier ones, and flags any unexpected drop in accuracy or increase in errors.
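The comparison step could look roughly like the sketch below. It is a minimal illustration, not the pipeline's actual code: the `TaskResult` record, the function names, and the threshold defaults (`max_accuracy_drop`, `max_error_increase`) are all hypothetical and chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """One agent run on one past task (hypothetical record layout)."""
    task_id: str
    correct: bool   # the agent produced the expected answer
    errored: bool   # the run crashed, timed out, or otherwise failed


def summarize(results):
    """Return (accuracy, error_rate) over a non-empty list of TaskResult."""
    if not results:
        raise ValueError("no results to summarize")
    n = len(results)
    accuracy = sum(r.correct for r in results) / n
    error_rate = sum(r.errored for r in results) / n
    return accuracy, error_rate


def newly_failing(baseline, candidate):
    """Task ids that were correct in the baseline but not in the candidate."""
    was_correct = {r.task_id for r in baseline if r.correct}
    return sorted(r.task_id for r in candidate
                  if r.task_id in was_correct and not r.correct)


def check_regression(baseline, candidate,
                     max_accuracy_drop=0.02, max_error_increase=0.01):
    """Return a list of human-readable failures; an empty list means pass.

    The default thresholds are assumed values for illustration only.
    """
    base_acc, base_err = summarize(baseline)
    cand_acc, cand_err = summarize(candidate)
    failures = []
    if base_acc - cand_acc > max_accuracy_drop:
        failures.append(f"accuracy dropped from {base_acc:.2%} to {cand_acc:.2%}")
    if cand_err - base_err > max_error_increase:
        failures.append(f"error rate rose from {base_err:.2%} to {cand_err:.2%}")
    return failures


if __name__ == "__main__":
    old = [TaskResult("t1", True, False), TaskResult("t2", True, False)]
    new = [TaskResult("t1", True, False), TaskResult("t2", False, True)]
    for problem in check_regression(old, new):
        print("REGRESSION:", problem)
    print("newly failing tasks:", newly_failing(old, new))
```

In a setup like this, a non-empty result from `check_regression` could be used to fail a CI job, while `newly_failing` points to the individual tasks worth inspecting first.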