Introduction
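Evaluation means running your language model code against known inputs and checking its outputs before you rely on it for real tasks, so that mistakes surface early. It is most useful in the following situations: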
Before deploying a new language model chain, to make sure it answers correctly.
When adding new features, to check that they don't break existing behavior.
To verify that your prompts produce expected results in different situations.
When debugging unexpected outputs from your language model.
To improve confidence that your app will work well for users.
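
As a rough illustration of what such a check can look like, here is a minimal sketch in plain Python. It assumes the chain can be called as a function from a question string to an answer string. The names `fake_chain` and `evaluate`, the test cases, and the substring-matching rule are all hypothetical choices made for this example; they are not part of any particular library.

```python
from typing import Callable


def fake_chain(question: str) -> str:
    """Hypothetical stand-in for a real language model chain."""
    canned = {
        "What is 2 + 2?": "The answer is 4.",
        "What is the capital of France?": "Paris is the capital of France.",
    }
    return canned.get(question, "I don't know.")


def evaluate(chain: Callable[[str], str], cases: list[tuple[str, str]]) -> list[dict]:
    """Run each question through the chain and check for the expected substring."""
    results = []
    for question, expected in cases:
        answer = chain(question)
        results.append({
            "question": question,
            "answer": answer,
            # Case-insensitive substring match: a deliberately simple criterion.
            "passed": expected.lower() in answer.lower(),
        })
    return results


# Each case pairs an input with a string the answer is expected to contain.
cases = [
    ("What is 2 + 2?", "4"),
    ("What is the capital of France?", "Paris"),
    ("Who wrote Hamlet?", "Shakespeare"),
]

results = evaluate(fake_chain, cases)
pass_rate = sum(r["passed"] for r in results) / len(results)

for r in results:
    status = "PASS" if r["passed"] else "FAIL"
    print(f"{status}: {r['question']} -> {r['answer']}")
print(f"Pass rate: {pass_rate:.0%}")
```

The third case fails on purpose, because the stand-in has no answer for it; that is the kind of gap an evaluation run is meant to expose before users do. In practice you would replace `fake_chain` with a call to your own chain and likely use a stricter or more flexible scoring rule than substring matching.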