The Reflection and self-critique pattern improves AI agents by having them evaluate their own outputs and decisions. Key metrics include accuracy to measure overall correctness, precision and recall to distinguish false alarms from misses, and F1 score to balance the two. These metrics show the agent where it makes mistakes and how to improve. Without them, self-critique would lack clear guidance.
Reflection and self-critique pattern in Agentic AI - Model Metrics & Evaluation
Actual \ Predicted | Positive | Negative
-------------------|----------|---------
Positive | 80 | 20
Negative | 10 | 90
This matrix shows the agent's decisions: 80 true positives (correct), 20 false negatives (missed), 10 false positives (wrongly flagged), and 90 true negatives (correctly ignored). The agent uses this to reflect on errors.
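As a minimal sketch, the four metrics can be computed directly from the counts in the matrix above. The function name `classification_metrics` is an illustrative choice, not part of any library.

```python
# Counts taken from the confusion matrix above.
tp, fn = 80, 20   # actual-positive row: caught vs. missed
fp, tn = 10, 90   # actual-negative row: wrongly flagged vs. correctly ignored


def classification_metrics(tp, fn, fp, tn):
    """Return accuracy, precision, recall and F1 for a binary confusion matrix."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when nothing was flagged or nothing was positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


metrics = classification_metrics(tp, fn, fp, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For this matrix the sketch prints accuracy 0.850, precision 0.889, recall 0.800, and F1 0.842.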
Reflection helps balance precision and recall. For example, a medical AI needs high recall to catch as many disease cases as possible (few misses), even if precision drops (some false alarms). A spam filter needs high precision to avoid marking good emails as spam, even if some spam slips through (lower recall). Self-critique guides the agent to adjust this balance based on its goals.
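One common way to shift this balance is to move the decision threshold on a confidence score. The sketch below assumes the agent produces such a score; the scores and labels are made up purely for illustration.

```python
# Hypothetical (score, is_positive) pairs; the values are invented for illustration.
scored = [(0.95, True), (0.90, True), (0.80, False), (0.70, True),
          (0.60, True), (0.40, False), (0.30, True), (0.20, False),
          (0.10, False), (0.05, False)]


def precision_recall(scored, threshold):
    """Flag every item whose score is >= threshold, then compare to the labels."""
    tp = sum(1 for s, y in scored if s >= threshold and y)
    fp = sum(1 for s, y in scored if s >= threshold and not y)
    fn = sum(1 for s, y in scored if s < threshold and y)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


strict = precision_recall(scored, 0.85)   # spam-filter style: few false alarms
lenient = precision_recall(scored, 0.25)  # medical-screening style: few misses

print(f"strict:  precision={strict[0]:.2f}, recall={strict[1]:.2f}")
print(f"lenient: precision={lenient[0]:.2f}, recall={lenient[1]:.2f}")
```

With this toy data the strict threshold gives precision 1.00 and recall 0.40, while the lenient one gives precision 0.71 and recall 1.00.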
Good: High accuracy (e.g., 90%+), balanced precision and recall (both above 80%), and F1 score close to 1. This means the agent correctly identifies most cases and makes few mistakes.
Bad: High accuracy but very low recall (e.g., 10%), meaning the agent misses many true cases. Or very low precision, causing many false alarms. These show poor self-critique and need improvement.
- Accuracy paradox: High accuracy can be misleading if data is imbalanced (e.g., 95% accuracy but misses all rare cases).
- Data leakage: If the agent learns from future or test data, metrics look better than its real-world performance will be.
- Overfitting indicators: Very high training metrics but poor test metrics show the agent is not generalizing well.
- Ignoring recall or precision: Focusing on one metric alone can hide serious problems.
Your agent has 98% accuracy but only 12% recall on fraud detection. Is it good for production? Why or why not?
Answer: No, it is not good. The agent misses 88% of fraud cases (low recall), which is dangerous. High accuracy is misleading because fraud is rare. The agent needs better recall to catch fraud effectively.
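The numbers in this question can be reproduced with a small worked example. The counts below are hypothetical, chosen only so that accuracy comes out at 98% and recall at 12%; the question itself does not give the dataset size.

```python
# Hypothetical counts: 10,000 transactions, 200 of them fraudulent.
tp, fn = 24, 176      # fraud caught vs. fraud missed
fp, tn = 24, 9776     # legitimate flagged vs. legitimate passed
total = tp + fn + fp + tn

accuracy = (tp + tn) / total
recall = tp / (tp + fn)
missed_share = fn / (tp + fn)

# Baseline: an agent that never flags anything gets every legitimate case right.
baseline_accuracy = (tn + fp) / total
baseline_recall = 0.0

print(f"accuracy={accuracy:.2%}, recall={recall:.2%}, missed={missed_share:.2%}")
print(f"never-flag baseline: accuracy={baseline_accuracy:.2%}, recall={baseline_recall:.2%}")
```

Under these assumed counts, an agent that never flags any transaction reaches the same 98% accuracy, which is why accuracy alone says little when fraud is rare.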
Practice
Reflection and self-critique pattern in AI?
Solution
Step 1: Understand the pattern's goal
The reflection and self-critique pattern is designed to let AI look back at its answers and find mistakes.
Step 2: Identify the main benefit
By reviewing its own work, AI can fix errors and improve future responses.
Final Answer:
To help AI review and improve its own answers -> Option C
Quick Check:
Reflection and self-critique = improve answers [OK]
- Confusing speed with accuracy
- Thinking it stores data
- Assuming it creates new models
Solution
Step 1: Define reflection in AI context
Reflection means looking back at past answers to check for errors or improvements.
Step 2: Match options to definition
Only the option "AI reviews its previous answers to find mistakes" matches this definition.
Final Answer:
AI reviews its previous answers to find mistakes -> Option A
Quick Check:
Reflection = review past answers [OK]
- Thinking reflection means ignoring past answers
- Confusing reflection with deleting data
- Assuming copying answers is reflection
answer = AI.generate_answer(question)
errors = AI.reflect(answer)
if errors:
    answer = AI.fix_errors(answer, errors)
print(answer)
What will print(answer) show if the AI finds errors?
Solution
Step 1: Understand the code flow
The AI first generates an answer, then reflects to find errors. If errors exist, it fixes them.
Step 2: Determine the final printed output
Since errors are fixed before printing, the output is the corrected answer.
Final Answer:
The corrected answer after fixing errors -> Option B
Quick Check:
Errors fixed before print = corrected answer [OK]
- Assuming original answer prints despite errors
- Thinking program stops on errors
- Confusing error message with fixed answer
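To make the flow in this question concrete, here is a runnable sketch. `ToyAgent` is a hypothetical stand-in for the `AI` object; its three methods are toy implementations written for this example, not a real API.

```python
class ToyAgent:
    """Hypothetical stand-in for the AI object in the quiz snippet."""

    def generate_answer(self, question):
        # Deliberately wrong first draft so that reflection has something to find.
        return "2 + 2 = 5"

    def reflect(self, answer):
        # Re-check the arithmetic on the left side against the claimed result.
        left, claimed = answer.split("=")
        a, b = (int(part) for part in left.split("+"))
        expected = a + b
        if int(claimed) != expected:
            return [f"expected {expected}, got {int(claimed)}"]
        return []

    def fix_errors(self, answer, errors):
        # Rebuild the answer with the recomputed result and return the new string.
        left, _ = answer.split("=")
        a, b = (int(part) for part in left.split("+"))
        return f"{left.strip()} = {a + b}"


AI = ToyAgent()
question = "What is 2 + 2?"
answer = AI.generate_answer(question)
errors = AI.reflect(answer)
if errors:
    answer = AI.fix_errors(answer, errors)
print(answer)
```

Because the result of `fix_errors` is assigned back to `answer`, the script prints the corrected string `2 + 2 = 4`.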
answer = AI.generate_answer(question)
errors = AI.reflect(answer)
if errors:
    AI.fix_errors(answer, errors)
print(answer)
Why might this code fail to print the corrected answer?
Solution
Step 1: Analyze variable updates
The fix_errors function is called but its result is not assigned back to answer.
Step 2: Understand impact on output
Since answer is unchanged, print shows the original, not corrected, answer.
Final Answer:
Because fix_errors does not update answer variable -> Option A
Quick Check:
Fix must assign back to answer [OK]
- Assuming reflect never finds errors
- Thinking print is called too early
- Ignoring variable assignment after fixing
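The bug in this question can be reproduced in a few lines. The `fix_errors` function below is a hypothetical fixer that returns a new string, which is the assumption the quiz answer relies on.

```python
def fix_errors(answer, errors):
    """Hypothetical fixer: returns a new string and leaves the input untouched."""
    for wrong, right in errors:
        answer = answer.replace(wrong, right)
    return answer


errors = [("Pariss", "Paris")]

# Buggy version: the return value is discarded, so the variable never changes.
buggy = "The capital of France is Pariss."
fix_errors(buggy, errors)

# Corrected version: assign the result back to the variable.
fixed = "The capital of France is Pariss."
fixed = fix_errors(fixed, errors)

print(buggy)   # still contains the typo
print(fixed)   # typo corrected
```

Python strings are immutable, so a function cannot change the caller's string in place; the caller has to use the returned value. If the answer were a mutable object that `fix_errors` modified in place, the unassigned call would behave differently.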
Solution
Step 1: Identify key steps in the pattern
The pattern involves reviewing answers, finding errors, fixing them, and learning from mistakes.
Step 2: Match approach to pattern goals
The option "After each answer, AI reviews its response, identifies errors, fixes them, and updates its knowledge base" covers reviewing, fixing, and learning, so it fits the pattern best.
Final Answer:
After each answer, AI reviews its response, identifies errors, fixes them, and updates its knowledge base -> Option D
Quick Check:
Review + fix + learn = improved AI [OK]
- Ignoring learning from errors
- Choosing random or fixed answers
- Skipping error identification
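The review, fix, and learn cycle described in this solution might look like the sketch below. `LearningAgent`, its `lessons` dictionary, and the hard-coded checker are all hypothetical simplifications; a real agent would use a model-based critique step and a richer memory.

```python
class LearningAgent:
    """Hypothetical agent that remembers corrections from past reflections."""

    def __init__(self):
        self.lessons = {}  # maps a known-bad phrase to its correction

    def generate_answer(self, question):
        draft = "Water boils at 90 C at sea level."  # deliberately wrong draft
        # Apply lessons learned from earlier reflections before answering.
        for wrong, right in self.lessons.items():
            draft = draft.replace(wrong, right)
        return draft

    def reflect(self, answer):
        # Toy checker: one hard-coded fact stands in for a real critique step.
        if "90 C" in answer:
            return [("90 C", "100 C")]
        return []

    def fix_errors(self, answer, errors):
        for wrong, right in errors:
            answer = answer.replace(wrong, right)
        return answer

    def answer_with_reflection(self, question):
        answer = self.generate_answer(question)
        errors = self.reflect(answer)
        if errors:
            answer = self.fix_errors(answer, errors)
            self.lessons.update(errors)  # remember the correction for next time
        return answer, len(errors)


agent = LearningAgent()
question = "At what temperature does water boil?"
first, first_errors = agent.answer_with_reflection(question)
second, second_errors = agent.answer_with_reflection(question)
print(first, first_errors)
print(second, second_errors)
```

On the first call the agent finds and fixes one error; on the second call the stored lesson is applied up front, so reflection finds nothing to fix.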
