The ReAct pattern combines reasoning steps with actions to solve tasks. The key metric is task success rate, which measures how often the agent completes the task correctly. This matters because ReAct aims to improve decision-making by reasoning before acting. Step efficiency (the number of reasoning and acting steps needed) shows whether the agent reaches its answer without wasted work. The accuracy of intermediate reasoning steps can also be tracked to check whether the agent's thought process is sound.
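As a rough illustration of the first two metrics, the sketch below computes task success rate and average step count from a list of logged runs. The Episode record and its field names are assumptions made for this example, not part of any ReAct library, and the four episodes are made-up data.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One agent run (hypothetical log record)."""
    succeeded: bool  # did the agent complete the task correctly?
    steps: int       # number of reasoning + acting steps used


def task_success_rate(episodes):
    """Fraction of episodes that ended in a correct result."""
    return sum(e.succeeded for e in episodes) / len(episodes)


def mean_steps(episodes):
    """Average number of steps per episode (lower is more efficient)."""
    return sum(e.steps for e in episodes) / len(episodes)


# Made-up data for illustration only.
episodes = [Episode(True, 3), Episode(True, 5), Episode(False, 8), Episode(True, 4)]

print(task_success_rate(episodes))  # 0.75
print(mean_steps(episodes))         # 5.0
```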
ReAct pattern (Reasoning + Acting) in Agentic AI - Model Metrics & Evaluation
Task Outcome Confusion Matrix (example):
|                | Predicted Success | Predicted Failure |
|----------------|-------------------|-------------------|
| Actual Success | 85 (TP)           | 15 (FN)           |
| Actual Failure | 10 (FP)           | 90 (TN)           |
Total tasks = 200
- True Positive (TP): The task actually succeeded and the agent predicted success.
- False Negative (FN): The task actually succeeded but the agent predicted failure.
- False Positive (FP): The task actually failed but the agent predicted success.
- True Negative (TN): The task actually failed and the agent predicted failure.
From this, we calculate precision, recall, and F1 to evaluate performance.
In ReAct, precision answers: when the agent thinks it succeeded, how often did it really succeed? Recall answers: of all actual successes, how many did the agent correctly identify?
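For the example matrix above (TP = 85, FN = 15, FP = 10, TN = 90), the standard formulas can be worked through in a few lines. This is only a sketch of the arithmetic; the variable names are chosen for this example.

```python
# Counts taken from the example confusion matrix.
tp, fn, fp, tn = 85, 15, 10, 90

precision = tp / (tp + fp)                          # 85 / 95
recall = tp / (tp + fn)                             # 85 / 100
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
accuracy = (tp + tn) / (tp + fn + fp + tn)          # 175 / 200

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"f1={f1:.3f} accuracy={accuracy:.3f}")
# precision=0.895 recall=0.850 f1=0.872 accuracy=0.875
```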
Example 1: High precision, low recall
The agent only acts when very sure, so most predicted successes are correct (high precision). But it misses many tasks it could solve (low recall).
Example 2: High recall, low precision
The agent tries to solve many tasks, catching most successes (high recall), but sometimes thinks it succeeded when it failed (low precision).
Depending on the application, you may want to balance these. For critical tasks, high recall ensures fewer misses. For costly actions, high precision avoids wrong actions.
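One way to see this trade-off is to imagine that the agent reports a confidence score and that a threshold decides when it claims success. The sketch below assumes such a score exists (an assumption for illustration; the runs list is made-up data) and shows how moving the threshold shifts precision and recall in opposite directions.

```python
# Each entry: (agent's self-reported confidence, whether the task actually succeeded).
# Made-up data for illustration only.
runs = [
    (0.95, True), (0.90, True), (0.80, True), (0.70, False),
    (0.60, True), (0.40, False), (0.30, True), (0.20, False),
]


def precision_recall(runs, threshold):
    """Treat confidence >= threshold as 'predicted success'."""
    tp = sum(1 for conf, ok in runs if conf >= threshold and ok)
    fp = sum(1 for conf, ok in runs if conf >= threshold and not ok)
    fn = sum(1 for conf, ok in runs if conf < threshold and ok)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


print(precision_recall(runs, 0.75))  # strict threshold: (1.0, 0.6)
print(precision_recall(runs, 0.25))  # lenient threshold: precision 5/7, recall 1.0
```

A strict threshold corresponds to Example 1 (high precision, low recall), a lenient one to Example 2.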
- Good: Task success rate above 85%, precision and recall both above 80%, and low number of reasoning steps (efficient).
- Bad: Task success rate below 50%, precision or recall below 50%, or very high number of reasoning steps indicating inefficiency.
Good values mean the agent reasons well and acts correctly. Bad values show poor reasoning or wrong actions.
- Accuracy paradox: High overall accuracy can hide poor reasoning if the task is easy or imbalanced.
- Data leakage: If the agent sees answers during training, metrics will be unrealistically high.
- Overfitting: Agent may memorize reasoning patterns that don't generalize, inflating training metrics but failing on new tasks.
- Ignoring step efficiency: Measuring only success without considering reasoning steps can miss inefficiencies.
Your ReAct agent has 98% task success rate but only 12% recall on tasks requiring multi-step reasoning. Is it good for production? Why or why not?
Answer: No, it is not good. While the overall success is high, the very low recall on multi-step tasks means the agent misses most complex problems. This limits its usefulness in real scenarios needing reasoning. Improving recall on these tasks is critical.
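A per-category breakdown makes this kind of gap visible. The sketch below uses made-up counts, chosen so that the overall rate is about 98% while only 12% of multi-step tasks are handled correctly; the category labels and the helper name are assumptions for this example.

```python
from collections import defaultdict

# Made-up results for illustration: (task category, handled correctly?).
results = (
    [("single_step", True)] * 975
    + [("multi_step", True)] * 3
    + [("multi_step", False)] * 22
)


def rate_by_category(results):
    """Return the fraction of correctly handled tasks per category."""
    totals = defaultdict(lambda: [0, 0])  # category -> [correct, count]
    for category, ok in results:
        totals[category][0] += int(ok)
        totals[category][1] += 1
    return {cat: correct / count for cat, (correct, count) in totals.items()}


overall = sum(ok for _, ok in results) / len(results)
by_category = rate_by_category(results)

print(overall)      # 0.978
print(by_category)  # {'single_step': 1.0, 'multi_step': 0.12}
```

Because multi-step tasks are only 25 of the 1000 runs here, the overall figure barely moves even though the agent fails most of them.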
Practice
Solution
Step 1: Understand the ReAct pattern components
The ReAct pattern mixes reasoning (thought) and acting (actions) to solve problems step-by-step.
Step 2: Identify the main goal
Its goal is to help AI explain its reasoning clearly while using tools effectively.
Final Answer:
To combine reasoning steps with actions for clearer problem solving -> Option C
Quick Check:
ReAct = Reasoning + Acting [OK]
- Confusing ReAct with data storage methods
- Thinking it speeds up training only
- Believing it replaces humans fully
Solution
Step 1: Recall the ReAct step order
The ReAct pattern follows Thought (reasoning), then Action (doing), then Observation (seeing results), and finally Final Answer.
Step 2: Match the correct sequence
Thought -> Action -> Observation -> Final Answer matches this exact order.
Final Answer:
Thought -> Action -> Observation -> Final Answer -> Option B
Quick Check:
Step order = Thought, Action, Observation, Final Answer [OK]
- Swapping Action and Thought order
- Placing Final Answer too early
- Confusing Observation with Action
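The Thought -> Action -> Observation -> Final Answer order can be sketched as a small loop. This is a minimal illustration, not a real agent: scripted_policy is a hard-coded stand-in for the language model, and calculator is a hypothetical tool. Both names are assumptions made for this example.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def calculator(expression):
    """Hypothetical tool: evaluates 'a op b' with spaces between the tokens."""
    left, op, right = expression.split()
    return str(OPS[op](float(left), float(right)))


def scripted_policy(question, history):
    """Stand-in for the model: returns (thought, action, action_input)."""
    if not history:
        return ("I need to compute the product.", "calculator", "6 * 7")
    last_observation = history[-1][3]
    return (f"The tool returned {last_observation}.", "finish", last_observation)


def react_loop(question, policy, tools, max_steps=5):
    """Run Thought -> Action -> Observation until the policy finishes."""
    history = []  # (thought, action, action_input, observation) per step
    for _ in range(max_steps):
        thought, action, action_input = policy(question, history)
        if action == "finish":              # Final Answer
            return action_input, history
        observation = tools[action](action_input)
        history.append((thought, action, action_input, observation))
    return None, history                    # step budget exhausted


answer, trace = react_loop("What is 6 * 7?", scripted_policy, {"calculator": calculator})
print(answer)      # 42.0
print(len(trace))  # 1
```

The length of the trace is also a natural place to measure step efficiency.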
thought = "Check if number is even"
action = "Divide number by 2"
observation = 4 / 2
final_answer = "Number is even" if observation == 2 else "Number is odd"
print(final_answer)
Solution
Step 1: Evaluate the action and observation
The action divides 4 by 2, so observation = 2.0 (in Python, / always returns a float, and 2.0 == 2 is True).
Step 2: Determine the final answer based on observation
Since observation == 2, the final answer is "Number is even".
Final Answer:
Number is even -> Option D
Quick Check:
4 / 2 = 2 -> even number [OK]
- Confusing observation value with input number
- Assuming division error
- Ignoring the if-else condition
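Note that comparing the quotient to 2 only gives the right answer for the input 4. A more general version of the same thought/action/observation structure, sketched here with the modulo operator, might look like this (the function name is an assumption for this example):

```python
def is_even_trace(number):
    """Return the ReAct-style steps for a parity check on any integer."""
    thought = "Check if number is even"
    action = f"Compute {number} % 2"
    observation = number % 2  # remainder is 0 for even numbers
    final_answer = "Number is even" if observation == 0 else "Number is odd"
    return {
        "thought": thought,
        "action": action,
        "observation": observation,
        "final_answer": final_answer,
    }


print(is_even_trace(4)["final_answer"])  # Number is even
print(is_even_trace(7)["final_answer"])  # Number is odd
```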
thought = "Find square root"
action = "Calculate sqrt of 16"
observation = sqrt(16)
final_answer = "Square root is " + observation
print(final_answer)
Solution
Step 1: Check usage of sqrt function
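The code uses sqrt(16) but does not import sqrt from the math module.
Step 2: Identify missing import causing error
Without 'from math import sqrt', the call raises a NameError, which is the first error Python reports. (Once the import is added, the concatenation on the next line would also need str(observation), because sqrt returns a float.)
Final Answer:
Missing import for sqrt function -> Option A
Quick Check: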
sqrt needs import from math [OK]
- Assuming string concatenation error
- Thinking variable names are wrong
- Believing code runs without imports
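For reference, one way to make the snippet run is to add the import and convert the float to a string before concatenating:

```python
from math import sqrt  # the import missing from the quiz snippet

thought = "Find square root"
action = "Calculate sqrt of 16"
observation = sqrt(16)  # math.sqrt returns a float: 4.0
final_answer = "Square root is " + str(observation)  # str() avoids a TypeError
print(final_answer)  # Square root is 4.0
```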
Solution
Step 1: Understand prime checking logic
To check if 15 is prime, test divisibility by numbers from 2 up to 14.
Step 2: Follow ReAct steps correctly
The agent thinks about divisibility, acts by testing 3, observes 15 is divisible, then concludes not prime.
Final Answer:
Thought: Check divisibility from 2 to 14 -> Action: Test divisibility by 3 -> Observation: 15 divisible by 3 -> Final Answer: Not prime -> Option A
Quick Check:
Divisible by 3 means not prime [OK]
- Only checking even divisibility
- Guessing without testing
- Ignoring observations in reasoning
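The same reasoning can be written as a small trace generator. This is a sketch with an assumed function name; unlike the quiz answer, which jumps straight to testing 3, it tries each divisor in order and records an Observation after every Action.

```python
def prime_check_trace(n):
    """Return a list of (step_type, text) pairs for a trial-division prime check."""
    steps = [("Thought", f"Check divisibility of {n} from 2 to {n - 1}")]
    for divisor in range(2, n):
        steps.append(("Action", f"Test divisibility by {divisor}"))
        if n % divisor == 0:
            steps.append(("Observation", f"{n} divisible by {divisor}"))
            steps.append(("Final Answer", "Not prime"))
            return steps
        steps.append(("Observation", f"{n} not divisible by {divisor}"))
    # No divisor found: prime, unless n is below 2.
    steps.append(("Final Answer", "Prime" if n > 1 else "Not prime"))
    return steps


for step_type, text in prime_check_trace(15):
    print(f"{step_type}: {text}")
```

For 15, the trace stops at divisor 3 and ends with "Final Answer: Not prime".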
