For working memory in AI agents, the key metric is task state accuracy. This measures how well the agent remembers and updates the current task details. Good task state accuracy means the agent keeps the right information to make decisions. We also look at latency to see how quickly the memory updates, and consistency to check if the memory stays stable over time.
Working memory for current task state in Agentic AI - Model Metrics & Evaluation
Task State Prediction Confusion Matrix:
|                  | Predicted Correct | Predicted Incorrect |
|------------------|-------------------|---------------------|
| Actual Correct   | 85                | 15                  |
| Actual Incorrect | 10                | 90                  |
Total samples = 85 + 15 + 10 + 90 = 200
Here TP = 85, FN = 15, FP = 10, and TN = 90.
Precision = TP / (TP + FP) = 85 / (85 + 10) ≈ 0.895
Recall = TP / (TP + FN) = 85 / (85 + 15) = 0.85
F1 Score = 2 * (Precision * Recall) / (Precision + Recall) = 2 * (0.895 * 0.85) / (0.895 + 0.85) ≈ 0.872
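The arithmetic above can be reproduced with a short script. This is a minimal sketch: the function name `memory_metrics` and its argument names are illustrative and not part of any library, and the counts are the ones from the confusion matrix.

```python
def memory_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fn + fp + tn
    # Guard each ratio against a zero denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Counts taken from the confusion matrix above.
metrics = memory_metrics(tp=85, fn=15, fp=10, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Computing F1 from the unrounded precision and recall avoids the small drift that appears when intermediate values are rounded first.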
In working memory for task state, precision is the share of the agent's memory updates that are correct, so high precision means little wrong information is stored. Recall is the share of important task details that the agent actually retains, so high recall means few details are missed.
Example: If the agent has high precision but low recall, it rarely stores wrong info but often forgets some task details. This can cause incomplete decisions.
If it has high recall but low precision, it remembers everything but includes wrong or outdated info, confusing the agent.
Balancing precision and recall is key for reliable task memory.
Good metrics:
- Precision > 0.85: Most memory updates are correct.
- Recall > 0.80: Most important task details are remembered.
- F1 Score > 0.85: Balanced and reliable memory.
- Low latency: Memory updates happen quickly.
- Stable consistency: Memory does not fluctuate unnecessarily.
Bad metrics:
- Precision < 0.6: Many wrong memory updates.
- Recall < 0.5: Many important details forgotten.
- F1 Score < 0.6: Poor balance, unreliable memory.
- High latency: Slow memory updates hurt decisions.
- Inconsistent memory: Frequent unnecessary changes confuse the agent.
Common pitfalls:
- Accuracy paradox: High overall accuracy can hide poor recall or precision, misleading about memory quality.
- Data leakage: If future task info leaks into memory evaluation, metrics look better but are unrealistic.
- Overfitting: Memory tuned too closely to training tasks may fail on new tasks, showing good metrics only in training.
- Ignoring latency: Good accuracy but slow updates make memory less useful in real-time tasks.
- Unstable memory: Metrics may look good on average but frequent memory flips confuse agent behavior.
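One way to make the "unstable memory" pitfall measurable is to count how often the stored state changes between consecutive steps even though the true task state did not. The sketch below assumes memory snapshots and ground-truth states are available as parallel lists; `unnecessary_flip_rate` and the sample trace are hypothetical and only illustrate the idea.

```python
from typing import Sequence


def unnecessary_flip_rate(memory: Sequence[str], truth: Sequence[str]) -> float:
    """Fraction of step-to-step transitions where memory changed but the true state did not."""
    if len(memory) != len(truth):
        raise ValueError("memory and truth must have the same length")
    transitions = len(memory) - 1
    if transitions <= 0:
        return 0.0
    flips = sum(
        1
        for i in range(1, len(memory))
        if memory[i] != memory[i - 1] and truth[i] == truth[i - 1]
    )
    return flips / transitions


# Hypothetical trace: the true state changes once, but memory changes three times.
truth = ["plan", "plan", "plan", "act", "act"]
memory = ["plan", "act", "plan", "act", "act"]
flip_rate = unnecessary_flip_rate(memory, truth)
print(flip_rate)
```

In this trace the final stored state matches the truth, so an end-of-task accuracy check alone would not reveal the two unnecessary flips along the way.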
Your agent's working memory has 98% accuracy but only 12% recall on important task details. Is it good for production? Why or why not?
Answer: No, it is not good. Although accuracy is high, the very low recall means the agent forgets most important details. This will cause poor decisions because the agent lacks critical information. High recall is essential for reliable task memory.
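To see how 98% accuracy and 12% recall can coexist, it helps to build a concrete case. The counts below are hypothetical, chosen only so that the two percentages come out as in the question: important details are rare, so getting the many unimportant ones right dominates accuracy.

```python
# Hypothetical counts chosen to reproduce 98% accuracy with 12% recall.
tp, fn = 3, 22      # 25 important details, only 3 remembered
fp, tn = 0, 1075    # 1075 unimportant details, all correctly ignored

accuracy = (tp + tn) / (tp + fn + fp + tn)
recall = tp / (tp + fn)

print(f"accuracy = {accuracy:.2%}")
print(f"recall   = {recall:.2%}")
```

Here 22 of the 25 important details are lost, yet they account for only 2% of the 1100 samples, which is why accuracy alone is misleading.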
Practice
Solution
Step 1: Understand working memory function
Working memory holds temporary information needed right now for the task.
Step 2: Compare options to definition
Only "To temporarily store current task details for decision making" correctly describes temporary storage for current task decisions.
Final Answer:
To temporarily store current task details for decision making -> Option A
Quick Check:
Working memory = temporary task info [OK]
Common mistakes:
- Confusing working memory with long-term memory
- Thinking it stores all past tasks permanently
- Assuming it deletes data immediately
current_step?
Solution
Step 1: Identify working memory type
Working memory holds the current task state, so it should be replaced, not appended or updated as a collection.
Step 2: Analyze code options
working_memory = current_step assigns the current step directly, which matches replacing the current task state.
Final Answer:
working_memory = current_step -> Option B
Quick Check:
Assign current step to working memory [OK]
Common mistakes:
- Using append on a non-list working memory
- Calling update without a dict type
- Using add which is not a list method
working_memory after execution?
working_memory = None
steps = ['start', 'process', 'finish']
for step in steps:
    working_memory = step
print(working_memory)
Solution
Step 1: Trace the loop updating working memory
Loop sets working_memory to 'start', then 'process', then 'finish' in order.
Step 2: Identify final value after loop
After the last iteration, working_memory holds 'finish'.
Final Answer:
'finish' -> Option A
Quick Check:
Last step assigned = 'finish' [OK]
Common mistakes:
- Thinking working_memory accumulates all steps
- Confusing initial None with final value
- Assuming print shows first step
working_memory = None
current_step = 'step1'
working_memory.append(current_step)
Solution
Step 1: Check working_memory type
It is None, which is not a list and has no append method.
Step 2: Understand append usage
append works only on list objects, so calling it on None causes an error.
Final Answer:
working_memory is None and has no append method -> Option C
Quick Check:
NoneType has no append method [OK]
Common mistakes:
- Assuming append works on None
- Thinking current_step is undefined
- Believing append needs two arguments
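The failure described in this solution can be reproduced and then repaired in a few lines. This is a sketch of one possible fix, which is to initialise working memory as an empty list; if only the current step is needed, direct assignment would work as well.

```python
working_memory = None
current_step = "step1"

# Reproduce the error: None has no append method.
try:
    working_memory.append(current_step)
except AttributeError as exc:
    error_message = str(exc)
    print(f"AttributeError: {error_message}")

# One possible fix: start from an empty list so append is available.
working_memory = []
working_memory.append(current_step)
print(working_memory)
```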
Solution
Step 1: Identify need to store last two steps in order
We need a structure that keeps order and can hold multiple items.
Step 2: Evaluate data structures
A list supports order and appending; removing the oldest item keeps its size at 2. A string or a set does not keep order or hold multiple recent steps properly.
Final Answer:
Use a list and append new steps, removing oldest when length > 2 -> Option D
Quick Check:
List + append + remove oldest = last two steps [OK]
Common mistakes:
- Using string which holds only one step
- Using set which loses order
- Using dict which overwrites keys
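The list-based approach from this solution can be sketched as follows. The helper name `remember` and its `limit` parameter are illustrative assumptions. The second half shows `collections.deque` with `maxlen=2` from the standard library, which is an alternative to the list approach named in the answer, not the answer itself.

```python
from collections import deque


def remember(memory: list, step: str, limit: int = 2) -> None:
    """Append a step and drop the oldest entries once the list exceeds `limit`."""
    memory.append(step)
    while len(memory) > limit:
        memory.pop(0)  # remove the oldest step


steps = ["start", "process", "finish"]

# List-based working memory, as in the solution above.
working_memory = []
for step in steps:
    remember(working_memory, step)
print(working_memory)  # ['process', 'finish']

# Alternative: a bounded deque discards the oldest item automatically.
recent_steps = deque(maxlen=2)
for step in steps:
    recent_steps.append(step)
print(list(recent_steps))  # ['process', 'finish']
```

The list version makes the eviction rule explicit, while the deque avoids the manual length check and removes old items in constant time.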
