For state persistence, the key metric is consistency accuracy: how reliably the system saves and then restores the correct state across sessions. It matters because an agent should be able to resume a task without losing context or data. Related metrics, such as state restoration accuracy and session continuity rate, show whether the AI keeps the right information over time.
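A minimal sketch of how state restoration accuracy could be measured, assuming each session's state is a Python dict keyed by a state ID and that a state counts as correctly restored only if it exactly matches the saved value. The names `evaluate_restoration` and `RestorationReport`, and the exact-match rule, are hypothetical illustrations, not a standard API.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class RestorationReport:
    correct: int  # saved states restored with exactly the saved value
    wrong: int    # saved states restored with a different (e.g. corrupted) value
    missing: int  # saved states not restored at all

    @property
    def total(self) -> int:
        return self.correct + self.wrong + self.missing

    @property
    def accuracy(self) -> float:
        # Fraction of saved states that came back exactly right.
        return self.correct / self.total if self.total else 0.0


def evaluate_restoration(saved: Dict[str, Any], restored: Dict[str, Any]) -> RestorationReport:
    """Compare the state saved in one session with the state restored in the next.

    Hypothetical helper: exact equality is assumed to be the correctness criterion,
    and restored keys that were never saved are ignored.
    """
    correct = wrong = missing = 0
    for state_id, value in saved.items():
        if state_id not in restored:
            missing += 1
        elif restored[state_id] == value:
            correct += 1
        else:
            wrong += 1
    return RestorationReport(correct, wrong, missing)


saved = {"task": "draft email", "step": 3, "language": "en", "open_file": "notes.md"}
restored = {"task": "draft email", "step": 2, "language": "en"}
report = evaluate_restoration(saved, restored)
print(report.correct, report.wrong, report.missing, report.accuracy)  # 2 1 1 0.5
```

Here `step` comes back with a stale value and `open_file` is lost, so only 2 of the 4 saved states are restored correctly. Keeping `wrong` and `missing` as separate counts is a design choice: it keeps corruption and outright loss distinguishable when diagnosing failures.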
State persistence across sessions in Agentic AI - Model Metrics & Evaluation
State Persistence Confusion Matrix (Example):
             | Correctly Restored | Lost or Incorrectly Restored |
------------------------------------------------------------------
Total States |         90         |              10              |
- True Positive (TP): 90 states correctly restored
- False Negative (FN): 10 states lost or wrongly restored
Total states = TP + FN = 100
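With the counts in the table, recall follows directly:

```latex
\text{Recall} = \frac{TP}{TP + FN} = \frac{90}{90 + 10} = 0.90
```

Precision cannot be read off this table: it would also require knowing how many of the 10 FN states were restored with wrong values rather than lost outright, since only restored states count toward its denominator.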
In state persistence, recall is crucial. Here, recall is the fraction of saved states that are correctly restored. Low recall means saved states go missing, so important user data or context is lost.
Precision is the fraction of restored states that are actually correct. High precision means the system avoids restoring wrong or corrupted states.
Example: if an AI assistant restores 95 states but only 80 are correct, precision = 80/95 ≈ 0.84. If 100 states were saved, 20 of them were not correctly restored (15 restored with wrong values and 5 not restored at all), so recall = 80/100 = 0.80. Both should be high, but recall often matters more because it guards against data loss.
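The arithmetic in the example can be checked with a small helper. This is a sketch: the function name `precision_recall` is hypothetical, and splitting the 20 missed states into wrongly restored and never restored assumes 100 saved states and that every restored state corresponds to a saved one (no spurious restores).

```python
from typing import Tuple


def precision_recall(restored: int, correct: int, saved: int) -> Tuple[float, float]:
    """Return (precision, recall) for state restoration.

    precision = correct / restored  (share of restored states that are right)
    recall    = correct / saved     (share of saved states that came back right)
    """
    if correct > restored or correct > saved:
        raise ValueError("correct cannot exceed the restored or saved counts")
    precision = correct / restored if restored else 0.0
    recall = correct / saved if saved else 0.0
    return precision, recall


saved, restored, correct = 100, 95, 80
precision, recall = precision_recall(restored, correct, saved)
wrongly_restored = restored - correct  # restored, but with the wrong value
never_restored = saved - restored      # assumes no spurious restores
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.84 recall=0.80
print(wrongly_restored, never_restored)                  # 15 5
```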
- Good: Consistency accuracy > 95%, recall > 90%, precision > 90%. The AI reliably restores user state with minimal loss or errors.
- Bad: Consistency accuracy < 70%, recall < 60%. The AI often loses or corrupts saved states, causing user frustration and broken workflows.
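The good and bad bands can be encoded as a simple check. This is a sketch under assumptions the text does not settle: the metric key names, reading the bad band as "any one metric below its limit", and the "needs review" label for systems that fall between the two bands are all hypothetical.

```python
from typing import Dict

# Thresholds from the good/bad bands in the text, expressed as fractions.
GOOD_ABOVE = {"consistency_accuracy": 0.95, "recall": 0.90, "precision": 0.90}
BAD_BELOW = {"consistency_accuracy": 0.70, "recall": 0.60}


def rate_persistence(metrics: Dict[str, float]) -> str:
    """Return 'good', 'bad', or 'needs review' for a dict of metric values.

    Assumption: 'bad' if ANY metric falls below its bad threshold,
    'good' only if ALL metrics exceed their good thresholds, else 'needs review'.
    All keys in GOOD_ABOVE and BAD_BELOW must be present in `metrics`.
    """
    if any(metrics[name] < limit for name, limit in BAD_BELOW.items()):
        return "bad"
    if all(metrics[name] > limit for name, limit in GOOD_ABOVE.items()):
        return "good"
    return "needs review"


print(rate_persistence({"consistency_accuracy": 0.97, "recall": 0.93, "precision": 0.91}))  # good
print(rate_persistence({"consistency_accuracy": 0.85, "recall": 0.55, "precision": 0.90}))  # bad
```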
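- Accuracy paradox: High overall accuracy can hide poor recall when most evaluated states are trivial to restore (for example, defaults that are correct even if nothing was persisted), so the few states that actually matter barely move the score.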
- Data leakage: Evaluating restoration on states that were never cleared from memory, so the system never has to reload them from persistent storage, falsely inflates the metrics.
- Overfitting: The system may memorize specific states but fail on new or changed contexts.
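To illustrate the data-leakage pitfall, here is a toy sketch in which state changed after the last checkpoint is never written to disk. The `SessionAgent` class, its `checkpoint` method, and the JSON file store are hypothetical. Evaluating against the same in-memory agent hides the loss; evaluating against a fresh instance that can only read the persisted file exposes it.

```python
import json
import os
import tempfile
from typing import Any, Dict


class SessionAgent:
    """Hypothetical toy agent: state lives in memory and is persisted only on checkpoint()."""

    def __init__(self, store_path: str) -> None:
        self.store_path = store_path
        self.memory: Dict[str, Any] = {}  # in-process state, gone when the session ends

    def save(self, key: str, value: Any) -> None:
        self.memory[key] = value

    def checkpoint(self) -> None:
        # Write the current in-memory state to disk as JSON.
        with open(self.store_path, "w", encoding="utf-8") as f:
            json.dump(self.memory, f)

    def restore(self, key: str) -> Any:
        # Prefer in-memory state, then fall back to the persisted file.
        if key in self.memory:
            return self.memory[key]
        if os.path.exists(self.store_path):
            with open(self.store_path, encoding="utf-8") as f:
                return json.load(f).get(key)
        return None


def restoration_recall(agent: SessionAgent, expected: Dict[str, Any]) -> float:
    # Share of expected states the agent gives back with the right value.
    hits = sum(agent.restore(k) == v for k, v in expected.items())
    return hits / len(expected)


expected = {"task": "summarize report", "step": 4, "tone": "formal"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "state.json")
    session1 = SessionAgent(path)
    session1.save("task", expected["task"])
    session1.save("step", expected["step"])
    session1.checkpoint()
    session1.save("tone", expected["tone"])  # set after the last checkpoint: never persisted

    leaky = restoration_recall(session1, expected)            # memory never cleared
    clean = restoration_recall(SessionAgent(path), expected)  # fresh session, disk only
    print(f"leaky recall={leaky:.2f}, clean recall={clean:.2f}")  # leaky recall=1.00, clean recall=0.67
```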
Your AI model has 98% accuracy but only 12% recall on restoring user states. Is it ready for production? Why or why not?
Answer: No. A recall of 12% means the AI fails to restore 88% of saved states, so most user context is lost between sessions. The high accuracy is misleading: it most likely comes from states that are easy to get right, not from the states users actually need restored.
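One hypothetical breakdown that produces exactly these numbers: 1,100 evaluated states, of which only 25 actually needed restoring from a previous session, while the other 1,075 are trivial (for example, defaults that are correct regardless) and count as correct. These counts are invented for illustration.

```python
# Hypothetical evaluation set: counts invented to match the question's numbers.
trivial_states = 1075        # counted as "correct" without any real restoration
states_needing_restore = 25  # states saved in a previous session
correctly_restored = 3       # of those 25, restored correctly

accuracy = (trivial_states + correctly_restored) / (trivial_states + states_needing_restore)
recall = correctly_restored / states_needing_restore
lost = states_needing_restore - correctly_restored

print(f"accuracy={accuracy:.0%} recall={recall:.0%} lost={lost} of {states_needing_restore}")
# accuracy=98% recall=12% lost=22 of 25
```

Almost all of the accuracy comes from the 1,075 trivial states, while 22 of the 25 states that actually mattered are lost.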