When loading and parsing documents for AI models, the key metric is Parsing Accuracy. This measures how correctly the document content is extracted and structured. Good parsing ensures the AI model receives clean, accurate data to learn from or analyze. Without accurate parsing, the model may get wrong or incomplete information, leading to poor results.
Document loading and parsing in Prompt Engineering / GenAI - Model Metrics & Evaluation
For document parsing, a confusion matrix can show how many document elements were correctly or incorrectly identified. For example, if parsing extracts text blocks, tables, and images, the matrix might look like this:
| Predicted \ Actual | Text | Table | Image |
|--------------------|------|-------|-------|
| Text | 90 | 5 | 0 |
| Table | 3 | 85 | 2 |
| Image | 0 | 1 | 95 |
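Rows are the parser's predictions and columns are the true element types. The diagonal counts correctly parsed elements, and the off-diagonal cells count misclassifications. For example, 5 actual tables were parsed as text.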
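A minimal sketch of how per-class precision and recall can be read off this matrix, assuming rows are predictions and columns are actual types as in the table header. The names `LABELS`, `MATRIX`, `per_class_metrics`, and `overall_accuracy` are illustrative and not taken from any library.

```python
# Confusion matrix from the table above.
# Rows = predicted type, columns = actual type (assumed from the header).
LABELS = ["Text", "Table", "Image"]
MATRIX = [
    [90, 5, 0],   # predicted Text
    [3, 85, 2],   # predicted Table
    [0, 1, 95],   # predicted Image
]


def per_class_metrics(matrix, labels):
    """Return {label: {"precision": ..., "recall": ...}} for each class."""
    metrics = {}
    for i, label in enumerate(labels):
        correct = matrix[i][i]
        predicted_total = sum(matrix[i])                 # row sum
        actual_total = sum(row[i] for row in matrix)     # column sum
        metrics[label] = {
            "precision": correct / predicted_total if predicted_total else 0.0,
            "recall": correct / actual_total if actual_total else 0.0,
        }
    return metrics


def overall_accuracy(matrix):
    """Share of all elements that sit on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


if __name__ == "__main__":
    for label, m in per_class_metrics(MATRIX, LABELS).items():
        print(f"{label}: precision={m['precision']:.3f} recall={m['recall']:.3f}")
    print(f"overall accuracy={overall_accuracy(MATRIX):.3f}")
```

With the counts above, tables come out at roughly 94.4% precision (85 of 90 predicted tables) and 93.4% recall (85 of 91 actual tables).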
Precision is the fraction of elements reported by the parser that are actually correct. Recall is the fraction of real elements in the document that the parser found.
For example, if the parser finds 100 tables but only 80 are real tables, precision is 80%. If there are 100 tables in the document but the parser finds only 70, recall is 70%.
High precision but low recall means the parser is careful but misses many elements. High recall but low precision means it finds most elements but also reports many false ones. The right balance depends on the use case.
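The arithmetic in the table example can be sketched as below. The helper names are hypothetical. The `f_beta` function is an addition not mentioned in the text: the F-beta score is one common way to fold precision and recall into a single number, where `beta > 1` weights recall more heavily and `beta < 1` favors precision.

```python
def precision(correct_reported, total_reported):
    """Fraction of reported elements that are real."""
    return correct_reported / total_reported if total_reported else 0.0


def recall(correct_found, total_actual):
    """Fraction of real elements that were found."""
    return correct_found / total_actual if total_actual else 0.0


def f_beta(p, r, beta=1.0):
    """Weighted harmonic mean of precision p and recall r."""
    if p == 0 and r == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    # Parser reports 100 tables, 80 of them are real tables.
    p = precision(80, 100)
    # Document contains 100 tables, parser finds 70 of them.
    r = recall(70, 100)
    print(f"precision={p:.2f} recall={r:.2f}")
    print(f"F1={f_beta(p, r):.3f}")
    print(f"F2 (recall-weighted)={f_beta(p, r, beta=2.0):.3f}")
```

If missing a table is costlier than extracting a false one, a recall-weighted score such as F2 may be the more useful number to track, and the reverse holds when false extractions are the bigger problem.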
Good parsing: As a rough guide, precision and recall both above 90%. Most document parts are correctly identified and extracted.
Bad parsing: Precision or recall below 70%. Many elements are missed or wrongly extracted, causing errors downstream.
- Ignoring partial parsing: Scoring only whether a whole document parsed successfully hides errors inside documents that were parsed only in part.
- Data leakage: Using test documents seen during parser training inflates metrics.
- Overfitting: A parser tuned too heavily on one document type may fail on others.
- Accuracy paradox: High overall accuracy can hide poor parsing of rare but important elements.
Your document parser has 98% accuracy but only 12% recall on tables. Is it ready for production? Why or why not?
Answer: No, because it misses most tables (low recall). Even if overall accuracy is high, missing tables can cause big problems if tables are important for your task.
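To see how both numbers can hold at once, here is a small sketch with made-up counts. The figures are hypothetical and chosen only so that they reproduce 98% accuracy and 12% table recall. Tables are rare in this example, so missing most of them barely moves overall accuracy.

```python
# Hypothetical counts, chosen only to reproduce the scenario in the question.
total_elements = 2500      # all parsed elements
actual_tables = 50         # tables are only 2% of the elements
tables_found = 6           # tables correctly identified as tables
other_correct = 2444       # non-table elements identified correctly (of 2450)

# Overall accuracy counts every correct element, whatever its type.
accuracy = (tables_found + other_correct) / total_elements

# Table recall looks only at the real tables.
table_recall = tables_found / actual_tables

if __name__ == "__main__":
    print(f"accuracy={accuracy:.2%}")
    print(f"table recall={table_recall:.2%}")
    print(f"tables missed={actual_tables - tables_found}")
```

In this sketch 44 of the 50 tables are lost, yet they account for under 2% of all elements, which is why per-class recall is worth reporting next to overall accuracy.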