Factual consistency checking means verifying that an AI's answers are true and agree with real facts. The key metrics here are precision and recall. Precision tells us what fraction of the AI's claims are actually correct. Recall tells us what fraction of the true facts the AI includes rather than misses. We want both to be high: the AI should state only true facts (high precision) and should not miss important facts (high recall).
|                | Predicted True               | Predicted False                                 |
|----------------|------------------------------|-------------------------------------------------|
| Actually True  | True Positive (correct fact) | False Negative (missed fact)                    |
| Actually False | False Positive (wrong fact)  | True Negative (false claim correctly left out)  |
Example numbers for 100 claims:
TP = 70 (correct facts found)
FP = 10 (wrong facts stated)
FN = 15 (true facts missed)
TN = 5 (correctly no false claims)
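Using the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN), the example counts above can be turned into scores as follows. This is a minimal sketch: the function name `factual_metrics` is illustrative, and F1 (the harmonic mean of precision and recall) is included only as a common single-number summary.

```python
def factual_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute precision, recall, F1, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of stated claims that are true
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of true facts that were stated
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)           # harmonic mean of the two
    accuracy = (tp + tn) / (tp + fp + fn + tn)      # share of all decisions that were right
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


metrics = factual_metrics(tp=70, fp=10, fn=15, tn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# precision: 0.875
# recall: 0.824
# f1: 0.848
# accuracy: 0.750
```

With these counts, precision is 70 / 80 = 0.875 and recall is 70 / 85 ≈ 0.824.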
If the AI is very careful and only states facts it is sure about, it will have high precision but might miss some facts, so lower recall. This means fewer wrong facts but some true facts are missing.
If the AI tries to include every possible fact, it will have high recall but might include wrong facts, so lower precision. This means more complete answers but some errors.
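The careful-versus-complete trade-off can be illustrated by giving each candidate claim a confidence score and stating only the claims at or above a threshold. The scores and true/false labels below are made up purely for illustration; a real system would get confidence scores from the model or from a separate verifier.

```python
# Each candidate claim: (confidence score, whether the claim is actually true).
# These values are hypothetical, chosen only to show the trade-off.
CANDIDATES = [
    (0.95, True), (0.90, True), (0.85, True), (0.80, False),
    (0.70, True), (0.60, False), (0.55, True), (0.40, False),
    (0.30, True), (0.20, False),
]


def precision_recall_at(threshold: float, candidates=CANDIDATES) -> tuple[float, float]:
    """State only claims with confidence >= threshold, then score what was stated."""
    stated = [is_true for score, is_true in candidates if score >= threshold]
    tp = sum(stated)                                 # true claims that were stated
    fp = len(stated) - tp                            # false claims that were stated
    total_true = sum(is_true for _, is_true in candidates)
    fn = total_true - tp                             # true claims that were left out
    precision = tp / (tp + fp) if stated else 1.0    # convention: nothing stated, nothing wrong
    recall = tp / (tp + fn) if total_true else 0.0
    return precision, recall


for threshold in (0.9, 0.5, 0.1):
    p, r = precision_recall_at(threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

Raising the threshold from 0.1 to 0.9 moves precision from 0.60 to 1.00 while recall drops from 1.00 to 0.33: a stricter model states fewer wrong facts but misses more true ones.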
For example, a medical AI should have high precision to avoid wrong advice. A news summarizer might want higher recall to cover all important facts.
Good: Precision and recall both above 0.85 means most facts are correct and most true facts are included.
Bad: Precision below 0.5 means many wrong facts. Recall below 0.5 means many true facts missed. Either case means the AI is not reliable.
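The rule of thumb above can be written as a small helper. The 0.85 and 0.5 cut-offs come from this section; the function name and the "borderline" label for scores that meet neither rule are assumptions added for illustration.

```python
def rate_factual_quality(precision: float, recall: float) -> str:
    """Apply the section's rule of thumb to a precision/recall pair."""
    if precision < 0.5 or recall < 0.5:
        return "bad"          # many wrong facts or many missed facts
    if precision > 0.85 and recall > 0.85:
        return "good"         # most claims correct and most facts covered
    return "borderline"       # hypothetical label: neither rule applies


print(rate_factual_quality(0.90, 0.88))   # good
print(rate_factual_quality(0.875, 0.82))  # borderline
print(rate_factual_quality(0.95, 0.12))   # bad
```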
- Accuracy paradox: High overall accuracy can hide poor precision or recall if data is unbalanced.
- Data leakage: If test facts appear in training, metrics look better but model is not truly consistent.
- Overfitting: Model memorizes facts but fails on new facts, causing low recall.
- Ignoring context: Some facts depend on context; metrics must consider this to avoid false errors.
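One simple guard against the data-leakage pitfall is to check whether any evaluation claims also appear in the training data before trusting the scores. This sketch assumes claims are plain strings and treats two claims as the same if they match after lowercasing and dropping punctuation; a real pipeline might need fuzzier matching. The function names and example claims are hypothetical.

```python
import re


def normalize(claim: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    words = re.findall(r"[a-z0-9]+", claim.lower())
    return " ".join(words)


def leaked_claims(train_claims: list[str], test_claims: list[str]) -> list[str]:
    """Return test claims whose normalized form also appears among the training claims."""
    seen = {normalize(c) for c in train_claims}
    return [c for c in test_claims if normalize(c) in seen]


train = ["The Eiffel Tower is in Paris.", "Water boils at 100 degrees Celsius."]
test = ["the Eiffel Tower is in Paris", "The Moon orbits the Earth."]
print(leaked_claims(train, test))  # ['the Eiffel Tower is in Paris']
```

Any claim this returns should be removed from the test set (or reported separately), otherwise the metrics reward memorization.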
Your model has 98% accuracy but only 12% recall on true facts. Is it good for production?
Answer: No. The model misses most true facts (low recall), so it is not reliable even if accuracy looks high. It needs improvement to find more true facts.
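The 98% accuracy / 12% recall situation is easy to reproduce when true facts are rare in the evaluation set. The counts below are hypothetical, chosen only so the numbers match the question: 25 true facts among 1,100 candidate claims, of which the model states just 3 (all correct) and rejects everything else.

```python
# Hypothetical counts for an unbalanced evaluation set:
# only 25 of 1,100 candidate claims are true facts.
tp, fn = 3, 22      # states 3 true facts, misses the other 22
fp, tn = 0, 1075    # never states a false claim

accuracy = (tp + tn) / (tp + fp + fn + tn)
recall = tp / (tp + fn)
precision = tp / (tp + fp)

print(f"accuracy={accuracy:.2f}, precision={precision:.2f}, recall={recall:.2f}")
# accuracy=0.98, precision=1.00, recall=0.12
```

Accuracy is dominated by the 1,075 correctly rejected false claims, which is why recall, not accuracy, exposes the problem.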
Practice
What is the purpose of factual consistency checking in AI-generated text?
Solution
Step 1: Understand the goal of factual consistency checking
It is used to verify that AI-generated text is accurate and trustworthy.
Step 2: Compare the options with this goal
Only "To ensure the AI's output matches true and reliable information" is about matching the output with true information, which fits the goal.
Final Answer:
To ensure the AI's output matches true and reliable information -> Option D
Quick Check:
Purpose = Verify truthfulness [OK]
Common mistakes:
- Confusing creativity with factual accuracy
- Thinking speed or size relates to factual checking
- Ignoring the need for truth in AI outputs
Which of the following is a simple method for checking factual consistency?
Solution
Step 1: Identify simple factual checking methods
Simple methods often compare the words in the generated text with those in a trusted reference text.
Step 2: Match the options to this method
"Using word overlap between generated text and reference text" describes word overlap, a well-known simple method. The other options relate to model design, not checking.
Final Answer:
Using word overlap between generated text and reference text -> Option A
Quick Check:
Simple method = Word overlap [OK]
Common mistakes:
- Confusing model training with checking methods
- Choosing options about model size or layers
- Ignoring the comparison aspect of checking
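A minimal word-overlap checker might tokenize both texts and compute the Jaccard similarity of their word sets (shared words divided by all distinct words). Jaccard is only one common choice of overlap score, and the tokenizer here (lowercase alphanumeric runs) is an assumption; the function names are illustrative.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word set; punctuation is ignored."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def word_overlap(generated: str, reference: str) -> float:
    """Jaccard similarity between the word sets of two texts (0.0 to 1.0)."""
    gen, ref = tokens(generated), tokens(reference)
    if not gen and not ref:
        return 1.0  # two empty texts are trivially identical
    return len(gen & ref) / len(gen | ref)


print(word_overlap("Water boils at 100 degrees Celsius.",
                   "At sea level, water boils at 100 degrees Celsius."))  # 0.75
```

A checker built on this would flag the generated text when the score falls below some chosen threshold.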
Given the generated sentence 'The Eiffel Tower is in Berlin.' and the reference sentence 'The Eiffel Tower is in Paris.', which factual consistency check result is correct?
Solution
Step 1: Compare the key facts in both sentences
Both mention the Eiffel Tower, but the locations differ: Berlin vs. Paris.
Step 2: Determine factual consistency
Different locations mean the sentences are factually inconsistent despite their word overlap.
Final Answer:
The sentences are factually inconsistent because the location is different. -> Option C
Quick Check:
Location mismatch = Inconsistent [OK]
Common mistakes:
- Assuming word overlap means consistency
- Ignoring critical fact differences
- Confusing sentence length with factual accuracy
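Scoring this pair with a Jaccard word-overlap measure shows why overlap alone is misleading: the sentences share five of their seven distinct words, yet the words that differ are exactly the fact that matters. The tokenizer and scoring below are an illustrative sketch, not a standard tool.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word set; punctuation is ignored."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


generated = "The Eiffel Tower is in Berlin."
reference = "The Eiffel Tower is in Paris."

gen, ref = tokens(generated), tokens(reference)
overlap = len(gen & ref) / len(gen | ref)   # Jaccard similarity

print(f"overlap = {overlap:.2f}")            # overlap = 0.71
print("only in generated:", gen - ref)       # only in generated: {'berlin'}
print("only in reference:", ref - gen)       # only in reference: {'paris'}
```

A score of 0.71 could pass a naive threshold, so a checker should also inspect the words that differ, here the location.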
A word-overlap checker marks 'The capital of France is Paris.' and 'Paris is the capital of France.' as inconsistent. What is the likely error?
Solution
Step 1: Analyze the checker's behavior
It is supposed to compare overlapping words, yet it marks a reordered sentence containing exactly the same words as inconsistent.
Step 2: Identify the cause
The checker does not ignore word order, so it produces a false negative even though both sentences use the same words and state the same fact.
Final Answer:
The checker does not ignore word order, causing false inconsistency -> Option A
Quick Check:
Word order sensitivity = False inconsistency [OK]
Common mistakes:
- Assuming AI understanding causes error here
- Thinking sentence length matters
- Ignoring the role of stop words
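The difference between an order-sensitive and an order-insensitive comparison is easy to see on this pair. Both functions below are hypothetical sketches: `positional_match` compares words position by position (the faulty behavior the question describes), while `bag_of_words_match` compares word sets and ignores order.

```python
import re


def words(text: str) -> list[str]:
    """Lowercased words in order; punctuation is ignored."""
    return re.findall(r"[a-z0-9]+", text.lower())


def positional_match(a: str, b: str) -> float:
    """Order-sensitive: share of positions holding the same word."""
    wa, wb = words(a), words(b)
    same = sum(x == y for x, y in zip(wa, wb))
    return same / max(len(wa), len(wb), 1)


def bag_of_words_match(a: str, b: str) -> float:
    """Order-insensitive: Jaccard similarity of the two word sets."""
    sa, sb = set(words(a)), set(words(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


s1 = "The capital of France is Paris."
s2 = "Paris is the capital of France."
print(positional_match(s1, s2))    # 0.0
print(bag_of_words_match(s1, s2))  # 1.0
```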
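Which approach best improves factual consistency checking by combining word overlap with AI understanding?
Solution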
Step 1: Understand combining methods
Combining word overlap with AI understanding means checking both meaning and facts.
Step 2: Evaluate the options
"Use a model that compares semantic meaning, then verify key facts match" combines semantic comparison with fact verification, so it is the best way to improve checking.
Final Answer:
Use a model that compares semantic meaning, then verify key facts match -> Option B
Quick Check:
Semantic + fact check = Best approach [OK]
Common mistakes:
- Choosing only word matching without context
- Ignoring reference text
- Focusing on model size instead of accuracy
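A two-stage checker along the lines of this answer might first score overall similarity and then require the key facts to agree. This is a hedged sketch: a real system would use a semantic model (for example, sentence embeddings or an entailment model) for the first stage, whereas the `similarity` function here is a bag-of-words cosine stand-in so the example runs without dependencies. Treating capitalized non-stop-words and numbers as the "key facts" is also a crude assumption, and all names are illustrative.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "is", "in", "of", "at"}


def similarity(a: str, b: str) -> float:
    """Stand-in for a semantic model: cosine similarity of word counts."""
    ca = Counter(re.findall(r"[a-z0-9]+", a.lower()))
    cb = Counter(re.findall(r"[a-z0-9]+", b.lower()))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def key_facts(text: str) -> set[str]:
    """Crude key-fact extractor (assumption): capitalized words and numbers, minus stop words."""
    found = re.findall(r"\b(?:[A-Z][a-zA-Z]*|\d+)\b", text)
    return {w.lower() for w in found if w.lower() not in STOP_WORDS}


def is_consistent(generated: str, reference: str, min_similarity: float = 0.5) -> bool:
    """Consistent only if the texts are similar AND their key facts match."""
    return (similarity(generated, reference) >= min_similarity
            and key_facts(generated) == key_facts(reference))


print(is_consistent("Paris is the capital of France.",
                    "The capital of France is Paris."))   # True
print(is_consistent("The Eiffel Tower is in Berlin.",
                    "The Eiffel Tower is in Paris."))     # False
```

The reordered Paris sentence passes both stages, while the Eiffel Tower pair is similar enough to pass the first stage but fails the key-fact check because the location differs.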
