Few-shot prompting is about teaching a model to perform a task with very few examples. The key metric here is accuracy or task-specific correctness because it shows how well the model understands and applies the examples given. For tasks like classification or question answering, accuracy tells us if the model is making the right choices after seeing just a few samples.
Few-shot prompting in Prompt Engineering / GenAI - Model Metrics & Evaluation
Confusion matrix for a 3-class classification task (rows = true class, columns = predicted class):

            Predicted A   Predicted B   Predicted C
True A           18             2             0
True B            3            15             2
True C            0             1            19

Total samples = 60
From this:
- True Positives (TP) for class A = 18
- False Positives (FP) for class A = 3 + 0 = 3
- False Negatives (FN) for class A = 2 + 0 = 2
Precision for class A = TP / (TP + FP) = 18 / (18 + 3) ≈ 0.86
Recall for class A = TP / (TP + FN) = 18 / (18 + 2) = 0.90
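The same arithmetic can be checked for every class at once. The following is a minimal sketch in plain Python, using only the confusion matrix above; the names `MATRIX`, `LABELS`, `per_class_metrics`, and `accuracy` are illustrative and not part of any library.

```python
# Confusion matrix from the text: rows are true classes, columns are predictions.
LABELS = ["A", "B", "C"]
MATRIX = [
    [18, 2, 0],   # true A
    [3, 15, 2],   # true B
    [0, 1, 19],   # true C
]


def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall)} for a square confusion matrix."""
    metrics = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        # False positives: other true classes that were predicted as this class.
        fp = sum(matrix[r][i] for r in range(len(labels))) - tp
        # False negatives: this true class predicted as something else.
        fn = sum(matrix[i]) - tp
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        metrics[label] = (precision, recall)
    return metrics


def accuracy(matrix):
    """Fraction of all samples that sit on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


METRICS = per_class_metrics(MATRIX, LABELS)
for label, (p, r) in METRICS.items():
    print(f"class {label}: precision={p:.2f} recall={r:.2f}")
print(f"accuracy={accuracy(MATRIX):.2f}")
```

For class A this reproduces the precision of about 0.86 and recall of 0.90 worked out by hand, and it also reports the overall accuracy (52 correct out of 60).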
In few-shot prompting, sometimes the model guesses carefully (high precision) but misses some correct answers (low recall). Other times, it tries to catch all correct answers (high recall) but makes more mistakes (low precision).
Example 1: For a medical diagnosis task, high recall is important because missing a disease is dangerous. Few-shot prompting should focus on catching all positives, even if some false alarms happen.
Example 2: For spam detection, high precision matters more. Few-shot prompting should avoid marking good emails as spam, even if some spam slips through.
Good: Accuracy above 80% with balanced precision and recall means the model learned well from few examples.
Bad: Accuracy below 50% or very low recall (e.g., under 30%) means the model is not understanding the examples or missing many correct answers.
- Accuracy paradox: High accuracy can be misleading if the task is unbalanced (e.g., mostly one class).
- Data leakage: If examples in the prompt are too similar to the test data, metrics look better than they should because the model is copying rather than generalizing.
- Overfitting: The model might memorize the few examples but fail on new inputs, causing poor generalization.
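One simple guard against the data leakage pitfall is to check whether any test input also appears among the prompt examples. The sketch below is an assumption about how such a check might look, not a standard tool: `normalize` and `find_leaked_items` are hypothetical helpers, and the matching is deliberately crude (it ignores case, punctuation, and operators, and it will not catch paraphrases).

```python
import re


def normalize(text):
    """Lowercase and keep only letters/digits so near-identical strings compare equal."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def find_leaked_items(prompt_examples, test_inputs):
    """Return the test inputs whose normalized form also appears in the prompt examples."""
    seen = {normalize(example) for example in prompt_examples}
    return [item for item in test_inputs if normalize(item) in seen]


# Hypothetical data for illustration only.
prompt_examples = ["What is 2 + 3?", "What is 4 + 1?"]
test_inputs = ["what is 2+3", "What is 7 + 2?"]

LEAKED = find_leaked_items(prompt_examples, test_inputs)
print(LEAKED)
```

Any item this flags should be removed from the test set (or from the prompt) before computing metrics.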
Your few-shot prompted model has 98% accuracy but only 12% recall on the positive class. Is it good for production?
Answer: No. The model misses most positive cases (low recall), which is critical in many tasks. High accuracy here is misleading because the data is likely imbalanced. You should improve recall before using it in production.
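To see how 98% accuracy and 12% recall can coexist, it helps to plug in concrete counts. The numbers below are hypothetical, chosen only so that the two figures in the question come out exactly; they are not results from a real model.

```python
# Hypothetical counts chosen only to reproduce the 98% / 12% figures.
tp, fn = 3, 22      # 25 actual positives, only 3 detected
fp, tn = 0, 1075    # 1075 actual negatives, all labeled negative

total = tp + fn + fp + tn
model_accuracy = (tp + tn) / total
model_recall = tp / (tp + fn)

# Baseline: always predict "negative". It never finds a positive case,
# yet its accuracy is almost as high because positives are rare.
baseline_accuracy = (tn + fp) / total

print(f"accuracy={model_accuracy:.3f} recall={model_recall:.2f}")
print(f"always-negative baseline accuracy={baseline_accuracy:.3f}")
```

With only 25 positives among 1100 samples, a baseline that never predicts the positive class already scores about 97.7% accuracy, which is why accuracy alone says little here.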
Practice
few-shot prompting in AI models?
Solution
Step 1: Understand few-shot prompting concept
Few-shot prompting means giving the model a few examples in the prompt to help it understand the task.
Step 2: Compare with other methods
Unlike training or fine-tuning, few-shot prompting does not require changing the model weights, just examples in the prompt.
Final Answer:
Showing a few examples in the prompt to teach the model a task -> Option A
Quick Check:
Few-shot prompting = examples in prompt [OK]
- Confusing few-shot prompting with full model training
- Thinking it requires many examples
- Assuming no examples are given
Solution
Step 1: Identify proper prompt structure
Few-shot prompting works best when examples are clearly listed before the new question.
Step 2: Eliminate incorrect options
Options A, B, and D do not provide clear examples or add unrelated content, which confuses the model.
Final Answer:
List examples clearly, then ask the new question -> Option C
Quick Check:
Clear examples first = correct prompt [OK]
- Skipping examples completely
- Adding unrelated text that confuses the model
- Using comments instead of examples
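The "examples first, then the new question" structure can be sketched as a small helper. This is a minimal illustration assuming a plain Q:/A: text format; `build_few_shot_prompt` is a hypothetical function name, and the resulting string would still need to be sent to whatever model API is in use.

```python
def build_few_shot_prompt(examples, question):
    """Format (question, answer) pairs as Q/A lines, then append the unanswered question."""
    lines = []
    for example_question, example_answer in examples:
        lines.append(f"Q: {example_question}")
        lines.append(f"A: {example_answer}")
    # The new question goes last, with an empty answer slot for the model to fill.
    lines.append(f"Q: {question}")
    lines.append("A:")
    return "\n".join(lines)


PROMPT = build_few_shot_prompt(
    [("What is 2 + 3?", "5"), ("What is 4 + 1?", "5")],
    "What is 7 + 2?",
)
print(PROMPT)
```

Keeping every example in the same format as the final question makes the pattern the model should continue unambiguous.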
Q: What is 2 + 3?
A: 5
Q: What is 4 + 1?
A: 5
Q: What is 7 + 2?
A:
What will the model most likely answer?
Solution
Step 1: Analyze the examples given
The examples show addition questions with correct answers: 2+3=5 and 4+1=5.
Step 2: Predict the answer for 7 + 2
7 + 2 equals 9, so the model should answer 9 following the pattern.
Final Answer:
9 -> Option B
Quick Check:
7+2=9 [OK]
- Repeating previous answer 5
- Confusing question numbers
- Ignoring addition operation
Q: Translate 'cat' to Spanish.
A: gato
Q: Translate 'dog' to Spanish.
A: perro
Q: Translate 'bird' to Spanish.
A: perro
What is the main error here?
Solution
Step 1: Check the last example's answer
The last question asks for 'bird' in Spanish, but the answer repeats 'perro' (dog).
Step 2: Identify correct Spanish word
The correct Spanish word for 'bird' is 'pájaro', so the answer is wrong.
Final Answer:
The last answer repeats 'perro' instead of 'pájaro' -> Option A
Quick Check:
Wrong repeated answer = error [OK]
- Copying previous answer by mistake
- Ignoring answer correctness
- Assuming question marks are required
Solution
Step 1: Identify the task in the prompt
The task is to classify fruits as 'sweet' or 'sour', so examples must show this classification clearly.
Step 2: Evaluate each option's relevance
The following prompt correctly shows fruits labeled 'sweet' or 'sour' and ends with the new question:
Q: Is lemon sweet or sour?
A: sour
Q: Is apple sweet or sour?
A: sweet
Q: Is orange sweet or sour?
A:
The other options either reverse labels or ask unrelated questions.
Final Answer:
The lemon/apple/orange prompt shown in Step 2 -> Option D
Quick Check:
Examples match task = best prompt [OK]
- Mixing up labels in examples
- Using unrelated questions
- Not showing clear classification
