When deciding between fine-tuning a model and prompt engineering, the key metrics to watch are task accuracy, response relevance, and latency. Fine-tuning aims to improve accuracy and relevance by updating the model's weights on task-specific data, while prompt engineering tries to get better answers by changing only the input, leaving the model as it is. Measuring the accuracy or quality of answers for both approaches on the same examples helps decide which one works best.
When to fine-tune vs prompt engineer in Prompt Engineering / GenAI - Metrics Comparison
Task: Classify user intent from text
Confusion Matrix Example:
              Predicted Yes   Predicted No
Actual Yes         80              20
Actual No          15              85
- Fine-tuning can improve these numbers by learning from more examples.
- Prompt engineering tries to reduce errors by better question phrasing.
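To make the confusion matrix concrete, here is a minimal sketch that turns its four counts into accuracy, precision, recall, and F1. It assumes "Yes" is the positive class; the function name `classification_metrics` is illustrative and not part of any library.

```python
# Counts taken from the confusion matrix above ("Yes" is treated as the positive class).
tp, fn = 80, 20   # actual Yes: predicted Yes / predicted No
fp, tn = 15, 85   # actual No:  predicted Yes / predicted No


def classification_metrics(tp, fn, fp, tn):
    """Return accuracy, precision, recall and F1 for binary confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


metrics = classification_metrics(tp, fn, fp, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For these counts the accuracy is 82.5%, precision is about 84.2%, and recall is 80%. Running the same calculation before and after a prompt change or a fine-tuning run shows which numbers actually moved.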
Fine-tuning can improve both precision and recall by teaching the model new patterns from labeled examples. It is a good fit when you have many examples and want consistent, high-quality results.
Prompt engineering is faster and cheaper but may only improve precision or recall slightly. It is useful when you want quick fixes or have limited data.
Example: For a customer support bot, fine-tuning can reduce missed questions (higher recall). Prompt engineering can help avoid wrong answers (higher precision) by clearer prompts.
Good: Accuracy above 85%, balanced precision and recall, fast response time.
Bad: Accuracy below 60%, very low recall (missing many correct answers), or very low precision (many wrong answers).
If prompt engineering cannot reach good metrics after several iterations, fine-tuning is the next approach to try.
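The good/bad thresholds above can be sketched as a small decision helper. The 85% and 60% accuracy cut-offs come from the text; the `max_gap` and `low` parameters are hypothetical values chosen only to give "balanced" and "very low" a number, and latency is left out for brevity.

```python
def rate_metrics(accuracy, precision, recall, max_gap=0.10, low=0.30):
    """Label a set of metrics as 'good', 'bad' or 'borderline'.

    The 0.85 and 0.60 accuracy cut-offs follow the text.
    max_gap (allowed precision/recall difference) and low (what counts as
    "very low") are assumptions made for this sketch.
    """
    if accuracy < 0.60 or precision < low or recall < low:
        return "bad"
    if accuracy > 0.85 and abs(precision - recall) <= max_gap:
        return "good"
    return "borderline"


def next_step(prompt_metrics):
    """Suggest what to do after measuring a prompt-engineered setup."""
    rating = rate_metrics(**prompt_metrics)
    if rating == "good":
        return "keep prompt engineering"
    if rating == "borderline":
        return "iterate on prompts, then consider fine-tuning"
    return "consider fine-tuning"


print(next_step({"accuracy": 0.90, "precision": 0.90, "recall": 0.88}))
print(next_step({"accuracy": 0.98, "precision": 0.50, "recall": 0.12}))
```

The second call shows why accuracy alone is not enough: 98% accuracy still rates as "bad" once recall is taken into account.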
- Accuracy paradox: High accuracy can be misleading if data is imbalanced.
- Overfitting: Fine-tuned models may perform well on training data but poorly on new data.
- Data leakage: Using test data during fine-tuning inflates metrics falsely.
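- Ignoring latency: Long prompts packed with instructions and examples add tokens and can increase response time, hurting user experience. Fine-tuning does not by itself slow a model down, so measure latency for both approaches.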
- Prompt bias: Poor prompt design can hide model weaknesses.
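Of the pitfalls above, data leakage is the easiest to check mechanically. The sketch below flags test examples that also appear in the fine-tuning data. The function name and the sample texts are hypothetical, and it only catches duplicates that match after lowercasing and whitespace normalization, not paraphrases.

```python
def find_leaked_examples(train_texts, test_texts):
    """Return test examples whose normalized text also appears in the training set."""

    def normalize(text):
        # Lowercase and collapse whitespace so trivial differences do not hide duplicates.
        return " ".join(text.lower().split())

    train_set = {normalize(t) for t in train_texts}
    return [t for t in test_texts if normalize(t) in train_set]


# Hypothetical intent-classification examples, for illustration only.
train_texts = ["Reset my password", "Where is my order?", "Cancel my subscription"]
test_texts = ["reset my  password", "How do I get a refund?"]

leaked = find_leaked_examples(train_texts, test_texts)
print(f"{len(leaked)} leaked example(s): {leaked}")
```

Any example this check returns should be removed from the test set (or the training set) before metrics are reported.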
Your chatbot has 98% accuracy but only 12% recall on urgent requests. Is it good for production? Why or why not?
Answer: No, because it misses most urgent requests (low recall). This can cause serious problems. You should improve recall, possibly by fine-tuning or better prompt engineering.
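The numbers in this question can be reproduced with a small worked example. The counts below are hypothetical; they were chosen so that urgent requests are rare (200 out of 10,000) and the results match the 98% accuracy and 12% recall in the question.

```python
# Hypothetical counts: 10,000 requests, of which 200 are urgent (the positive class).
tp = 24      # urgent requests correctly flagged
fn = 176     # urgent requests missed
fp = 24      # non-urgent requests wrongly flagged as urgent
tn = 9776    # non-urgent requests correctly left alone

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
recall = tp / (tp + fn)

print(f"accuracy: {accuracy:.0%}")
print(f"recall on urgent requests: {recall:.0%}")
print(f"urgent requests missed: {fn} of {tp + fn}")
```

Because 98% of the requests are non-urgent, the model scores high accuracy mostly by getting the easy majority class right, while still missing 176 of the 200 urgent requests.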
Practice
Solution
Step 1: Understand fine-tuning
Fine-tuning means adjusting the model's internal settings (weights) to better fit specific data or tasks.
Step 2: Understand prompt engineering
Prompt engineering means changing the way you ask the model questions without changing the model itself.
Final Answer:
Fine-tuning changes the model's knowledge, while prompt engineering changes how you ask questions. -> Option C
Quick Check:
Fine-tune = model change, prompt engineer = question change [OK]
- Confusing prompt engineering with model retraining
- Thinking fine-tuning only changes prompts
- Believing prompt engineering is slower than fine-tuning
Solution
Step 1: Identify what prompt engineering means
Prompt engineering means changing how you write or format the input text to guide the model's answers.
Step 2: Check the options
Only the option "Adjusting the input text to get better model responses." describes changing the input text, which matches prompt engineering.
Final Answer:
Adjusting the input text to get better model responses. -> Option A
Quick Check:
Prompt engineering = input text change [OK]
- Mixing prompt engineering with model retraining
- Thinking prompt engineering changes model layers
- Confusing prompt engineering with data augmentation
Solution
Step 1: Understand the task
Improving answers for a specific medical dataset requires the model to learn new, specialized knowledge.
Step 2: Choose the best method
Fine-tuning the model with the medical data updates its knowledge, making it better for this task.
Final Answer:
Fine-tune the model with the medical dataset. -> Option D
Quick Check:
Specific data needs fine-tuning [OK]
- Thinking prompt engineering alone fixes specialized knowledge
- Ignoring fine-tuning for domain-specific tasks
- Assuming base model works best without adaptation
Solution
Step 1: Analyze the problem
If prompt engineering fails to improve answers, the model likely lacks task-specific knowledge.
Step 2: Choose the fix
Fine-tuning with relevant data updates the model's knowledge to improve answers.
Final Answer:
Fine-tune the model with relevant data. -> Option A
Quick Check:
Poor answers + prompt fail = fine-tune needed [OK]
- Trying unrelated fixes like changing architecture
- Assuming shorter prompts fix knowledge gaps
- Expecting a server restart to improve model knowledge
Solution
Step 1: Identify constraints
You want a quick improvement without retraining the model.
Step 2: Choose the best approach
Prompt engineering lets you add product info in questions to guide the model without retraining.
Final Answer:
Use prompt engineering to add product info in the questions. -> Option B
Quick Check:
Quick fix without retrain = prompt engineering [OK]
- Thinking fine-tuning is always fastest
- Replacing the model unnecessarily
- Leaving out product info, which causes poor answers
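Adding product info to the question, as in the last solution, can be sketched as a simple prompt template. This is one possible layout under stated assumptions, not a prescribed format: the function name `build_prompt`, the instruction wording, and the product facts are all hypothetical, and no model is called here.

```python
def build_prompt(question, product_facts):
    """Build a prompt that places product information ahead of the customer's question."""
    # Render each fact as a bullet so the model can find it easily.
    facts = "\n".join(f"- {name}: {detail}" for name, detail in product_facts.items())
    return (
        "Answer the customer's question using only the product information below.\n"
        "If the answer is not covered, say you do not know.\n\n"
        f"Product information:\n{facts}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical product facts, for illustration only.
product_facts = {
    "Battery life": "up to 10 hours",
    "Warranty": "2 years",
}

prompt = build_prompt("How long does the battery last?", product_facts)
print(prompt)
```

The resulting string can be sent to whichever model is already deployed, so the fix ships without any retraining. Note that every added fact lengthens the prompt, which is worth checking against the latency metric.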
