## Model Selection (GPT-4 vs GPT-3.5) in Prompt Engineering / GenAI: Model Metrics & Evaluation

When choosing between GPT-4 and GPT-3.5, the key metrics are accuracy, response quality, and latency. Accuracy measures how often the model gives correct or useful answers. Response quality measures how clear, relevant, and helpful those answers are. Latency is how quickly the model responds. Pick metrics based on what matters most: if you need the best answers, weight accuracy and quality; if you need speed, weight latency.
**Example: Comparing GPT-4 and GPT-3.5 on 100 questions**

| Model   | Correct (TP) | Incorrect (FP+FN) | Total |
|---------|--------------|-------------------|-------|
| GPT-4   | 90           | 10                | 100   |
| GPT-3.5 | 75           | 25                | 100   |

Here, GPT-4 answers 90 of 100 questions correctly, while GPT-3.5 answers 75.
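The accuracy figures implied by the table can be computed directly; a minimal sketch (the `results` dictionary simply restates the table's numbers):

```python
# Accuracy = correct answers / total questions, per model.
results = {
    "GPT-4": (90, 100),    # (correct, total) from the table
    "GPT-3.5": (75, 100),
}

for model, (correct, total) in results.items():
    accuracy = correct / total
    print(f"{model}: accuracy = {accuracy:.0%}")
```

Running this prints 90% for GPT-4 and 75% for GPT-3.5, matching the table.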
For language models, think of precision as how often the model's answers are correct when it gives an answer, and recall as how many of all possible correct answers the model actually provides.
GPT-4 tends to have higher precision and recall, giving more correct and complete answers. GPT-3.5 might be faster but less precise, sometimes giving wrong or incomplete answers.
Example: If you want a chatbot that never gives wrong info, prioritize precision (GPT-4). If you want quick answers and can tolerate some mistakes, recall and speed (GPT-3.5) might be enough.
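To make the precision/recall distinction concrete, here is a toy sketch for a Q&A model that can abstain from answering. All numbers are hypothetical, chosen only to illustrate the two formulas:

```python
# Hypothetical Q&A evaluation where the model may decline to answer.
# precision = correct answers / questions it answered
# recall    = correct answers / all answerable questions
answered = 80      # questions the model chose to answer (of 100)
correct = 72       # of those, how many were right
answerable = 100   # questions that had a known correct answer

precision = correct / answered     # high: when it answers, it's usually right
recall = correct / answerable      # lower: it missed some answerable questions

print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Here the model is precise (0.90) but incomplete (0.72 recall): the chatbot-that-never-lies use case optimizes the first number, the answer-everything use case the second.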
- **Good:** GPT-4 with 90%+ accuracy, high response quality, and acceptable latency (e.g., 1-2 seconds).
- **Bad:** GPT-3.5 with 70% accuracy, lower-quality answers, or responses so slow they frustrate users.
Good models balance accuracy and speed to fit your needs.
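One way to operationalize that balance is to filter candidates by a latency budget and then pick the most accurate survivor. A minimal sketch, where the model stats and the `MAX_LATENCY_S` threshold are assumed example values, not measured benchmarks:

```python
# Pick the most accurate model that fits an assumed latency budget.
candidates = [
    {"name": "GPT-4", "accuracy": 0.90, "latency_s": 1.5},
    {"name": "GPT-3.5", "accuracy": 0.75, "latency_s": 0.4},
]
MAX_LATENCY_S = 2.0  # assumed product requirement

viable = [m for m in candidates if m["latency_s"] <= MAX_LATENCY_S]
best = max(viable, key=lambda m: m["accuracy"])
print(best["name"])  # GPT-4: both fit the budget, so accuracy decides
```

If the budget were tightened to 1 second, GPT-3.5 would win despite its lower accuracy, which is the trade-off the pitfalls below warn about.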
- Ignoring latency: A very accurate model that is too slow may not be practical.
- Overfitting to test data: A model might look great on a small test set but fail in real use.
- Data leakage: If test questions were seen during training, accuracy is misleading.
- Focusing only on accuracy: Quality and relevance of answers matter too.
Your GPT-3.5 model has 98% accuracy on a test set but only 12% recall on rare topics. Is it good for production? Why or why not?
Answer: No. Although overall accuracy is high, the model misses most rare topics (12% recall), so it frequently fails on important but uncommon questions, which hurts user experience in production.
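The numbers below show how this situation arises: a test set dominated by common topics lets overall accuracy stay near 98% even when rare-topic recall is only 12%. The split (975 common vs. 25 rare questions) is an illustrative assumption, not data from the scenario:

```python
# Illustrative split: accuracy is dominated by the common topics,
# masking near-total failure on the rare ones.
common_total, common_correct = 975, 975  # model nails frequent topics
rare_total, rare_correct = 25, 3         # ...but answers few rare ones

accuracy = (common_correct + rare_correct) / (common_total + rare_total)
rare_recall = rare_correct / rare_total

print(f"overall accuracy:  {accuracy:.1%}")    # 97.8%
print(f"rare-topic recall: {rare_recall:.0%}") # 12%
```

This is why a single aggregate accuracy number should always be broken down by topic or slice before a production decision.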