Bird
Raised Fist0
Prompt Engineering / GenAIml~8 mins

System prompts and role setting in Prompt Engineering / GenAI - Model Metrics & Evaluation

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Metrics & Evaluation - System prompts and role setting
Which metric matters for System prompts and role setting and WHY

When working with system prompts and role setting in AI models, the key metric to focus on is accuracy of the model's responses matching the intended role or instruction. This is because the system prompt guides the AI's behavior, so measuring how well the output aligns with the prompt ensures the model follows instructions correctly.

Additionally, precision and recall can be important if the task involves classification or identifying specific intents from prompts. For example, precision measures how often the model's responses are relevant to the role, while recall measures how many relevant responses the model captures.

Confusion matrix example for role setting classification
      | Predicted Role: Assistant | Predicted Role: User |
      |---------------------------|---------------------|
      | True Positive (TP) = 80   | False Negative (FN) = 20 |
      | False Positive (FP) = 10  | True Negative (TN) = 90 |

      Total samples = 80 + 20 + 10 + 90 = 200
    

From this matrix:

  • Precision = TP / (TP + FP) = 80 / (80 + 10) = 0.89
  • Recall = TP / (TP + FN) = 80 / (80 + 20) = 0.80
  • Accuracy = (TP + TN) / Total = (80 + 90) / 200 = 0.85
Precision vs Recall tradeoff with system prompts

Imagine a chatbot that must respond as a helpful assistant (role). If the model has high precision but low recall, it means it rarely gives wrong role responses but misses many correct ones. This can make the chatbot seem unhelpful or silent.

If recall is high but precision is low, the chatbot tries to respond often but sometimes acts outside the intended role, confusing users.

Balancing precision and recall ensures the chatbot reliably follows the system prompt role without missing or misbehaving.

Good vs Bad metric values for system prompt role adherence
  • Good: Precision and recall above 0.85, accuracy above 0.90 -- model consistently follows role instructions.
  • Bad: Precision or recall below 0.60, accuracy below 0.70 -- model often ignores or misinterprets role prompts.
Common pitfalls in evaluating system prompt role setting
  • Accuracy paradox: High accuracy can be misleading if the dataset is imbalanced (e.g., mostly one role).
  • Data leakage: If test prompts are too similar to training, metrics may overestimate real performance.
  • Overfitting: Model may memorize role instructions but fail on new or varied prompts.
  • Ignoring context: Metrics that do not consider conversation flow may miss role adherence issues.
Self-check question

Your model has 98% accuracy but only 12% recall on following the system prompt role. Is it good for production? Why or why not?

Answer: No, it is not good. The low recall means the model misses most cases where it should follow the role, even if overall accuracy is high. This means the model often fails to act as instructed, which is critical for system prompt tasks.

Key Result
For system prompts and role setting, balancing precision and recall ensures the model reliably follows instructions without missing or misbehaving.

Practice

(1/5)
1. What is the main purpose of a system prompt in AI?
easy
A. To tell the AI what role to play
B. To train the AI with new data
C. To fix errors in AI code
D. To speed up AI computations

Solution

  1. Step 1: Understand system prompt role

    System prompts guide AI on how to behave or respond.
  2. Step 2: Differentiate from other AI tasks

    Training data and code fixes are separate from role setting.
  3. Final Answer:

    To tell the AI what role to play -> Option A
  4. Quick Check:

    System prompt = role setting [OK]
Hint: System prompts set AI's role or behavior [OK]
Common Mistakes:
  • Confusing system prompts with training data
  • Thinking system prompts fix AI bugs
  • Assuming system prompts speed up AI
2. Which of the following is the correct way to set a system prompt for an AI to act as a tutor?
easy
A. Set prompt = 'You are a helpful assistant.'
B. Set prompt = 'Fix errors in code.'
C. Set prompt = 'Run training on tutor data.'
D. Set prompt = 'You are a tutor who explains simply.'

Solution

  1. Step 1: Identify correct prompt style

    The prompt should clearly tell AI to act as a tutor and explain simply.
  2. Step 2: Eliminate unrelated options

    Options about training or fixing code are not system prompts.
  3. Final Answer:

    Set prompt = 'You are a tutor who explains simply.' -> Option D
  4. Quick Check:

    Clear role description = correct prompt [OK]
Hint: Use clear role description in prompt [OK]
Common Mistakes:
  • Using vague prompts like 'helpful assistant'
  • Confusing prompts with training commands
  • Writing prompts unrelated to role
3. Given this system prompt: 'You are a translator from English to Spanish.' What will the AI most likely do when asked 'Hello, how are you?'?
medium
A. Translate it to Spanish
B. Ignore the prompt and answer in English
C. Explain the meaning of the sentence
D. Ask for more context

Solution

  1. Step 1: Analyze the system prompt

    The prompt sets AI's role as a translator from English to Spanish.
  2. Step 2: Predict AI response to input

    AI will translate the input sentence into Spanish as instructed.
  3. Final Answer:

    Translate it to Spanish -> Option A
  4. Quick Check:

    Translator prompt = translate output [OK]
Hint: Match prompt role to AI output [OK]
Common Mistakes:
  • Thinking AI explains instead of translates
  • Assuming AI ignores system prompt
  • Expecting AI to ask questions
4. You wrote this system prompt: 'You are a helpful assistant.' but the AI keeps giving very short answers. What is the best fix?
medium
A. Restart the AI server.
B. Remove the system prompt entirely.
C. Change prompt to 'You are a helpful assistant who explains in detail.'
D. Add more training data.

Solution

  1. Step 1: Identify problem with prompt

    The prompt is too vague, so AI gives short answers.
  2. Step 2: Improve prompt specificity

    Adding 'explains in detail' guides AI to give longer answers.
  3. Final Answer:

    Change prompt to 'You are a helpful assistant who explains in detail.' -> Option C
  4. Quick Check:

    Specific prompt = better answers [OK]
Hint: Make prompts more specific for better answers [OK]
Common Mistakes:
  • Removing prompt instead of improving it
  • Thinking training data fixes prompt issues
  • Restarting server won't change AI behavior
5. You want the AI to act as a math tutor who only answers questions about addition and subtraction. Which system prompt is best?
hard
A. You are a math tutor who answers all math questions.
B. You are a math tutor who only answers addition and subtraction questions.
C. You are a general assistant.
D. You are a math tutor who answers multiplication questions.

Solution

  1. Step 1: Understand the role restriction

    The AI should only answer addition and subtraction questions.
  2. Step 2: Choose prompt that limits scope correctly

    You are a math tutor who only answers addition and subtraction questions. clearly restricts AI to addition and subtraction only.
  3. Step 3: Eliminate broader or unrelated prompts

    The other options do not restrict to addition and subtraction.
  4. Final Answer:

    You are a math tutor who only answers addition and subtraction questions. -> Option B
  5. Quick Check:

    Specific role limits AI scope [OK]
Hint: Use clear limits in prompt for focused AI roles [OK]
Common Mistakes:
  • Using broad prompts without limits
  • Choosing unrelated math topics
  • Not specifying question types