Chatbot development basics in Prompt Engineering / GenAI - Model Metrics & Evaluation

Metrics & Evaluation - Chatbot development basics
Which metrics matter for chatbot development basics, and why

For chatbots, the key metrics are accuracy of understanding user intent and response relevance. Accuracy tells us how often the chatbot correctly understands what the user wants. Response relevance measures if the chatbot's reply fits the question well. These metrics matter because a chatbot that misunderstands users or gives unrelated answers will frustrate people and fail its purpose.

Confusion matrix example for intent classification
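Intent A is treated as the positive class. Rows are predictions, columns are actual labels.

      |                    | Actual Intent A      | Actual Intent B      |
      |--------------------|----------------------|----------------------|
      | Predicted Intent A | True Positives (TP)  | False Positives (FP) |
      | Predicted Intent B | False Negatives (FN) | True Negatives (TN)  |

Example numbers:
      |                    | Actual Intent A      | Actual Intent B      |
      |--------------------|----------------------|----------------------|
      | Predicted Intent A | 80                   | 20                   |
      | Predicted Intent B | 10                   | 90                   |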

Total samples = 80 + 20 + 10 + 90 = 200

Precision = TP / (TP + FP) = 80 / (80 + 20) = 0.8
Recall = TP / (TP + FN) = 80 / (80 + 10) ≈ 0.89
F1 Score = 2 * (0.8 * 0.89) / (0.8 + 0.89) ≈ 0.84
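
The arithmetic above can be checked with a few lines of Python. This is a minimal sketch using the example counts; the variable names are illustrative and not taken from any library. It also computes accuracy, which the worked example does not show.

```python
# Counts from the example confusion matrix, treating Intent A as the positive class.
tp, fp, fn, tn = 80, 20, 10, 90

total = tp + fp + fn + tn
precision = tp / (tp + fp)          # of all "Intent A" predictions, how many were right
recall = tp / (tp + fn)             # of all real Intent A messages, how many were found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / total        # share of all messages classified correctly

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.2f}")
# prints: precision=0.80 recall=0.89 f1=0.84 accuracy=0.85
```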
    
Precision vs Recall tradeoff with chatbot examples

High Precision, Low Recall: The chatbot only responds when very sure about user intent. It avoids wrong answers but may miss some user questions, leading to many "I don't understand" replies.

High Recall, Low Precision: The chatbot tries to answer most questions, even if unsure. It covers many user intents but sometimes gives wrong or irrelevant answers, which can confuse users.

Choosing the right balance depends on chatbot goals. For customer support, high precision avoids wrong info. For casual chatbots, higher recall may keep conversations flowing.
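
One common way to move along this tradeoff is a confidence threshold: the chatbot answers only when its top intent score clears the threshold, and otherwise falls back. The sketch below assumes a classifier that returns a score per intent; the `respond` function, the intent names, and the scores are all hypothetical.

```python
def respond(intent_scores, threshold):
    """Return the top-scoring intent if it clears the threshold, else 'fallback'."""
    intent, score = max(intent_scores.items(), key=lambda item: item[1])
    return intent if score >= threshold else "fallback"

# Hypothetical classifier scores for a single user message.
scores = {"refund": 0.55, "shipping": 0.30, "greeting": 0.15}

print(respond(scores, threshold=0.8))  # fallback (strict: favors precision)
print(respond(scores, threshold=0.4))  # refund   (lenient: favors recall)
```

Raising the threshold produces more fallback replies and fewer wrong answers; lowering it does the opposite.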

What "good" vs "bad" metric values look like for chatbots
  • Good: Precision, recall, and F1 score all above 0.8 suggest the chatbot identifies intents reliably and responds well.
  • Bad: Precision or recall below 0.5 means many wrong or missed answers, leading to a poor user experience.
  • Accuracy: Over 90% accuracy on intent classification is good, but check per-intent precision and recall as well, because accuracy alone can be misleading.
Common pitfalls in chatbot metrics
  • Accuracy paradox: If one intent is very common, high accuracy can hide poor performance on rare intents.
  • Data leakage: Testing on data the chatbot has seen before inflates metrics falsely.
  • Overfitting: Chatbot performs well on training data but poorly on new user inputs.
  • Ignoring user satisfaction: Metrics alone don't capture if users feel helped or frustrated.
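
The data leakage pitfall can be partly guarded against by checking that no test utterance also appears in the training set. The following is a minimal sketch with made-up data; `normalize` and `find_leaked` are hypothetical helpers, and real pipelines may need fuzzier matching than exact comparison after lowercasing.

```python
def normalize(text):
    """Lowercase and collapse whitespace so near-identical utterances compare equal."""
    return " ".join(text.lower().split())

def find_leaked(train_utterances, test_utterances):
    """Return the test utterances that also appear in the training set."""
    seen = {normalize(u) for u in train_utterances}
    return [u for u in test_utterances if normalize(u) in seen]

# Hypothetical example data.
train = ["Where is my order?", "I want a refund", "Hello"]
test = ["where is my  order?", "Cancel my subscription"]

print(find_leaked(train, test))  # ['where is my  order?']
```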
Self-check question

Your chatbot has 98% accuracy but only 12% recall on a key user intent. Is it good for production? Why or why not?

Answer: No, it is not good. The high accuracy likely comes from many easy or common intents, but the very low recall means the chatbot misses most cases of the important intent. This will frustrate users needing help with that intent, so the chatbot needs improvement before production.
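
To see how both numbers can hold at once, here is a sketch with hypothetical counts chosen to match the question: a 2,000-message test set in which only 25 messages belong to the key intent.

```python
# Hypothetical counts for the key intent on a 2,000-message test set.
tp, fn = 3, 22      # only 3 of the 25 key-intent messages are recognized
fp, tn = 18, 1957   # all remaining messages belong to other intents

accuracy = (tp + tn) / (tp + fn + fp + tn)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
# prints: accuracy=0.98 recall=0.12
```

Because the key intent is rare, the 1,957 correct rejections dominate accuracy even though 22 of the 25 relevant messages are missed.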

Key Result
Precision and recall are the key measures of how well a chatbot understands users and responds to them. Balancing the two leads to a better user experience.

Practice

1. What is the main purpose of a chatbot in simple terms?
easy
A. To help computers talk with people easily
B. To store large amounts of data
C. To create images from text
D. To run complex math calculations

Solution

  1. Step 1: Understand chatbot function

    A chatbot is designed to communicate with people using text or voice.
  2. Step 2: Match purpose with options

    Only option A describes communicating with people; the other options describe data storage, image generation, and math calculation.
  3. Final Answer:

    To help computers talk with people easily -> Option A
  4. Quick Check:

    Chatbot purpose = talk with people [OK]
Hint: Chatbots are for chatting, not storing or calculating [OK]
Common Mistakes:
  • Confusing chatbots with data storage systems
  • Thinking chatbots create images
  • Assuming chatbots do math calculations
2. Which of the following is the correct way to define a simple chatbot response in Python?
easy
A. response = (hello: 'Hi there!')
B. response = {'hello': 'Hi there!'}
C. response = ['hello' => 'Hi there!']
D. response = 'hello' = 'Hi there!'

Solution

  1. Step 1: Recall Python dictionary syntax

    Python uses curly braces {} with key: value pairs for dictionaries.
  2. Step 2: Check each option

    Option B, response = {'hello': 'Hi there!'}, is a valid dictionary literal; the other options are syntax errors.
  3. Final Answer:

    response = {'hello': 'Hi there!'} -> Option B
  4. Quick Check:

    Python dict = {'key': 'value'} [OK]
Hint: Python dict uses curly braces and colon for key-value [OK]
Common Mistakes:
  • Using => instead of : in Python dictionaries
  • Using parentheses instead of braces
  • Writing = between two string literals, which is not a valid assignment
3. What will be the output of this Python code snippet for a chatbot?
responses = {'hi': 'Hello!', 'bye': 'Goodbye!'}
user_input = 'hi'
print(responses.get(user_input, 'I do not understand'))
medium
A. Error
B. Goodbye!
C. I do not understand
D. Hello!

Solution

  1. Step 1: Understand dictionary get method

    responses.get(user_input, default) returns value for key or default if key missing.
  2. Step 2: Check user_input key in dictionary

    user_input is 'hi', which exists in responses with value 'Hello!'.
  3. Final Answer:

    Hello! -> Option D
  4. Quick Check:

    Key 'hi' found = 'Hello!' [OK]
Hint: dict.get(key, default) returns value or default if missing [OK]
Common Mistakes:
  • Assuming default message prints even if key exists
  • Confusing keys 'hi' and 'bye'
  • Expecting an error from get method
4. Identify the error in this chatbot code snippet:
responses = {'hello': 'Hi!'}
user_input = input('Say something: ')
print(responses[user_input])
medium
A. print statement is incorrect
B. Syntax error in dictionary definition
C. Missing default response if input not in dictionary
D. input() function is not allowed in chatbot

Solution

  1. Step 1: Analyze dictionary access

    Accessing responses[user_input] raises a KeyError if user_input is not a key in the dictionary.
  2. Step 2: Check for default handling

    Code lacks default fallback; should use get() or try-except to avoid crash.
  3. Final Answer:

    Missing default response if input not in dictionary -> Option C
  4. Quick Check:

    Direct dict access needs key check [OK]
Hint: Use dict.get() to avoid key errors from unknown input [OK]
Common Mistakes:
  • Thinking input() is disallowed in chatbot
  • Believing dictionary syntax is wrong
  • Assuming print statement is incorrect
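
A corrected version of the snippet might look like the sketch below. It assumes the fallback message from question 3, and it takes the user's text as a function argument instead of calling input() so the example runs without interaction; the `reply` function name is hypothetical.

```python
responses = {'hello': 'Hi!'}

def reply(user_input):
    """Look up a reply, falling back to a default message for unknown input."""
    return responses.get(user_input, 'I do not understand')

print(reply('hello'))    # Hi!
print(reply('goodbye'))  # I do not understand
```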
5. You want your chatbot to answer "Good morning!" when the user says "morning" or "good morning". Which Python code snippet correctly handles this?
hard
A. responses = {'morning': 'Good morning!', 'good morning': 'Good morning!'}
B. responses = {'morning' or 'good morning': 'Good morning!'}
C. responses = {'morning' & 'good morning': 'Good morning!'}
D. responses = {'morning' + 'good morning': 'Good morning!'}

Solution

  1. Step 1: Understand dictionary keys for multiple inputs

    Each key must be separate to match different user inputs.
  2. Step 2: Evaluate options for correct syntax

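    Option A defines two separate keys, one per phrase. The other options do not: 'morning' or 'good morning' evaluates to just 'morning', 'morning' + 'good morning' concatenates into the single key 'morninggood morning', and 'morning' & 'good morning' raises a TypeError because & is not defined for strings.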
  3. Final Answer:

    responses = {'morning': 'Good morning!', 'good morning': 'Good morning!'} -> Option A
  4. Quick Check:

    Separate keys for each input phrase = Option A [OK]
Hint: Use separate keys for each input phrase in dictionary [OK]
Common Mistakes:
  • Trying to combine keys with or/&/+ operators
  • Expecting an expression such as 'a' or 'b' to produce two keys (it produces one key, or an error)
  • Assuming one key can match multiple phrases
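
Dictionary lookups are exact, so "Good Morning " with different capitalization or stray spaces would still miss. A minimal sketch of one way to handle this is to normalize the input before the lookup; the `reply` function and the fallback message are assumptions for illustration. The last lines show what the `or` expression actually produces as a key.

```python
responses = {'morning': 'Good morning!', 'good morning': 'Good morning!'}

def reply(user_input):
    """Normalize case and surrounding whitespace, then look up the reply."""
    key = user_input.strip().lower()
    return responses.get(key, 'I do not understand')

print(reply('Good Morning '))  # Good morning!
print(reply('evening'))        # I do not understand

# 'morning' or 'good morning' evaluates to 'morning', so only one key is created.
collapsed = {'morning' or 'good morning': 'Good morning!'}
print(collapsed)  # {'morning': 'Good morning!'}
```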