For chatbots, the key metrics are accuracy of understanding user intent and response relevance. Accuracy tells us how often the chatbot correctly identifies what the user wants. Response relevance measures whether the chatbot's reply fits the question. These metrics matter because a chatbot that misunderstands users or gives unrelated answers will frustrate people and fail its purpose.
Chatbot development basics in Prompt Engineering / GenAI - Model Metrics & Evaluation
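Confusion matrix for Intent A, treated as the positive class:

| | Actual Intent A | Actual Intent B |
|--------------------|--------------------|--------------------|
| Predicted Intent A | True Positives (TP) | False Positives (FP) |
| Predicted Intent B | False Negatives (FN) | True Negatives (TN) |

Example numbers:

| | Actual Intent A | Actual Intent B |
|--------------------|--------------------|--------------------|
| Predicted Intent A | 80 | 20 |
| Predicted Intent B | 10 | 90 |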
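Total samples = 80 + 20 + 10 + 90 = 200
Accuracy = (TP + TN) / Total = (80 + 90) / 200 = 0.85
Precision = TP / (TP + FP) = 80 / (80 + 20) = 0.8
Recall = TP / (TP + FN) = 80 / (80 + 10) ≈ 0.89
F1 Score = 2 * (Precision * Recall) / (Precision + Recall) = 2 * (0.8 * 0.89) / (0.8 + 0.89) ≈ 0.84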
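The arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch that uses only the four example counts; the variable names are illustrative and not part of any library.

```python
# Counts taken from the example confusion matrix above.
tp, fp, fn, tn = 80, 20, 10, 90

total = tp + fp + fn + tn
accuracy = (tp + tn) / total
precision = tp / (tp + fp)   # of everything predicted as Intent A, how much was right
recall = tp / (tp + fn)      # of everything that truly was Intent A, how much was found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
# prints: accuracy=0.85 precision=0.80 recall=0.89 f1=0.84
```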
High Precision, Low Recall: The chatbot only responds when very sure about user intent. It avoids wrong answers but may miss some user questions, leading to many "I don't understand" replies.
High Recall, Low Precision: The chatbot tries to answer most questions, even if unsure. It covers many user intents but sometimes gives wrong or irrelevant answers, which can confuse users.
Choosing the right balance depends on the chatbot's goals. For customer support, favoring high precision helps avoid giving wrong information. For casual chatbots, higher recall may keep conversations flowing.
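One common way this tradeoff shows up is a confidence threshold: the bot only acts on an intent when its score is high enough. The sketch below assumes a hypothetical list of (confidence score, whether the message truly was Intent A) pairs; the data and the `precision_recall` helper are made up for illustration.

```python
# Hypothetical classifier outputs: (confidence that the message is Intent A,
# whether it truly was Intent A).
scored = [
    (0.95, True), (0.90, True), (0.85, True), (0.70, False), (0.65, True),
    (0.60, True), (0.40, False), (0.35, True), (0.20, False), (0.10, False),
]

def precision_recall(scored, threshold):
    """Predict Intent A only when the score reaches the threshold."""
    tp = sum(1 for score, is_a in scored if score >= threshold and is_a)
    fp = sum(1 for score, is_a in scored if score >= threshold and not is_a)
    fn = sum(1 for score, is_a in scored if score < threshold and is_a)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

print(precision_recall(scored, 0.8))  # strict bot:  (1.0, 0.5)
print(precision_recall(scored, 0.3))  # lenient bot: (0.75, 1.0)
```

With the strict threshold the bot is never wrong but misses half of the Intent A messages; with the lenient one it catches all of them at the cost of some wrong answers.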
- Good: Precision, recall, and F1 score all above 0.8 suggest the chatbot understands and responds well.
- Bad: Precision or recall below 0.5 means many wrong or missed answers, leading to poor user experience.
- Accuracy: Over 90% accuracy on intent classification is good, but check precision and recall to avoid misleading results.
- Accuracy paradox: If one intent is very common, high accuracy can hide poor performance on rare intents.
- Data leakage: Testing on data the chatbot has already seen during training artificially inflates metrics.
- Overfitting: The chatbot performs well on training data but poorly on new user inputs.
- Ignoring user satisfaction: Metrics alone don't capture if users feel helped or frustrated.
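To guard against the data leakage pitfall listed above, one option is to split labeled utterances so that no utterance text appears in both the training and test sets. The following is a rough sketch under that assumption; the `split_without_leakage` helper and the example data are hypothetical, and real projects may need stricter checks (for example, near-duplicates).

```python
import random

def split_without_leakage(examples, test_fraction=0.2, seed=0):
    """Split (utterance, intent) pairs so no utterance text lands in both sets."""
    # Normalize text so 'Hi ' and 'hi' count as the same utterance.
    unique_texts = sorted({text.strip().lower() for text, _ in examples})
    rng = random.Random(seed)
    rng.shuffle(unique_texts)
    n_test = max(1, int(len(unique_texts) * test_fraction))
    test_texts = set(unique_texts[:n_test])
    train = [(t, i) for t, i in examples if t.strip().lower() not in test_texts]
    test = [(t, i) for t, i in examples if t.strip().lower() in test_texts]
    return train, test

# Hypothetical labeled data, including duplicates that differ only in case/spacing.
examples = [
    ('hi', 'greet'), ('Hi ', 'greet'), ('hello', 'greet'),
    ('bye', 'farewell'), ('goodbye', 'farewell'), ('BYE', 'farewell'),
    ('where is my order', 'order_status'), ('track my order', 'order_status'),
    ('refund please', 'refund'), ('i want a refund', 'refund'),
]

train, test = split_without_leakage(examples, test_fraction=0.25)
overlap = {t.strip().lower() for t, _ in train} & {t.strip().lower() for t, _ in test}
print(len(train), len(test), overlap)  # overlap is always an empty set
```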
Your chatbot has 98% accuracy but only 12% recall on a key user intent. Is it good for production? Why or why not?
Answer: No, it is not good. The high accuracy likely comes from many easy or common intents, but the very low recall means the chatbot misses most cases of the important intent. This will frustrate users needing help with that intent, so the chatbot needs improvement before production.
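To see how 98% accuracy and 12% recall can coexist, it may help to plug in concrete counts. The numbers below are hypothetical, chosen only so that they reproduce the scenario: 10,000 test messages, of which 200 belong to the key intent.

```python
# Hypothetical counts chosen to reproduce the scenario in the question.
tp, fn = 24, 176      # key-intent messages caught vs. missed
fp, tn = 24, 9776     # other messages wrongly flagged vs. correctly handled

accuracy = (tp + tn) / (tp + fn + fp + tn)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.0%} recall={recall:.0%}")      # accuracy=98% recall=12%
print(f"missed key-intent messages: {fn} of {tp + fn}")    # 176 of 200
```

Because the key intent makes up only 2% of the traffic in this sketch, missing most of it barely moves the overall accuracy.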
Practice
Solution
Step 1: Understand chatbot function
A chatbot is designed to communicate with people using text or voice.
Step 2: Match purpose with options
Only "To help computers talk with people easily" describes that purpose.
Final Answer:
To help computers talk with people easily -> Option A
Quick Check:
Chatbot purpose = talk with people [OK]
- Confusing chatbots with data storage systems
- Thinking chatbots create images
- Assuming chatbots do math calculations
Solution
Step 1: Recall Python dictionary syntax
Python uses curly braces {} with key: value pairs for dictionaries.
Step 2: Check each option
response = {'hello': 'Hi there!'} uses correct dictionary syntax (curly braces with a key: value pair); the other options use invalid syntax.
Final Answer:
response = {'hello': 'Hi there!'} -> Option B
Quick Check:
Python dict = {'key': 'value'} [OK]
- Using => instead of : in Python dictionaries
- Using parentheses instead of braces
- Trying to assign string with = inside quotes
responses = {'hi': 'Hello!', 'bye': 'Goodbye!'}
user_input = 'hi'
print(responses.get(user_input, 'I do not understand'))
Solution
Step 1: Understand dictionary get method
responses.get(user_input, default) returns value for key or default if key missing.
Step 2: Check user_input key in dictionary
user_input is 'hi', which exists in responses with value 'Hello!'.
Final Answer:
Hello! -> Option D
Quick Check:
Key 'hi' found = 'Hello!' [OK]
- Assuming default message prints even if key exists
- Confusing keys 'hi' and 'bye'
- Expecting an error from get method
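The behavior of `dict.get` in both cases (key present, key missing) can be checked with a short sketch. The `reply` helper and the `FALLBACK` constant are names introduced here for illustration only.

```python
responses = {'hi': 'Hello!', 'bye': 'Goodbye!'}
FALLBACK = 'I do not understand'

def reply(user_input):
    # get() returns the stored value if the key exists, otherwise the default.
    return responses.get(user_input, FALLBACK)

print(reply('hi'))      # Hello!
print(reply('thanks'))  # I do not understand
```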
responses = {'hello': 'Hi!'}
user_input = input('Say something: ')
print(responses[user_input])
Solution
Step 1: Analyze dictionary access
Accessing responses[user_input] raises a KeyError if the user_input key is not found.
Step 2: Check for default handling
The code lacks a default fallback; it should use get() or try-except to avoid a crash.
Final Answer:
Missing default response if input not in dictionary -> Option C
Quick Check:
Direct dict access needs key check [OK]
- Thinking input() is disallowed in chatbot
- Believing dictionary syntax is wrong
- Assuming print statement is incorrect
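One way to apply the fix named in the solution above is the try-except variant. This is a sketch, not the only option; the `respond` function is a hypothetical wrapper, and in an interactive script its argument would come from `input('Say something: ')`.

```python
responses = {'hello': 'Hi!'}

def respond(user_input):
    """Return the stored reply, or a fallback instead of crashing."""
    try:
        return responses[user_input]
    except KeyError:
        # Reached when the user types something not in the dictionary.
        return 'I do not understand'

print(respond('hello'))    # Hi!
print(respond('goodbye'))  # I do not understand
```

Using `responses.get(user_input, 'I do not understand')` would give the same result in a single line.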
Solution
Step 1: Understand dictionary keys for multiple inputs
Each key must be separate to match different user inputs.
Step 2: Evaluate options for correct syntax
responses = {'morning': 'Good morning!', 'good morning': 'Good morning!'} defines two keys separately; the other options use invalid Python expressions as keys.
Final Answer:
responses = {'morning': 'Good morning!', 'good morning': 'Good morning!'} -> Option A
Quick Check:
Separate keys for inputs = Option A [OK]
- Trying to combine keys with or/&/+ operators
- Using invalid syntax for dictionary keys
- Assuming one key can match multiple phrases
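When many phrases should share one reply, typing each key by hand gets repetitive. A possible shortcut, sketched below, is to list the phrases per reply and expand them into separate keys; the `phrase_groups` structure and `reply` helper are assumptions made for this example.

```python
# Hypothetical grouping: each reply maps to every phrase that should trigger it.
phrase_groups = {
    'Good morning!': ['morning', 'good morning'],
    'Goodbye!': ['bye', 'see you'],
}

# Expand the groups into one dictionary with a separate key per phrase.
responses = {
    phrase: reply_text
    for reply_text, phrases in phrase_groups.items()
    for phrase in phrases
}

def reply(user_input):
    # Lowercase and trim so 'Good Morning ' still matches 'good morning'.
    return responses.get(user_input.strip().lower(), 'I do not understand')

print(reply('Good Morning '))  # Good morning!
print(reply('morning'))        # Good morning!
```

The resulting dictionary still has one key per phrase, which is what the solution above requires.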
