
Content filtering in Prompt Engineering / GenAI - Model Metrics & Evaluation

Metrics & Evaluation - Content filtering
Which metric matters for Content filtering and WHY

Content filtering models decide whether content is safe or harmful. The key metrics are precision and recall. Precision is the fraction of flagged items that are truly harmful: TP / (TP + FP). Recall is the fraction of harmful items that were caught: TP / (TP + FN). High recall means little harmful content is missed; high precision means little safe content is wrongly blocked. Balancing both is critical.

Confusion matrix for Content filtering
      |                | Predicted Harmful   | Predicted Safe      |
      |----------------|---------------------|---------------------|
      | Actual Harmful | True Positive (TP)  | False Negative (FN) |
      | Actual Safe    | False Positive (FP) | True Negative (TN)  |

      Example:
      TP = 80 (harmful content caught)
      FP = 20 (safe content wrongly blocked)
      TN = 900 (safe content allowed)
      FN = 10 (harmful content missed)

      Total samples = 80 + 20 + 900 + 10 = 1010
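
The example counts can be turned into metrics with a few lines of arithmetic. The sketch below uses only the four counts from the example; the variable names are illustrative.

```python
# Counts taken from the example above.
tp, fp, tn, fn = 80, 20, 900, 10

precision = tp / (tp + fp)                   # share of flagged items that are truly harmful
recall = tp / (tp + fn)                      # share of harmful items that were caught
accuracy = (tp + tn) / (tp + fp + tn + fn)   # share of all items classified correctly

print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.3f}")
# precision=0.800 recall=0.889 accuracy=0.970
```

Accuracy looks strongest here mainly because safe content dominates the sample.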
    
Precision vs Recall tradeoff with examples

If the filter is aggressive, it catches most harmful content (high recall) but also blocks good posts (low precision), which annoys users. If the filter is lenient, almost everything it blocks is truly harmful (high precision), but harmful content slips through (low recall). For example, a social media platform wants to catch all hate speech (high recall) but also avoid blocking normal posts (high precision). The right balance depends on the platform's goals.
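
One way to see the tradeoff is to assume the filter produces a harm score per item and blocks everything at or above a threshold. That scoring setup, the sample data, and the helper precision_recall() below are assumptions made for illustration, not something this lesson specifies.

```python
# Hypothetical (score, is_harmful) pairs; a higher score means "more likely harmful".
scored = [(0.95, True), (0.90, True), (0.80, False), (0.70, True),
          (0.60, False), (0.40, True), (0.30, False), (0.10, False)]

def precision_recall(threshold):
    """Block every item whose score is >= threshold, then measure the result."""
    tp = sum(1 for s, y in scored if s >= threshold and y)
    fp = sum(1 for s, y in scored if s >= threshold and not y)
    fn = sum(1 for s, y in scored if s < threshold and y)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

for t in (0.85, 0.5, 0.2):
    p, r = precision_recall(t)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")
# threshold=0.85: precision=1.00 recall=0.50
# threshold=0.5: precision=0.60 recall=0.75
# threshold=0.2: precision=0.57 recall=1.00
```

Lowering the threshold blocks more content, so recall rises while precision falls.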

Good vs Bad metric values for Content filtering

Good: Precision around 0.9 and Recall around 0.85 means most harmful content is caught and few good posts are blocked.

Bad: Precision 0.5 and Recall 0.95 means half of the blocked posts were actually fine. Precision 0.95 and Recall 0.4 means 60% of harmful posts are missed.

Common pitfalls in Content filtering metrics
  • Accuracy paradox: If harmful content is rare, a model that always predicts safe can have high accuracy but is useless.
  • Data leakage: If test data leaks info from training, metrics look better than real.
  • Overfitting: Very high training metrics but poor real-world performance.
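
The accuracy paradox is easy to reproduce. The sketch below assumes a hypothetical dataset with 10 harmful items out of 1000 and a "model" that always predicts safe.

```python
# Hypothetical labels: 10 harmful items among 1000 (1 = harmful, 0 = safe).
labels = [1] * 10 + [0] * 990
predictions = [0] * len(labels)   # a "model" that always predicts safe

correct = sum(1 for y, p in zip(labels, predictions) if y == p)
caught = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)

accuracy = correct / len(labels)
recall = caught / sum(labels)

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")  # accuracy=0.99 recall=0.00
```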
Self-check question

Your content filter model has 98% accuracy but only 12% recall on harmful content. Is it good for production? Why or why not?

Answer: No, it is not good. The model misses 88% of harmful content (low recall), which is dangerous. High accuracy is misleading because harmful content is rare. Improving recall is critical.
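
To make the answer concrete, here is one set of counts that produces exactly these numbers. The counts are hypothetical, chosen only to match 98% accuracy and 12% recall.

```python
# Hypothetical counts: 10,000 items, 200 of them harmful.
tp, fn = 24, 176      # only 24 of the 200 harmful items are caught
fp, tn = 24, 9776     # 24 of the 9,800 safe items are wrongly blocked

accuracy = (tp + tn) / (tp + fp + tn + fn)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")  # accuracy=0.98 recall=0.12
```

Almost all of the accuracy comes from the 9,776 safe items, while 176 harmful items still get through.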

Key Result
Precision and recall balance is key to effective content filtering; high recall avoids missing harmful content, high precision avoids blocking good content.

Practice

1. What is the main purpose of content filtering in AI systems?
easy
A. To block or clean harmful text to keep users safe
B. To speed up the AI model training process
C. To increase the size of the training dataset
D. To improve the AI model's accuracy on images

Solution

  1. Step 1: Understand content filtering purpose

    Content filtering is designed to detect and remove harmful or unsafe text to protect users.
  2. Step 2: Compare options to purpose

    Only "To block or clean harmful text to keep users safe" matches this goal; the others relate to unrelated AI tasks.
  3. Final Answer:

    To block or clean harmful text to keep users safe -> Option A
  4. Quick Check:

    Content filtering = block harmful text [OK]
Hint: Content filtering = blocking harmful or unsafe text [OK]
Common Mistakes:
  • Confusing filtering with training speed
  • Thinking filtering improves image accuracy
  • Assuming filtering increases data size
2. Which of the following is a correct way to check if a text contains a banned word in Python?
easy
A. if text.has(banned_word):
B. if text.contains(banned_word):
C. if banned_word in text:
D. if banned_word inside text:

Solution

  1. Step 1: Recall Python syntax for substring check

    In Python, the 'in' operator checks whether a substring occurs in a string.
  2. Step 2: Evaluate each option

    "if banned_word in text:" is valid Python; str has no has() or contains() method, and 'inside' is not a Python keyword.
  3. Final Answer:

    if banned_word in text: -> Option C
  4. Quick Check:

    Substring check in Python uses 'in' keyword [OK]
Hint: Use 'in' keyword to check substring in Python strings [OK]
Common Mistakes:
  • Using non-existent methods like contains()
  • Using wrong keywords like 'inside'
  • Confusing syntax from other languages
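
A short runnable version of the check is sketched below. The sample text is made up; the last three lines show two properties of 'in' worth knowing when filtering: it is case-sensitive, and it matches inside longer words.

```python
text = "This offer is a scam"
banned_word = "scam"

if banned_word in text:
    status = "flagged"
else:
    status = "clean"
print(status)                               # flagged

exact_case_only = "Scam" in text            # False: the check is case-sensitive
matches_inside_word = "scam" in "scampi"    # True: substrings of longer words match
lowercased = "scam" in "SCAM alert".lower() # True: normalize case before checking
print(exact_case_only, matches_inside_word, lowercased)
```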
3. Given the code below, what will be the output?
bad_words = ['spam', 'scam']
text = 'This message contains spam and scam.'
filtered = any(word in text for word in bad_words)
print(filtered)
medium
A. None
B. False
C. Error
D. True

Solution

  1. Step 1: Understand the any() function with generator

    The expression checks if any bad word is found in the text. Since 'spam' and 'scam' are both in the text, any() returns True.
  2. Step 2: Confirm print output

    Printing filtered will output True because the condition is met.
  3. Final Answer:

    True -> Option D
  4. Quick Check:

    any() finds bad words = True [OK]
Hint: any() returns True if any bad word is found in text [OK]
Common Mistakes:
  • Thinking any() returns False if multiple matches
  • Confusing any() with all()
  • Expecting an error due to syntax
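
Since any() and all() are easy to mix up, the sketch below runs both over three made-up messages using the same bad_words list.

```python
bad_words = ['spam', 'scam']
texts = ['This message contains spam and scam.',
         'This message contains spam only.',
         'This message is fine.']

results = []
for text in texts:
    has_any = any(word in text for word in bad_words)   # at least one bad word
    has_all = all(word in text for word in bad_words)   # every bad word
    results.append((has_any, has_all))
    print(f"{text!r}: any={has_any} all={has_all}")
# any=True all=True, then any=True all=False, then any=False all=False
```

For filtering, any() is usually the right choice, because one banned word is enough to flag a message.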
4. Identify the error in this content filtering code snippet:
bad_words = ['bad', 'ugly']
text = 'This is a bad example.'
if bad_words in text:
    print('Filtered')
else:
    print('Clean')
medium
A. Using 'in' to check list in string is incorrect
B. Missing colon after if statement
C. bad_words should be a string, not a list
D. print statement syntax is wrong

Solution

  1. Step 1: Analyze the 'if' condition

    The code tries to check if a list is in a string. When the right-hand side is a string, the left operand of 'in' must also be a string, so Python raises a TypeError.
  2. Step 2: Correct way to check bad words in text

    We should check each word individually, e.g., using any(word in text for word in bad_words).
  3. Final Answer:

    Using 'in' to check list in string is incorrect -> Option A
  4. Quick Check:

    Cannot check list in string directly [OK]
Hint: Check each word, not whole list, when filtering text [OK]
Common Mistakes:
  • Trying to use 'in' with list and string directly
  • Ignoring need for loop or any()
  • Assuming list membership works on strings
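
The sketch below reproduces the error and then applies the fix from Step 2. The try/except is there only so the script keeps running after the failure.

```python
bad_words = ['bad', 'ugly']
text = 'This is a bad example.'

raised = False
try:
    if bad_words in text:      # list on the left of 'in', str on the right
        print('Filtered')
except TypeError as exc:
    raised = True
    print(f"TypeError: {exc}")

# Corrected check: test each word separately.
result = 'Filtered' if any(word in text for word in bad_words) else 'Clean'
print(result)   # Filtered
```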
5. You want to replace all banned words in a user message with '[CENSORED]'. Which code snippet correctly does this for the list banned = ['bad', 'ugly'] and string msg = 'This is a bad and ugly day.'?
hard
A. msg = msg.replace(banned, '[CENSORED]')
   print(msg)
B. for word in banned:
       msg = msg.replace(word, '[CENSORED]')
   print(msg)
C. msg = '[CENSORED]' if word in banned else msg
   print(msg)
D. msg = msg.filter(lambda w: w not in banned)
   print(msg)

Solution

  1. Step 1: Understand string replacement for multiple words

    We must replace each banned word one by one using a loop and str.replace().
  2. Step 2: Evaluate each option

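    Option B loops over banned and replaces each word with msg.replace(word, '[CENSORED]'), then prints the result. A passes the list directly to replace(), which raises a TypeError; C uses a conditional expression that never replaces words inside msg; D calls filter() on a string, a method str does not have.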
  3. Final Answer:

    The loop that calls msg.replace(word, '[CENSORED]') for each word in banned -> Option B
  4. Quick Check:

    Loop and replace each banned word [OK]
Hint: Replace banned words one by one with a loop and replace() [OK]
Common Mistakes:
  • Trying to replace list directly in string
  • Using filter on string instead of list
  • Incorrect conditional replacement syntax
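
Written out as a runnable script, the correct option looks like the sketch below. The helper name censor() is an illustrative assumption; the loop itself is the one from Option B.

```python
def censor(message, banned_words):
    """Replace every occurrence of each banned word with '[CENSORED]'."""
    for word in banned_words:
        message = message.replace(word, '[CENSORED]')
    return message

banned = ['bad', 'ugly']
msg = 'This is a bad and ugly day.'

censored = censor(msg, banned)
print(censored)  # This is a [CENSORED] and [CENSORED] day.
```

Note that str.replace() works on substrings, so a word such as 'badge' would become '[CENSORED]ge' with this approach.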