Content filtering in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Experiment - Content filtering
Problem: You have a text generation model that sometimes produces inappropriate or harmful content. The goal is to filter out such content to keep outputs safe and friendly.
Current Metrics: Based on manual review, 10% of the current model's outputs contain inappropriate content.
Issue: The model has no content filtering mechanism, so unsafe outputs reach users and reduce their trust.
Your Task
Add a content filtering step to reduce inappropriate outputs from 10% to less than 2% without significantly reducing the model's helpfulness.
You cannot retrain the base text generation model.
You must implement filtering as a post-processing step.
Filtering should not block more than 10% of safe content.
Solution
import re

# Simple list of harmful keywords to filter on
harmful_keywords = ["hate", "kill", "terror", "bomb", "attack"]

# Compile the whole-word, case-insensitive pattern once instead of on every call
harmful_pattern = re.compile(
    r"\b(?:" + "|".join(map(re.escape, harmful_keywords)) + r")\b",
    re.IGNORECASE,
)

def is_safe_content(text):
    """Return True if the text contains no harmful keyword, False otherwise."""
    return harmful_pattern.search(text) is None

# Example generated outputs
outputs = [
    "I love sunny days and friendly people.",
    "We should attack the problem with full force.",
    "Peace and kindness make the world better.",
    "This is a bomb threat.",
    "Let's spread love and joy."
]

# Filter outputs
filtered_outputs = [text for text in outputs if is_safe_content(text)]

print("Filtered outputs:")
for output in filtered_outputs:
    print(f'- {output}')
Added a keyword-based content filter function to detect harmful words.
Filtered generated outputs by removing any text containing harmful keywords.
Kept filtering simple to avoid blocking too many safe outputs.
Results Interpretation

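Before filtering: 10% of outputs contained harmful content (manual review).
After filtering: none of the outputs that pass the filter contains a listed keyword.
Safe content blocked: in the five-sample run above, the filter removes the bomb threat but also the safe sentence "We should attack the problem with full force.", because "attack" is on the keyword list. Keyword matching ignores context, so the share of safe content blocked has to be measured on a larger labeled sample before concluding that it stays under the 10% limit.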

Adding a simple content filter after generation can greatly reduce harmful outputs without retraining the model. This improves safety and user trust.
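The task sets two numeric targets (under 2% harmful outputs, at most 10% of safe outputs blocked), so it helps to compute both rates rather than eyeball them. The sketch below is one way to do that, assuming a hand-labeled sample; the `labeled_outputs` list and the `evaluate_filter` helper are illustrative names that are not part of the original solution, and five samples are far too few for a real estimate.

```python
import re

# Hypothetical labeled sample: (text, is_harmful) pairs reusing the example outputs.
labeled_outputs = [
    ("I love sunny days and friendly people.", False),
    ("We should attack the problem with full force.", False),
    ("Peace and kindness make the world better.", False),
    ("This is a bomb threat.", True),
    ("Let's spread love and joy.", False),
]

harmful_keywords = ["hate", "kill", "terror", "bomb", "attack"]
pattern = re.compile(r"\b(?:" + "|".join(harmful_keywords) + r")\b", re.IGNORECASE)

def is_safe_content(text):
    """Same keyword rule as the solution: safe means no keyword match."""
    return pattern.search(text) is None

def evaluate_filter(samples):
    """Return (harmful rate among kept outputs, share of safe outputs blocked)."""
    kept = [(text, harmful) for text, harmful in samples if is_safe_content(text)]
    safe_total = sum(1 for _, harmful in samples if not harmful)
    safe_blocked = sum(
        1 for text, harmful in samples if not harmful and not is_safe_content(text)
    )
    harmful_kept = sum(1 for _, harmful in kept if harmful)
    harmful_rate_after = harmful_kept / len(kept) if kept else 0.0
    safe_block_rate = safe_blocked / safe_total if safe_total else 0.0
    return harmful_rate_after, safe_block_rate

harmful_rate, block_rate = evaluate_filter(labeled_outputs)
print(f"Harmful rate after filtering: {harmful_rate:.0%}")  # 0%
print(f"Safe content blocked: {block_rate:.0%}")            # 25%
```

On this tiny sample the harmful rate drops to 0%, but one of the four safe sentences is blocked, which is above the 10% limit stated in the task.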
Bonus Experiment
Try replacing the keyword filter with a small machine learning classifier trained on labeled safe and harmful text samples.
💡 Hint
Use a simple logistic regression or small neural network to classify outputs, then filter based on predicted safety.
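As a rough illustration of the bonus idea, the sketch below trains a tiny bag-of-words logistic regression with plain stochastic gradient descent, using only the standard library. Everything here is an assumption made for illustration: the six training sentences and their labels are invented, the helper names (`tokenize`, `featurize`, `train_logreg`) are hypothetical, and a real experiment would need a much larger labeled dataset and would more likely use an established ML library.

```python
import math
import re

# Hypothetical toy training set: label 1 = harmful, 0 = safe.
train = [
    ("This is a bomb threat.", 1),
    ("I will kill you.", 1),
    ("I hate you and want to hurt you.", 1),
    ("We should attack the problem with full force.", 0),
    ("I love sunny days and friendly people.", 0),
    ("Peace and kindness make the world better.", 0),
]

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

# Vocabulary built from the training texts only.
vocab = sorted({tok for text, _ in train for tok in tokenize(text)})
index = {tok: i for i, tok in enumerate(vocab)}

def featurize(text):
    """Binary bag-of-words vector; unknown words are ignored."""
    vec = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in index:
            vec[index[tok]] = 1.0
    return vec

def predict_proba(weights, bias, vec):
    """Probability that the text is harmful."""
    z = bias + sum(w * x for w, x in zip(weights, vec))
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(samples, epochs=1000, lr=0.1):
    """Fit weights with per-sample gradient steps on the logistic loss."""
    weights, bias = [0.0] * len(vocab), 0.0
    data = [(featurize(text), label) for text, label in samples]
    for _ in range(epochs):
        for vec, label in data:
            error = predict_proba(weights, bias, vec) - label
            bias -= lr * error
            weights = [w - lr * error * x for w, x in zip(weights, vec)]
    return weights, bias

weights, bias = train_logreg(train)

def is_safe_content(text, threshold=0.5):
    """Safe when the predicted harmful probability is below the threshold."""
    return predict_proba(weights, bias, featurize(text)) < threshold

filtered = [text for text, _ in train if is_safe_content(text)]
print(filtered)
```

Unlike the keyword list, this model can learn from the labels that "attack the problem" is safe. The threshold is the knob that trades harmful outputs missed against safe outputs blocked, and with so little data the predictions on unseen text are not reliable.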

Practice

1. What is the main purpose of content filtering in AI systems?
easy
A. To block or clean harmful text to keep users safe
B. To speed up the AI model training process
C. To increase the size of the training dataset
D. To improve the AI model's accuracy on images

Solution

  1. Step 1: Understand content filtering purpose

    Content filtering is designed to detect and remove harmful or unsafe text to protect users.
  2. Step 2: Compare options to purpose

    Only "To block or clean harmful text to keep users safe" matches this goal; the other options describe unrelated tasks (training speed, dataset size, image accuracy).
  3. Final Answer:

    To block or clean harmful text to keep users safe -> Option A
  4. Quick Check:

    Content filtering = block harmful text [OK]
Hint: Content filtering = blocking harmful or unsafe text [OK]
Common Mistakes:
  • Confusing filtering with training speed
  • Thinking filtering improves image accuracy
  • Assuming filtering increases data size
2. Which of the following is a correct way to check if a text contains a banned word in Python?
easy
A. if text.has(banned_word):
B. if text.contains(banned_word):
C. if banned_word in text:
D. if banned_word inside text:

Solution

  1. Step 1: Recall Python syntax for substring check

    In Python, the correct way to check whether a substring occurs in a string is the in operator.
  2. Step 2: Evaluate each option

    if banned_word in text: uses correct syntax; others use invalid or non-Python methods.
  3. Final Answer:

    if banned_word in text: -> Option C
  4. Quick Check:

    Substring check in Python uses 'in' keyword [OK]
Hint: Use 'in' keyword to check substring in Python strings [OK]
Common Mistakes:
  • Using non-existent methods like contains()
  • Using wrong keywords like 'inside'
  • Confusing syntax from other languages
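One caveat worth seeing in code: `in` tests for a substring, not a whole word, and it is case-sensitive. The minimal sketch below contrasts it with a word-boundary regular expression; the sample sentence is made up for illustration.

```python
import re

banned_word = "kill"
text = "She showed great skill in the kitchen."

# Plain substring check: matches because "skill" contains "kill".
substring_hit = banned_word in text

# Whole-word check: \b only matches at word boundaries, so "skill" is not a hit.
whole_word_hit = (
    re.search(rf"\b{re.escape(banned_word)}\b", text, re.IGNORECASE) is not None
)

print(substring_hit)   # True
print(whole_word_hit)  # False
```

A substring check is fine for a quick test, but for filtering it can flag harmless words that merely contain a banned one.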
3. Given the code below, what will be the output?
bad_words = ['spam', 'scam']
text = 'This message contains spam and scam.'
filtered = any(word in text for word in bad_words)
print(filtered)
medium
A. None
B. False
C. Error
D. True

Solution

  1. Step 1: Understand the any() function with generator

    The expression checks if any bad word is found in the text. Since 'spam' and 'scam' are both in the text, any() returns True.
  2. Step 2: Confirm print output

    Printing filtered will output True because the condition is met.
  3. Final Answer:

    True -> Option D
  4. Quick Check:

    any() finds bad words = True [OK]
Hint: any() returns True if any bad word is found in text [OK]
Common Mistakes:
  • Thinking any() returns False if multiple matches
  • Confusing any() with all()
  • Expecting an error due to syntax
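To make the difference between any() and all() concrete, the short sketch below runs both on the text from the question and on a second, made-up text that contains only one of the bad words. It also collects the matching words in a list, which can be handy for logging why a text was flagged.

```python
bad_words = ['spam', 'scam']
text = 'This message contains spam and scam.'
other_text = 'This message contains spam.'  # hypothetical second example

# Which bad words actually occur in the text?
matched = [word for word in bad_words if word in text]
print(matched)  # ['spam', 'scam']

# any(): at least one bad word is present.
any_in_other = any(word in other_text for word in bad_words)
# all(): every bad word is present.
all_in_other = all(word in other_text for word in bad_words)

print(any_in_other)  # True
print(all_in_other)  # False
```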
4. Identify the error in this content filtering code snippet:
bad_words = ['bad', 'ugly']
text = 'This is a bad example.'
if bad_words in text:
    print('Filtered')
else:
    print('Clean')
medium
A. Using 'in' to check list in string is incorrect
B. Missing colon after if statement
C. bad_words should be a string, not a list
D. print statement syntax is wrong

Solution

  1. Step 1: Analyze the 'if' condition

    The code tries to check whether a list is in a string. When the right operand of in is a string, the left operand must also be a string, so Python raises a TypeError.
  2. Step 2: Correct way to check bad words in text

    We should check each word individually, e.g., using any(word in text for word in bad_words).
  3. Final Answer:

    Using 'in' to check list in string is incorrect -> Option A
  4. Quick Check:

    Cannot check list in string directly [OK]
Hint: Check each word, not whole list, when filtering text [OK]
Common Mistakes:
  • Trying to use 'in' with list and string directly
  • Ignoring need for loop or any()
  • Assuming list membership works on strings
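The sketch below reproduces the failure and then applies the any() fix from the solution, so the two behaviors can be compared side by side. The `error_message` and `result` variables are added here only to make the outcome easy to inspect.

```python
bad_words = ['bad', 'ugly']
text = 'This is a bad example.'

# The original condition: a list on the left of `in`, a string on the right.
error_message = None
try:
    bad_words in text
except TypeError as exc:
    error_message = str(exc)
    print(f"TypeError: {error_message}")

# The fix: test each word individually and combine the results with any().
if any(word in text for word in bad_words):
    result = 'Filtered'
else:
    result = 'Clean'
print(result)  # Filtered
```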
5. You want to replace all banned words in a user message with '[CENSORED]'. Which code snippet correctly does this for the list banned = ['bad', 'ugly'] and string msg = 'This is a bad and ugly day.'?
hard
A. msg = msg.replace(banned, '[CENSORED]')
   print(msg)
B. for word in banned:
       msg = msg.replace(word, '[CENSORED]')
   print(msg)
C. msg = '[CENSORED]' if word in banned else msg
   print(msg)
D. msg = msg.filter(lambda w: w not in banned)
   print(msg)

Solution

  1. Step 1: Understand string replacement for multiple words

    We must replace each banned word one by one using a loop and str.replace().
  2. Step 2: Evaluate each option

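    Option B, for word in banned: msg = msg.replace(word, '[CENSORED]') followed by print(msg), loops over the list and replaces each word. A passes the list directly to replace() (a TypeError); C refers to an undefined word and would overwrite the whole message; D calls filter() on a string, which has no such method.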
  3. Final Answer:

    for word in banned: msg = msg.replace(word, '[CENSORED]') print(msg) -> Option B
  4. Quick Check:

    Loop and replace each banned word [OK]
Hint: Replace banned words one by one with a loop and replace() [OK]
Common Mistakes:
  • Trying to replace list directly in string
  • Using filter on string instead of list
  • Incorrect conditional replacement syntax
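The loop with replace() works for the message in the question, but it replaces substrings, so it also alters longer words that contain a banned word. The sketch below compares it with a single word-boundary regex substitution; the test message is made up to show the difference, and the regex variant is an alternative technique, not the answer the question expects.

```python
import re

banned = ['bad', 'ugly']
msg = 'This is a bad and ugly game of badminton.'  # hypothetical message

# Approach from the answer: replace each banned word in turn (substring match).
looped = msg
for word in banned:
    looped = looped.replace(word, '[CENSORED]')

# Alternative: one case-insensitive pattern that only matches whole words.
pattern = re.compile(
    r"\b(?:" + "|".join(map(re.escape, banned)) + r")\b",
    re.IGNORECASE,
)
censored = pattern.sub('[CENSORED]', msg)

print(looped)    # This is a [CENSORED] and [CENSORED] game of [CENSORED]minton.
print(censored)  # This is a [CENSORED] and [CENSORED] game of badminton.
```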