
Output filtering and safety checks in agentic AI

Introduction

Output filtering and safety checks help ensure that AI answers are safe and useful. They catch bad or harmful results before the user sees them.

When building a chatbot that talks to people and must avoid rude or unsafe replies.
When creating AI that writes content and you want to prevent wrong or harmful information.
When using AI to help with decisions and you need to check answers for safety.
When sharing AI results publicly and you want to avoid offensive or biased outputs.
When training AI models that generate text, images, or code and you want to keep outputs clean.
Syntax
Python
def filter_output(output: str) -> bool:
    # Return True if output is safe, False if not
    pass

# Example usage
result = model.generate(input_data)
if filter_output(result):
    print(result)
else:
    print("Output blocked for safety.")

The filter_output function returns True when the AI output is safe to show and False when it should be blocked.

You can customize the filter to check for banned words, harmful content, or misinformation.

Examples
This simple filter blocks outputs containing certain bad words.
Python
def filter_output(output: str) -> bool:
    bad_words = ['badword1', 'badword2']
    return not any(word in output.lower() for word in bad_words)
This filter blocks outputs that are too long.
Python
def filter_output(output: str) -> bool:
    if len(output) > 500:
        return False
    return True
This filter blocks outputs with unsafe phrases.
Python
def filter_output(output: str) -> bool:
    # Check for unsafe phrases
    unsafe_phrases = ['do harm', 'illegal']
    for phrase in unsafe_phrases:
        if phrase in output.lower():
            return False
    return True
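The three example filters above can also be combined into a single check. A minimal sketch, assuming the same illustrative word lists and the same 500-character limit:

```python
def filter_output(output: str) -> bool:
    """Return True only if the output passes every safety check."""
    bad_words = ['badword1', 'badword2']    # illustrative placeholders
    unsafe_phrases = ['do harm', 'illegal']
    lowered = output.lower()

    if len(output) > 500:                   # block overly long outputs
        return False
    if any(word in lowered for word in bad_words):
        return False
    if any(phrase in lowered for phrase in unsafe_phrases):
        return False
    return True

print(filter_output("A short, friendly answer."))  # True
print(filter_output("This is illegal advice."))    # False
```

Running the checks in one function keeps the calling code simple: the model's caller only ever asks one question, "is this output safe?"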
Sample Program

This program simulates AI output and uses a filter to block unsafe text. It prints safe outputs or blocks unsafe ones.

Python
def filter_output(output: str) -> bool:
    bad_words = ['hate', 'kill']
    for word in bad_words:
        if word in output.lower():
            return False
    return True

class SimpleModel:
    def generate(self, input_text):
        # Pretend this is AI output
        if 'hello' in input_text.lower():
            return "Hello! How can I help you today?"
        else:
            return "I hate bad things."

model = SimpleModel()

inputs = ["Hello there", "Tell me something bad"]

for text in inputs:
    output = model.generate(text)
    if filter_output(output):
        print(f"Safe output: {output}")
    else:
        print("Output blocked for safety.")
Output

Safe output: Hello! How can I help you today?
Output blocked for safety.
Important Notes

Filters can be simple word checks or complex AI models themselves.

Always test filters carefully to avoid blocking good outputs or allowing bad ones.
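A minimal sketch of such a test, reusing the word-based filter from the sample program (the test strings are illustrative):

```python
def filter_output(output: str) -> bool:
    bad_words = ['hate', 'kill']
    return not any(word in output.lower() for word in bad_words)

# Outputs that should pass (catch false positives)
safe_cases = ["Hello! How can I help you today?", "Here is a recipe for soup."]
# Outputs that should be blocked (catch false negatives)
unsafe_cases = ["I hate bad things.", "I will kill it."]

for text in safe_cases:
    assert filter_output(text), f"False positive: blocked safe text {text!r}"
for text in unsafe_cases:
    assert not filter_output(text), f"False negative: allowed unsafe text {text!r}"

print("All filter tests passed.")
```

Note that simple substring matching over-blocks: 'skill' contains 'kill', so this filter would wrongly reject harmless text like "my skill set". Testing with cases like these is how you find such gaps.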

Output filtering helps keep AI safe and trustworthy for users.

Summary

Output filtering stops unsafe or unwanted AI results.

Filters can check for bad words, unsafe phrases, or length limits.

Using filters makes AI safer and more reliable for real users.