Agentic AI · ~20 mins

Why guardrails prevent agent disasters in agentic AI: an experiment to prove it

Experiment - Why guardrails prevent agent disasters
Problem: You have an AI agent designed to perform tasks autonomously. Without safety guardrails, however, the agent sometimes takes harmful or unintended actions.
Current Metrics: The agent's task success rate is 85%, but 15% of its actions cause unintended harmful side effects.
Issue: The agent performs well on tasks but occasionally causes disasters because it has no constraints or safety checks.
Your Task
Add guardrails to the agent to reduce harmful side effects from 15% to below 5%, while maintaining at least 85% task success rate.
You cannot reduce the agent's task capabilities.
You must keep the agent's response time within 10% of the original.
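The response-time constraint can be verified with a simple timing harness. The sketch below is illustrative: `base_act` and `guarded_act` are hypothetical stand-ins for the unguarded and guarded agents, and in practice you would swap in the real agents' `act` methods.

```python
import time

def mean_latency(act_fn, trials=100_000):
    """Average wall-clock seconds per call to act_fn."""
    start = time.perf_counter()
    for _ in range(trials):
        act_fn()
    return (time.perf_counter() - start) / trials

# Hypothetical stand-ins for the unguarded and guarded agents.
def base_act():
    return 'safe_action'

def guarded_act():
    action = base_act()
    return 'blocked_action' if action == 'harmful_action' else action

base = mean_latency(base_act)
guarded = mean_latency(guarded_act)
print(f"Guardrail overhead: {(guarded - base) / base * 100:.1f}%")
```

For such tiny toy functions the relative overhead will look inflated; the point is the measurement pattern, which lets you confirm the 10% budget on a real agent.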
Solution
import random

class Agent:
    def __init__(self):
        self.task_success_rate = 0.85
        self.harmful_action_rate = 0.15

    def act(self):
        # Simulate action with chance of harm
        if random.random() < self.harmful_action_rate:
            return 'harmful_action'
        else:
            return 'safe_action'

class GuardrailAgent(Agent):
    """Agent that filters every proposed action through a safety check."""

    def safety_check(self, action):
        # Simple guardrail: block harmful actions
        if action == 'harmful_action':
            return 'blocked_action'
        return action

    def act(self):
        action = super().act()
        safe_action = self.safety_check(action)
        return safe_action

# Evaluate agent before guardrails
agent = Agent()
trials = 1000
harmful_count = 0
success_count = 0
for _ in range(trials):
    action = agent.act()
    if action == 'harmful_action':
        harmful_count += 1
    else:
        success_count += 1

# Evaluate agent after guardrails
guardrail_agent = GuardrailAgent()
harmful_count_gr = 0
success_count_gr = 0
blocked_count = 0
for _ in range(trials):
    action = guardrail_agent.act()
    if action == 'harmful_action':
        harmful_count_gr += 1
    elif action == 'blocked_action':
        blocked_count += 1
    else:
        success_count_gr += 1

print(f"Before guardrails: Success rate = {success_count/trials*100:.1f}%, Harmful actions = {harmful_count/trials*100:.1f}%")
print(f"After guardrails: Success rate = {success_count_gr/trials*100:.1f}%, Harmful actions = {harmful_count_gr/trials*100:.1f}%, Blocked actions = {blocked_count/trials*100:.1f}%")
Added a safety_check method to block harmful actions before execution.
Modified the act method to apply safety_check and prevent harmful actions.
Kept task success rate high by only blocking harmful actions, not safe ones.
Results Interpretation

Before guardrails: Success rate ≈ 85%, Harmful actions ≈ 15%

After guardrails: Success rate ≈ 85%, Harmful actions = 0%, Blocked actions ≈ 15%

(Exact figures vary slightly from run to run because the simulation is random.)

Adding guardrails prevents harmful actions without reducing the agent's ability to complete its tasks. This shows how safety checks help avoid disasters while preserving performance.
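In a real agent, the guardrail would inspect proposed tool calls rather than compare action strings. A minimal sketch, assuming a hypothetical allowlist of tools (`ALLOWED_TOOLS`, `guard_tool_call`, and the tool names are all illustrative, not part of any real API):

```python
# Hypothetical allowlist: only these tools may be executed.
ALLOWED_TOOLS = {"search", "read_file", "summarize"}

def guard_tool_call(tool_name, args):
    """Block any tool call not on the allowlist; pass the rest through."""
    if tool_name not in ALLOWED_TOOLS:
        return {"status": "blocked", "reason": f"tool '{tool_name}' not allowed"}
    return {"status": "ok", "tool": tool_name, "args": args}

print(guard_tool_call("search", {"q": "weather"}))    # status: ok
print(guard_tool_call("delete_file", {"path": "/"}))  # status: blocked
```

The same shape as the experiment: safe calls pass untouched (preserving the success rate), and only disallowed calls are intercepted.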
Bonus Experiment
Try adding a penalty in the agent's learning process for harmful actions instead of blocking them outright.
💡 Hint
Modify the reward function to reduce rewards when harmful actions occur, encouraging the agent to learn safer behavior.
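One way to sketch this hint is a simple bandit-style learner: a penalty in the reward function steers the agent toward the safer behavior instead of hard-blocking the risky one. The behaviors, rates, and penalty value below are illustrative assumptions, not part of the original experiment.

```python
import random

random.seed(0)

# Two candidate behaviors; 'risky' sometimes causes harm and is penalized for it.
def reward(behavior):
    if behavior == 'risky':
        return -5.0 if random.random() < 0.15 else 1.0  # penalty when harm occurs
    return 1.0 if random.random() < 0.85 else 0.0       # safe behavior, no harm

q = {'risky': 0.0, 'safe': 0.0}   # running value estimates
alpha, epsilon = 0.1, 0.1         # learning rate, exploration rate
for _ in range(5000):
    # Epsilon-greedy: mostly pick the best-looking behavior, sometimes explore.
    if random.random() < epsilon:
        b = random.choice(['risky', 'safe'])
    else:
        b = max(q, key=q.get)
    q[b] += alpha * (reward(b) - q[b])

print(q)  # the safe behavior should end up with the higher estimated value
```

With these numbers the risky behavior's expected reward is 0.85 − 0.15 × 5 = 0.10 versus 0.85 for the safe one, so the learner drifts toward safety on its own rather than being blocked.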