
Defining success criteria for agents in Agentic AI - ML Experiment: Train & Evaluate

Experiment - Defining success criteria for agents
Problem: You have built an AI agent that performs tasks in a simulated environment. Currently, the agent's success is measured only by task completion, but this does not capture how well or efficiently the agent performs.
Current Metrics: Success rate: 75% (agent completes tasks), Average steps per task: 150
Issue: The agent completes many tasks but often takes too many steps, making it inefficient. The current success criteria do not reflect efficiency or quality of task completion.
Your Task
Define and implement improved success criteria that consider both task completion and efficiency, aiming for at least 80% success rate with average steps per task under 120.
You cannot change the agent's core decision-making code.
You can only modify how success is measured and reported.
You must keep the success criteria simple and interpretable.
Solution
class AgentSuccessCriteria:
    """Weighted score combining task completion and step efficiency."""

    def __init__(self, completion_weight=0.7, efficiency_weight=0.3, max_steps=120):
        # The weights should sum to 1 so that the score stays in [0, 1].
        self.completion_weight = completion_weight
        self.efficiency_weight = efficiency_weight
        self.max_steps = max_steps  # step budget; using all of it earns no efficiency credit

    def compute_success_score(self, completed: bool, steps: int) -> float:
        completion_score = 1.0 if completed else 0.0
        # Unused fraction of the step budget, clamped at 0; failed tasks earn none.
        efficiency_score = max(0.0, (self.max_steps - steps) / self.max_steps) if completed else 0.0
        success_score = (self.completion_weight * completion_score) + (self.efficiency_weight * efficiency_score)
        return success_score

# Example usage:
agent_results = [
    {'completed': True, 'steps': 70},
    {'completed': True, 'steps': 80},
    {'completed': True, 'steps': 85},
    {'completed': True, 'steps': 95}
]

criteria = AgentSuccessCriteria()
scores = [criteria.compute_success_score(r['completed'], r['steps']) for r in agent_results]
avg_score = sum(scores) / len(scores)
print(f"Average success score: {avg_score:.2f}")
Created a new class to define success criteria combining task completion and efficiency.
Added weights to balance importance of completion and efficiency.
Implemented a scoring function that returns a score between 0 and 1 (provided the two weights sum to 1).
Demonstrated usage with example agent results.
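The range of the score can be checked on a few edge cases. The following is a minimal sketch that restates the scoring formula as a standalone function; `success_score` is a hypothetical name used only for illustration, and the defaults (weights 0.7 and 0.3, a budget of 120 steps) are taken from the solution above.

```python
def success_score(completed: bool, steps: int,
                  completion_weight: float = 0.7,
                  efficiency_weight: float = 0.3,
                  max_steps: int = 120) -> float:
    """Standalone restatement of the weighted score (illustrative only)."""
    if not completed:
        return 0.0  # a failed task earns nothing, whatever its step count
    # Fraction of the step budget left unused, clamped at 0 for overruns.
    efficiency = max(0.0, (max_steps - steps) / max_steps)
    return completion_weight * 1.0 + efficiency_weight * efficiency


# Edge cases: failure, overrun, exactly on budget, half the budget.
failed = success_score(False, 50)
overrun = success_score(True, 150)
on_budget = success_score(True, 120)
half_budget = success_score(True, 60)

print(failed, overrun, on_budget, round(half_budget, 2))
```

With these weights a completed task never scores below 0.7, so a run of 150 steps and a run of exactly 120 steps receive the same score. If overruns should be penalized further, the clamp at 0 is the line to revisit.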
Results Interpretation

Before: Success rate = 75%, Average steps = 150 (no efficiency considered)

After: Average success score = 0.79 on the four example results above (combines completion and efficiency)

Defining success criteria that combine multiple relevant factors helps better evaluate agent performance beyond simple task completion.
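Because a single combined score can hide which factor is falling short, it may help to report the two underlying metrics next to the targets from the task (at least 80% success, fewer than 120 average steps). The sketch below is one possible way to do that; `summarize`, its parameter names, and the episode data are all hypothetical and made up for illustration.

```python
from statistics import mean


def summarize(results, target_success_rate=0.80, target_avg_steps=120):
    """Report completion and efficiency separately against the targets."""
    success_rate = mean(1.0 if r["completed"] else 0.0 for r in results)
    # Average steps over all tasks, whether completed or not.
    avg_steps = mean(r["steps"] for r in results)
    return {
        "success_rate": success_rate,
        "avg_steps": avg_steps,
        "meets_targets": success_rate >= target_success_rate
        and avg_steps < target_avg_steps,
    }


# Made-up episodes, for illustration only.
episodes = [
    {"completed": True, "steps": 70},
    {"completed": True, "steps": 80},
    {"completed": False, "steps": 200},
    {"completed": True, "steps": 95},
    {"completed": True, "steps": 85},
]

report = summarize(episodes)
print(report)
```

Keeping the pass/fail check as a plain conjunction of two thresholds fits the constraint that the criteria stay simple and interpretable.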
Bonus Experiment
Try adjusting the weights for completion and efficiency to see how the success score changes and find the best balance for your agent.
💡 Hint
Increase efficiency weight to reward faster task completion more, or increase completion weight to prioritize finishing tasks.
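One way to run this experiment is to sweep the completion weight and recompute the average score each time. The sketch below assumes the efficiency weight is always `1 - completion_weight` so that the score stays between 0 and 1; `weighted_score` is a hypothetical helper, and the data are the four example results from the solution.

```python
def weighted_score(completed, steps, completion_weight, max_steps=120):
    """Score one task; the efficiency weight is the remainder up to 1."""
    if not completed:
        return 0.0
    efficiency_weight = 1.0 - completion_weight
    efficiency = max(0.0, (max_steps - steps) / max_steps)
    return completion_weight + efficiency_weight * efficiency


agent_results = [
    {"completed": True, "steps": 70},
    {"completed": True, "steps": 80},
    {"completed": True, "steps": 85},
    {"completed": True, "steps": 95},
]

# Average score for each candidate completion weight.
sweep = {}
for completion_weight in (0.9, 0.7, 0.5, 0.3):
    scores = [
        weighted_score(r["completed"], r["steps"], completion_weight)
        for r in agent_results
    ]
    sweep[completion_weight] = sum(scores) / len(scores)
    print(f"completion_weight={completion_weight:.1f} -> {sweep[completion_weight]:.2f}")
```

Every task in this sample is completed and the mean efficiency score is only about 0.31, so shifting weight toward efficiency lowers the average. A sample that includes failed tasks would show the opposite pull from the completion weight.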

Practice

1. Why is it important to define success criteria for an AI agent?
easy
A. It reduces the size of the agent's code.
B. It helps the agent understand what goal to achieve.
C. It makes the agent run faster.
D. It allows the agent to ignore errors.

Solution

  1. Step 1: Understand the role of success criteria

    Success criteria tell the agent what outcome is desired or considered good.
  2. Step 2: Connect success criteria to agent behavior

    Without clear goals, the agent cannot know what to aim for or when it has succeeded.
  3. Final Answer:

    It helps the agent understand what goal to achieve. -> Option B
  4. Quick Check:

    Success criteria = clear goals [OK]
Hint: Success criteria define the agent's goal clearly [OK]
Common Mistakes:
  • Thinking success criteria speed up the agent
  • Confusing success criteria with code size
  • Believing success criteria ignore errors
2. Which of the following is the correct way to express a success criterion for an agent in code?
easy
A. success == accuracy > 0.9
B. success = accuracy = 0.9
C. success = accuracy > 0.9
D. success => accuracy > 0.9

Solution

  1. Step 1: Identify correct comparison syntax

    In Python, to assign a boolean result, use a single = with a comparison expression on the right.
  2. Step 2: Check each option's syntax

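    • success = accuracy > 0.9 evaluates the comparison and assigns the resulting boolean to success.
    • success = accuracy = 0.9 is a chained assignment: it sets both names to 0.9 and compares nothing.
    • success == accuracy > 0.9 is a chained comparison: it assigns nothing, and it raises a NameError if success is not yet defined.
    • success => accuracy > 0.9 uses =>, which is not a Python operator and is a syntax error.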
  3. Final Answer:

    success = accuracy > 0.9 -> Option C
  4. Quick Check:

    Assignment with comparison uses = and > [OK]
Hint: Use '=' for assignment, '>' for comparison [OK]
Common Mistakes:
  • Using '==' instead of '=' for assignment
  • Using '=' instead of '==' for comparison
  • Using invalid operators like '=>'
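The behavior of the four options can be checked directly. The sketch below uses the built-in `compile` to test whether a line is valid Python without running it; `is_valid_python` and the `_b` variable names are hypothetical and exist only for this illustration.

```python
def is_valid_python(source: str) -> bool:
    """Return True if `source` compiles as Python, False on a SyntaxError."""
    try:
        compile(source, "<option>", "exec")
        return True
    except SyntaxError:
        return False


# Option C: evaluate the comparison, then bind the boolean to `success`.
accuracy = 0.93
success = accuracy > 0.9

# Option B: chained assignment; both names are rebound to 0.9, nothing is compared.
accuracy_b = 0.93
success_b = accuracy_b = 0.9

# Option A: a chained comparison; it compiles, but it assigns nothing.
option_a_valid = is_valid_python("success == accuracy > 0.9")

# Option D: `=>` is not a Python operator, so the line does not compile.
option_d_valid = is_valid_python("success => accuracy > 0.9")

print(success, success_b, accuracy_b, option_a_valid, option_d_valid)
```

Only option D is rejected by the parser. Options A and B are valid Python that simply do something other than record whether the criterion was met.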
3. Given the code below, what will be the value of success?
accuracy = 0.85
threshold = 0.8
success = accuracy >= threshold
medium
A. True
B. Error
C. 0.85
D. False

Solution

  1. Step 1: Compare accuracy and threshold values

    Accuracy is 0.85, threshold is 0.8, so 0.85 >= 0.8 is True.
  2. Step 2: Assign comparison result to success

    The boolean True is assigned to success.
  3. Final Answer:

    True -> Option A
  4. Quick Check:

    0.85 >= 0.8 = True [OK]
Hint: Check if accuracy meets or exceeds threshold [OK]
Common Mistakes:
  • Confusing value 0.85 with boolean True
  • Thinking comparison returns a number
  • Expecting an error from valid comparison
4. The following code is intended to check if an agent's success metric is above 90%, but it has a bug. What is the bug?
success_metric = 0.92
if success_metric = 0.9:
    print('Agent succeeded')
medium
A. Missing colon ':' after if statement
B. Print statement syntax error
C. Incorrect variable name 'success_metric'
D. Using '=' instead of '==' in the if condition

Solution

  1. Step 1: Identify the if statement syntax

    In Python, '=' is for assignment, '==' is for comparison in conditions.
  2. Step 2: Locate the bug in the if condition

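    The code uses '=' where a comparison is required, which causes a syntax error. Because the intent is to check for a value above 90%, the corrected condition is success_metric > 0.9; '==' would compile but would only match a value of exactly 0.9.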
  3. Final Answer:

    Using '=' instead of '==' in the if condition -> Option D
  4. Quick Check:

    Use a comparison operator ('>' here), not '=', in an if condition [OK]
Hint: '=' assigns; use a comparison operator such as '>' or '==' in if statements [OK]
Common Mistakes:
  • Confusing '=' with '==' in conditions
  • Ignoring syntax errors from wrong operators
  • Assuming missing colon is the error
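The bug and a fix that matches the stated intent ("above 90%") can be sketched as follows. The buggy snippet is held in a string and passed to the built-in `compile`, so the SyntaxError can be observed without stopping the script; the variable names other than `success_metric` are hypothetical.

```python
# The original snippet, kept as text so the syntax error can be caught.
buggy_source = (
    "success_metric = 0.92\n"
    "if success_metric = 0.9:\n"
    "    print('Agent succeeded')\n"
)

try:
    compile(buggy_source, "<buggy>", "exec")
    buggy_compiles = True
except SyntaxError:
    buggy_compiles = False

success_metric = 0.92

# '>' expresses "above 90%"; '==' would only match a value of exactly 0.9.
above_threshold = success_metric > 0.9
exactly_threshold = success_metric == 0.9

if above_threshold:
    print('Agent succeeded')
```

Replacing `=` with `==` would remove the syntax error, but for a metric of 0.92 the condition would then be false and nothing would be printed.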
5. You want to define success criteria for an agent that completes tasks with at least 95% accuracy and finishes within 10 seconds. Which of the following is the best way to define this success criteria in code?
hard
A. success = (accuracy >= 0.95) and (time_taken <= 10)
B. success = accuracy > 0.95 or time_taken < 10
C. success = accuracy == 0.95 and time_taken == 10
D. success = accuracy >= 0.95 and time_taken > 10

Solution

  1. Step 1: Understand the criteria requirements

    The agent must reach an accuracy of at least 95% and finish within 10 seconds.
  2. Step 2: Translate criteria into logical conditions

    Use '>=' for accuracy and '<=' for time, combined with 'and' to require both.
  3. Step 3: Evaluate each option

    • success = (accuracy >= 0.95) and (time_taken <= 10) correctly uses 'and' and proper comparisons.
    • success = accuracy > 0.95 or time_taken < 10 uses 'or', which allows passing if only one condition is met.
    • success = accuracy == 0.95 and time_taken == 10 uses '==', which is too strict.
    • success = accuracy >= 0.95 and time_taken > 10 allows time_taken > 10, which breaks the time limit.
  4. Final Answer:

    success = (accuracy >= 0.95) and (time_taken <= 10) -> Option A
  5. Quick Check:

    Both accuracy and time must meet thresholds [OK]
Hint: Use 'and' to combine all success conditions [OK]
Common Mistakes:
  • Using 'or' instead of 'and' to combine conditions
  • Using '==' instead of '>=' or '<='
  • Allowing time greater than limit
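The chosen criterion can be wrapped in a small function and exercised at its boundaries. This is a minimal sketch; `meets_criteria` and its parameter names are hypothetical, and the thresholds are the ones stated in the question.

```python
def meets_criteria(accuracy: float, time_taken: float,
                   min_accuracy: float = 0.95,
                   max_seconds: float = 10.0) -> bool:
    """Both conditions must hold; each boundary value counts as passing."""
    return (accuracy >= min_accuracy) and (time_taken <= max_seconds)


# Boundary and failure cases.
cases = {
    "exactly on both limits": meets_criteria(0.95, 10.0),
    "accurate but too slow": meets_criteria(0.96, 10.5),
    "fast but inaccurate": meets_criteria(0.94, 5.0),
    "comfortably inside": meets_criteria(0.99, 3.0),
}

for name, passed in cases.items():
    print(f"{name}: {passed}")
```

Testing the exact limits (0.95 and 10 seconds) is what separates `>=` and `<=` from the stricter `>` and `<` in option B.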