Bird
Raised Fist0
Agentic AIml~20 mins

Regression testing for agent changes in Agentic AI - ML Experiment: Train & Evaluate

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Experiment - Regression testing for agent changes
Problem:You have an AI agent that performs tasks based on learned behavior. After updating the agent's code or model, you want to ensure it still performs well on previous tasks without losing accuracy.
Current Metrics:Previous agent accuracy on test tasks: 92%. After update, accuracy on same tasks dropped to 78%.
Issue:The agent update caused regression: performance on old tasks decreased significantly.
Your Task
Create a regression testing process that detects performance drops on old tasks after agent updates. Aim to keep accuracy on old tasks above 90% after changes.
You cannot remove or reduce the original test tasks.
You must automate the testing process for quick checks after each update.
Hint 1
Hint 2
Hint 3
Solution
Agentic AI
import numpy as np
from sklearn.metrics import accuracy_score

# Simulated old test data and labels
X_test = np.array([[0,1],[1,0],[1,1],[0,0]])
y_test = np.array([1,1,0,0])  # Expected labels

# Baseline agent predictions before update
baseline_predictions = np.array([1,1,0,0])

# Updated agent predictions (simulated drop in accuracy)
updated_predictions = np.array([1,0,0,0])

# Function to run regression test
def regression_test(baseline_preds, new_preds, true_labels):
    baseline_acc = accuracy_score(true_labels, baseline_preds) * 100
    new_acc = accuracy_score(true_labels, new_preds) * 100
    print(f"Baseline accuracy: {baseline_acc:.1f}%")
    print(f"New agent accuracy: {new_acc:.1f}%")
    if new_acc < baseline_acc - 5:
        print("Warning: Regression detected! Performance dropped significantly.")
    else:
        print("No significant regression detected.")

# Run regression test
regression_test(baseline_predictions, updated_predictions, y_test)
Created a fixed test dataset representing old tasks.
Stored baseline predictions from the previous agent version.
Compared new agent predictions to baseline using accuracy metric.
Added automated check to warn if accuracy drops more than 5%.
Results Interpretation

Before update: Accuracy was 100% on old tasks.

After update: Accuracy dropped to 75%, showing regression.

Regression testing helps catch when updates harm previous performance. Keeping a fixed test set and comparing results ensures agent reliability over time.
Bonus Experiment
Extend the regression test to include multiple metrics like precision and recall for more detailed performance checks.
💡 Hint
Use sklearn.metrics precision_score and recall_score functions alongside accuracy_score.

Practice

(1/5)
1. What is the main purpose of regression testing for agent changes?
easy
A. To check if new changes break old agent behavior
B. To improve the agent's speed
C. To add new features to the agent
D. To change the agent's user interface

Solution

  1. Step 1: Understand regression testing goal

    Regression testing is done to ensure that recent changes do not break existing functionality.
  2. Step 2: Match purpose with options

    To check if new changes break old agent behavior clearly states checking if new changes break old behavior, which matches the goal.
  3. Final Answer:

    To check if new changes break old agent behavior -> Option A
  4. Quick Check:

    Regression testing = check old behavior intact [OK]
Hint: Regression testing checks old features still work after changes [OK]
Common Mistakes:
  • Thinking regression testing adds new features
  • Confusing regression testing with performance testing
  • Assuming regression testing changes UI
2. Which of the following is the correct way to define a test case for regression testing an agent in Python?
easy
A. def test_agent(): assert agent.run(input) == expected_output
B. test agent run input equals expected output
C. def test_agent: return agent.run(input) == expected_output
D. function test_agent() { return agent.run(input) == expected_output; }

Solution

  1. Step 1: Identify correct Python function syntax

    Python functions start with 'def', have parentheses, and a colon.
  2. Step 2: Check assertion usage

    def test_agent(): assert agent.run(input) == expected_output uses 'assert' correctly to compare output, matching Python test style.
  3. Final Answer:

    def test_agent(): assert agent.run(input) == expected_output -> Option A
  4. Quick Check:

    Python test function with assert = def test_agent(): assert agent.run(input) == expected_output [OK]
Hint: Python test functions start with def and use assert [OK]
Common Mistakes:
  • Missing parentheses or colon in function definition
  • Using non-Python syntax
  • Not using assert for test checks
3. Given the code below, what will be the output of the regression test?
class Agent:
    def run(self, x):
        return x * 2

def test_agent():
    agent = Agent()
    result = agent.run(3)
    assert result == 6
    print('Test passed')

test_agent()
medium
A. SyntaxError
B. Test passed
C. AssertionError
D. No output

Solution

  1. Step 1: Understand agent run method

    The method multiplies input by 2, so run(3) returns 6.
  2. Step 2: Check assertion and print

    The assertion checks if result == 6, which is true, so no error occurs and 'Test passed' prints.
  3. Final Answer:

    Test passed -> Option B
  4. Quick Check:

    3 * 2 = 6, assertion true, prints message [OK]
Hint: Check method output matches assertion to predict test result [OK]
Common Mistakes:
  • Assuming assertion fails without checking output
  • Confusing syntax errors with logic errors
  • Ignoring print statement after assertion
4. Identify the error in the following regression test code and select the fix:
def test_agent():
    agent = Agent()
    result = agent.run(5)
    if result = 10:
        print('Test passed')
    else:
        print('Test failed')
medium
A. Replace print with return statements
B. Add parentheses around the if condition
C. Change '=' to '==' in the if condition
D. Remove else block

Solution

  1. Step 1: Identify syntax error in if condition

    The single '=' is an assignment, not a comparison, causing a syntax error.
  2. Step 2: Correct the comparison operator

    Replace '=' with '==' to compare values properly in the if statement.
  3. Final Answer:

    Change '=' to '==' in the if condition -> Option C
  4. Quick Check:

    Use '==' for comparison in if statements [OK]
Hint: Use '==' to compare, '=' assigns values [OK]
Common Mistakes:
  • Using '=' instead of '==' in conditions
  • Adding unnecessary parentheses in Python if
  • Thinking print must be replaced with return
5. You updated your agent's decision logic. How should you design regression tests to ensure old behaviors remain correct while testing new features?
hard
A. Test randomly without expected outputs to save time
B. Only test new features since old ones worked before
C. Remove old tests to avoid conflicts with new logic
D. Create test cases for both old expected outputs and new expected outputs

Solution

  1. Step 1: Understand regression test purpose

    Regression tests verify that old behaviors still work after changes.
  2. Step 2: Design tests covering old and new behaviors

    Include test cases for old expected outputs and new expected outputs to check both.
  3. Final Answer:

    Create test cases for both old expected outputs and new expected outputs -> Option D
  4. Quick Check:

    Test old and new outputs to ensure full correctness [OK]
Hint: Test old and new cases to catch breaks early [OK]
Common Mistakes:
  • Ignoring old tests after updates
  • Deleting old tests to simplify
  • Skipping expected outputs in tests