Agentic AI · ~20 mins

Regression testing for agent changes in Agentic AI - ML Experiment: Train & Evaluate

Experiment - Regression testing for agent changes
Problem: You have an AI agent that performs tasks based on learned behavior. After updating the agent's code or model, you want to ensure it still performs well on previous tasks without losing accuracy.
Current Metrics: Previous agent accuracy on test tasks: 92%. After the update, accuracy on the same tasks dropped to 78%.
Issue: The agent update caused a regression: performance on old tasks decreased significantly.
Your Task
Create a regression testing process that detects performance drops on old tasks after agent updates. Aim to keep accuracy on old tasks above 90% after changes.
You cannot remove or reduce the original test tasks.
You must automate the testing process for quick checks after each update.
Solution
import numpy as np
from sklearn.metrics import accuracy_score

# Simulated old test data and labels
X_test = np.array([[0,1],[1,0],[1,1],[0,0]])
y_test = np.array([1,1,0,0])  # Expected labels

# Baseline agent predictions before update
baseline_predictions = np.array([1,1,0,0])

# Updated agent predictions (simulated drop in accuracy)
updated_predictions = np.array([1,0,0,0])

# Function to run regression test; returns True/False so it can be
# wired into an automated check after each update
def regression_test(baseline_preds, new_preds, true_labels, min_acc=90.0):
    baseline_acc = accuracy_score(true_labels, baseline_preds) * 100
    new_acc = accuracy_score(true_labels, new_preds) * 100
    print(f"Baseline accuracy: {baseline_acc:.1f}%")
    print(f"New agent accuracy: {new_acc:.1f}%")
    if new_acc < baseline_acc - 5:
        print("Warning: Regression detected! Performance dropped significantly.")
        return False
    if new_acc < min_acc:
        print(f"Warning: Accuracy fell below the {min_acc:.0f}% target on old tasks.")
        return False
    print("No significant regression detected.")
    return True

# Run regression test
regression_test(baseline_predictions, updated_predictions, y_test)
Created a fixed test dataset representing old tasks.
Stored baseline predictions from the previous agent version.
Compared new agent predictions to baseline using accuracy metric.
Added automated check to warn if accuracy drops more than 5%.
Results Interpretation

Before update: the simulated baseline scored 100% on the old tasks.

After update: accuracy dropped to 75%, showing a regression. (The simulated numbers differ from the 92% to 78% drop in the problem statement, but they illustrate the same pattern.)

Regression testing helps catch when updates harm previous performance. Keeping a fixed test set and comparing results ensures agent reliability over time.
Bonus Experiment
Extend the regression test to include multiple metrics like precision and recall for more detailed performance checks.
💡 Hint
Use sklearn.metrics precision_score and recall_score functions alongside accuracy_score.
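Following the hint, here is one sketch of the extended check. The function name and the 0.05 tolerance are illustrative choices, not part of the original exercise:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Same simulated regression suite as in the main solution
y_test = np.array([1, 1, 0, 0])
baseline_predictions = np.array([1, 1, 0, 0])
updated_predictions = np.array([1, 0, 0, 0])

def multi_metric_regression_test(baseline_preds, new_preds, true_labels,
                                 tolerance=0.05):
    """Compare several metrics against the baseline; return the names of
    any metrics that dropped by more than `tolerance`."""
    metrics = {
        "accuracy": accuracy_score,
        "precision": precision_score,
        "recall": recall_score,
    }
    regressions = []
    for name, metric_fn in metrics.items():
        base = metric_fn(true_labels, baseline_preds)
        new = metric_fn(true_labels, new_preds)
        print(f"{name}: baseline {base:.2f} -> updated {new:.2f}")
        if new < base - tolerance:
            regressions.append(name)
    return regressions

print("Regressed metrics:",
      multi_metric_regression_test(baseline_predictions,
                                   updated_predictions, y_test))
# Regressed metrics: ['accuracy', 'recall']
```

Note how precision stays at 1.00 here while recall falls to 0.50: the updated agent makes no false-positive errors but misses a true positive, which an accuracy-only check would not distinguish.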