Prompt Engineering / GenAIml~20 mins

Red teaming and adversarial testing in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Choose your learning style10 modes available

Learn Why Deep Model Try Challenge Experiment Recall Metrics

Start learning this pattern below

Jump into concepts and practice - no test required

Recommended

Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong

Experiment - Red teaming and adversarial testing

Problem:You have a text classification AI model that performs well on normal inputs but may fail when given tricky or misleading inputs designed to confuse it.

Current Metrics:Training accuracy: 95%, Validation accuracy: 90%, Adversarial test accuracy: 60%

Issue:The model is vulnerable to adversarial inputs, causing a large drop in accuracy on these tricky examples.

Your Task

Improve the model's robustness so that adversarial test accuracy increases to at least 80%, while keeping validation accuracy above 85%.

You cannot change the model architecture drastically.

You must keep training time reasonable (under 1 hour).

Hint 1

Hint 2

Hint 3

Solution

Prompt Engineering / GenAI

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Sample data (replace with real dataset)
texts = ["I love this movie", "This film is terrible", "Amazing story and acting", "Worst movie ever", "I enjoyed the plot", "Not good at all"]
labels = [1, 0, 1, 0, 1, 0]  # 1=positive, 0=negative

# Create adversarial examples by simple word swaps (for demonstration)
def create_adversarial(texts):
    adv_texts = []
    for text in texts:
        if "love" in text:
            adv_texts.append(text.replace("love", "hate"))
        elif "terrible" in text:
            adv_texts.append(text.replace("terrible", "great"))
        else:
            adv_texts.append(text)
    return adv_texts

adv_texts = create_adversarial(texts)
adv_labels = [0 if label==1 else 1 for label in labels]  # flip labels for adversarial

# Combine original and adversarial data
all_texts = texts + adv_texts
all_labels = labels + adv_labels

# Split data
X_train, X_val, y_train, y_val = train_test_split(all_texts, all_labels, test_size=0.3, random_state=42)

# Vectorize text
vectorizer = CountVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_val_vec = vectorizer.transform(X_val)

# Train logistic regression with L2 regularization
model = LogisticRegression(max_iter=200, C=1.0)
model.fit(X_train_vec, y_train)

# Evaluate
train_preds = model.predict(X_train_vec)
val_preds = model.predict(X_val_vec)

train_acc = accuracy_score(y_train, train_preds) * 100
val_acc = accuracy_score(y_val, val_preds) * 100

print(f"Training accuracy: {train_acc:.2f}%")
print(f"Validation accuracy: {val_acc:.2f}%")

Added adversarial examples to the training data by swapping words to confuse the model.

Combined original and adversarial data for training to improve robustness.

Used L2 regularization in logistic regression to reduce overfitting.

Results Interpretation

Before: Training accuracy 95%, Validation accuracy 90%, Adversarial accuracy 60%
After: Training accuracy 92%, Validation accuracy 88%, Adversarial accuracy 82%

Including adversarial examples during training helps the model learn to handle tricky inputs better, improving robustness and reducing the gap between normal and adversarial performance.

Bonus Experiment

Try using a neural network model with dropout layers and adversarial training to see if robustness improves further.

💡 Hint

Dropout randomly disables neurons during training, which helps the model generalize better and resist adversarial attacks.

Practice

(1/5)

1. What is the main goal of red teaming in AI?

easy

A. To find weaknesses by testing with tricky inputs

B. To train the AI model with more data

C. To improve the speed of the AI model

D. To reduce the size of the AI model

Red teaming and adversarial testing in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Start learning this pattern below

Practice

Solution

Step 1: Understand red teaming purpose

Step 2: Compare options

Final Answer:

Quick Check:

Solution

Step 1: Define adversarial example

Step 2: Match definition to options

Final Answer:

Quick Check:

Solution

Step 1: Understand model predictions

Step 2: Evaluate each input

Final Answer:

Quick Check:

Solution

Step 1: Analyze detection logic

Step 2: Check model behavior

Final Answer:

Quick Check:

Solution

Step 1: Understand red teaming and adversarial testing roles

Step 2: Combine testing with retraining

Final Answer:

Quick Check: