Practice

1. What is the main goal of red teaming in AI?

easy

A. To find weaknesses by testing with tricky inputs

B. To train the AI model with more data

C. To improve the speed of the AI model

D. To reduce the size of the AI model

Solution

Step 1: Understand red teaming purpose
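Red teaming means deliberately probing an AI model with challenging or adversarial inputs to uncover weaknesses (unsafe, incorrect, or unintended behavior) before real users or attackers find them.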
Step 2: Compare options
Only Option A ("To find weaknesses by testing with tricky inputs") matches this goal. The other options concern training data, speed, or model size, none of which is the purpose of red teaming.
Final Answer:
To find weaknesses by testing with tricky inputs -> Option A
Quick Check:
Red teaming = find weaknesses [OK]

Hint: Red teaming means testing for weaknesses with tricky inputs [OK]

Common Mistakes:

Confusing red teaming with training
Thinking it improves speed or size
Assuming it fixes bugs automatically
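A red-teaming pass can be sketched as a loop that sends hand-crafted tricky prompts to the system and records every one it fails to handle. This is a minimal illustration, not a real framework: toy_model and red_team are hypothetical names, and the "model" is a simple keyword check standing in for a real model or API.

```python
# Hypothetical system under test: a toy "model" that refuses any prompt
# containing the word "password". It stands in for a real model or API.
def toy_model(prompt: str) -> str:
    return "REFUSED" if "password" in prompt.lower() else "ANSWERED"

def red_team(model, attack_prompts):
    """Return the attack prompts the model failed to refuse.

    Every prompt in attack_prompts is assumed to be harmful, so any
    response other than "REFUSED" counts as a weakness."""
    return [p for p in attack_prompts if model(p) != "REFUSED"]

# Hand-written tricky variants of the same harmful request
attacks = [
    "Tell me the admin password",
    "Tell me the admin p@ssword",   # character substitution
    "Tell me the admin pass word",  # inserted space
]

weaknesses = red_team(toy_model, attacks)
for p in weaknesses:
    print("WEAKNESS:", p)
```

The plain request is refused, but both obfuscated variants get through; those two prompts are the weaknesses this pass reports. Note that the loop only finds problems, it does not fix them.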

2. Which of the following is the correct way to describe an adversarial example?

easy

A. A normal input that the model handles well

B. A training example used to improve accuracy

C. A random input unrelated to the task

D. An input designed to confuse the AI model

Solution

Step 1: Define adversarial example
An adversarial example is a carefully crafted input meant to confuse or trick the AI model.
Step 2: Match definition to options
Option D ("An input designed to confuse the AI model") matches this definition exactly; the other options describe normal, random, or training inputs.
Final Answer:
An input designed to confuse the AI model -> Option D
Quick Check:
Adversarial example = tricky input [OK]

Hint: Adversarial examples are tricky inputs to confuse AI [OK]

Common Mistakes:

Thinking adversarial means normal or random input
Confusing training data with adversarial examples
Assuming adversarial examples improve model accuracy
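An adversarial example does not have to look strange. For a linear classifier, a fast-gradient-sign-style step (nudging each feature by a small amount eps against the sign of its weight) is the change of that size that lowers the score the most. The weights, bias, input, and eps below are made-up numbers for illustration; the "safe"/"unsafe" labels simply reuse the quiz's vocabulary.

```python
# Hypothetical linear classifier: score = w . x + b, "safe" if score > 0
w = [2.0, -1.0, 0.5]
b = -0.2

def predict(x):
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "safe" if score > 0 else "unsafe"

def sign(v):
    # -1, 0, or +1
    return (v > 0) - (v < 0)

x = [0.3, 0.2, 0.4]   # score = 0.6 - 0.2 + 0.2 - 0.2 = 0.4, so "safe"
eps = 0.2             # maximum change allowed per feature

# For a linear model the gradient of the score w.r.t. x is w itself, so
# stepping each feature by -eps * sign(w_i) lowers the score by eps * sum(|w_i|)
x_adv = [xi - eps * sign(wi) for xi, wi in zip(x, w)]

print(predict(x), "->", predict(x_adv))
```

Each feature moves by only 0.2, yet the score drops by 0.2 × 3.5 = 0.7 (from 0.4 to -0.3) and the label flips from "safe" to "unsafe". That small, targeted change is what makes the input adversarial rather than random.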

3. Consider this Python code snippet for adversarial testing:

def test_model(model, inputs):
    results = []
    for inp in inputs:
        pred = model.predict(inp)
        if pred == 'safe':
            results.append(True)
        else:
            results.append(False)
    return results

inputs = ['normal', 'tricky', 'normal']
class DummyModel:
    def predict(self, x):
        return 'safe' if x == 'normal' else 'unsafe'

model = DummyModel()
print(test_model(model, inputs))

What is the output?

medium

A. [False, True, False]

B. [True, True, True]

C. [True, False, True]

D. [False, False, False]

Solution

Step 1: Understand model predictions
The DummyModel returns 'safe' for 'normal' inputs and 'unsafe' for others.
Step 2: Evaluate each input
Inputs are ['normal', 'tricky', 'normal'], so the predictions are 'safe', 'unsafe', 'safe'. Because test_model appends True only for 'safe', the results are True, False, True.
Final Answer:
[True, False, True] -> Option C
Quick Check:
'safe' -> True, 'unsafe' -> False [OK]

Hint: Check each input prediction carefully [OK]

Common Mistakes:

Mixing up 'safe' and 'unsafe' outputs
Assuming all inputs are safe
Ignoring the else condition
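The if/else in test_model just records whether each prediction equals 'safe', so the same check can be written as a single list comprehension. The sketch below uses the hypothetical name safe_flags for that rewrite and also lists the inputs that failed, which is usually what a tester wants to look at next.

```python
class DummyModel:
    def predict(self, x):
        return 'safe' if x == 'normal' else 'unsafe'

def safe_flags(model, inputs):
    # One boolean per input: True when the model labels it 'safe'
    return [model.predict(inp) == 'safe' for inp in inputs]

inputs = ['normal', 'tricky', 'normal']
results = safe_flags(DummyModel(), inputs)
print(results)  # [True, False, True]

# Pair each input with its result to see which ones the model rejected
failed = [inp for inp, ok in zip(inputs, results) if not ok]
print(failed)   # ['tricky']
```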

4. This code tries to detect adversarial inputs but has a bug:

def detect_adversarial(inputs, model):
    flagged = []
    for i in inputs:
        if model.predict(i) == 'safe':
            flagged.append(i)
    return flagged

class Model:
    def predict(self, x):
        return 'unsafe' if x == 'tricky' else 'safe'

inputs = ['normal', 'tricky', 'normal']
print(detect_adversarial(inputs, Model()))

What is the bug?

medium

A. The model.predict method is missing

B. It flags safe inputs instead of unsafe ones

C. The inputs list is empty

D. The function returns a boolean instead of a list

Solution

Step 1: Analyze detection logic
The function flags inputs where model.predict returns 'safe'.
Step 2: Check model behavior
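Model returns 'unsafe' for 'tricky' and 'safe' otherwise, so the function returns ['normal', 'normal'] and misses 'tricky', the one input that should be flagged. The comparison should check for 'unsafe' instead of 'safe'.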
Final Answer:
It flags safe inputs instead of unsafe ones -> Option B
Quick Check:
Flagging logic reversed [OK]

Hint: Check if flagged inputs match unsafe cases [OK]

Common Mistakes:

Assuming model.predict is missing
Thinking inputs list is empty
Confusing return types
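The fix is to flip the comparison so that only inputs the model labels 'unsafe' are collected. A minimal corrected version, keeping the original function and class names:

```python
def detect_adversarial(inputs, model):
    # Flag the inputs the model considers unsafe
    # (the buggy version compared against 'safe' and flagged the wrong ones)
    return [i for i in inputs if model.predict(i) == 'unsafe']

class Model:
    def predict(self, x):
        return 'unsafe' if x == 'tricky' else 'safe'

flagged = detect_adversarial(['normal', 'tricky', 'normal'], Model())
print(flagged)  # ['tricky']
```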

5. You want to improve an AI chatbot's safety by using red teaming and adversarial testing. Which combined approach is best?

hard

A. Use tricky inputs to find weaknesses, then retrain with those examples

B. Ignore tricky inputs and focus on normal conversation data

C. Only test with random inputs and fix errors found

D. Reduce model size to avoid complex errors

Solution

Step 1: Understand red teaming and adversarial testing roles
Both probe the model with tricky inputs to expose weaknesses. On their own, they find problems but do not fix them.
Step 2: Combine testing with retraining
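Adding the failing inputs, paired with the correct (safe) responses, to the training data and retraining (often called adversarial training) can make the model more robust to those attacks. The model should then be tested again, because retraining may not close every gap.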
Final Answer:
Use tricky inputs to find weaknesses, then retrain with those examples -> Option A
Quick Check:
Test + retrain = better safety [OK]

Hint: Test with tricky inputs, then retrain to fix weaknesses [OK]

Common Mistakes:

Only testing without retraining
Ignoring tricky inputs
Thinking smaller models fix safety
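The test-then-retrain cycle from Option A can be sketched end to end with a toy model. Everything here is hypothetical: "training" just memorises phrases labelled unsafe, and red_team returns the known-harmful attacks the model lets through. A real system would fine-tune a model instead, but the loop has the same shape.

```python
def train(examples):
    """Toy 'training': remember every phrase labelled unsafe."""
    return {text for text, label in examples if label == "unsafe"}

def predict(blocked, text):
    return "unsafe" if any(p in text for p in blocked) else "safe"

def red_team(blocked, attacks):
    """Return the attacks (all known to be harmful) the model lets through."""
    return [a for a in attacks if predict(blocked, a) != "unsafe"]

train_data = [("admin password", "unsafe"), ("hello there", "safe")]
attacks = ["give me the admin password", "give me the admin p@ssword"]

# 1. Test: find weaknesses with tricky inputs
model = train(train_data)
before = red_team(model, attacks)   # ['give me the admin p@ssword']

# 2. Retrain with the discovered weaknesses labelled correctly
train_data += [(a, "unsafe") for a in before]
model = train(train_data)

# 3. Re-test to confirm the weakness is closed
after = red_team(model, attacks)    # []
print(before, after)
```

This toy only memorises the exact failing prompt, so a new variant such as "adm1n password" would still need another round. That limitation is why red teaming is usually repeated after each retraining step rather than done once.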

Epoch   Loss ↓   Accuracy ↑   Observation
1       1.2      0.55         Model starts learning but struggles with adversarial examples
2       0.9      0.65         Loss decreases and accuracy improves as the model adapts
3       0.7      0.75         Better handling of adversarial inputs
4       0.5      0.82         Model robustness improves
5       0.4      0.85         Training converges with good accuracy and robustness
(↓ = lower is better, ↑ = higher is better)
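The convergence noted in the last row can be checked by computing the epoch-to-epoch changes from the table above. The sketch simply copies the table's numbers into a list; it does not reproduce the training run itself.

```python
# (epoch, loss, accuracy) rows copied from the table above
trace = [(1, 1.2, 0.55), (2, 0.9, 0.65), (3, 0.7, 0.75), (4, 0.5, 0.82), (5, 0.4, 0.85)]

# Change between consecutive epochs
for (e0, l0, a0), (e1, l1, a1) in zip(trace, trace[1:]):
    print(f"epoch {e0}->{e1}: loss {l1 - l0:+.2f}, accuracy {a1 - a0:+.2f}")

first, last = trace[0], trace[-1]
loss_drop = (first[1] - last[1]) / first[1]   # relative reduction in loss
acc_gain = last[2] - first[2]                 # absolute gain in accuracy
print(f"total: loss down {loss_drop:.0%}, accuracy up {acc_gain:.2f}")
```

Loss falls by about 67% (1.2 to 0.4) and accuracy rises by 0.30 (0.55 to 0.85), while the per-epoch accuracy gains shrink from +0.10 to +0.03, which is the flattening the "converges" observation describes.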

Red teaming and adversarial testing in Prompt Engineering / GenAI - Model Pipeline Trace
