Prompt Engineering / GenAI (~20 mins)

Automated evaluation metrics in Prompt Engineering / GenAI - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual (intermediate)
Understanding BLEU score purpose

What is the main purpose of the BLEU score when evaluating text generation models?

A. To measure the similarity between generated text and reference text by comparing overlapping n-grams
B. To calculate the accuracy of classification models by counting correct predictions
C. To evaluate the speed of model training in seconds
D. To measure the memory usage of a model during inference
💡 Hint

Think about how text generation quality is measured by comparing outputs.
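To make the hint concrete, here is a minimal sketch of the modified n-gram precision that BLEU is built on. It is not the full metric (real BLEU combines several n-gram orders and a brevity penalty), and the example sentences are invented for illustration:

```python
from collections import Counter

def ngram_precision(candidate, reference, n=1):
    """Fraction of candidate n-grams that also appear in the reference,
    with counts clipped to the reference count (BLEU's modified precision)."""
    cand = [tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1)]
    ref = [tuple(reference[i:i + n]) for i in range(len(reference) - n + 1)]
    cand_counts, ref_counts = Counter(cand), Counter(ref)
    # Clip each candidate n-gram's count by its count in the reference.
    overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    return overlap / max(len(cand), 1)

candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
print(ngram_precision(candidate, reference, n=1))  # 5 of 6 unigrams overlap
```

The key idea behind option comparison: the score is driven entirely by textual overlap with a reference, not by classification counts, runtime, or memory.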

Predict Output (intermediate)
Output of accuracy calculation code

What is the output of the following Python code that calculates accuracy?

true_labels = [1, 0, 1, 1, 0]
predictions = [1, 0, 0, 1, 0]
correct = sum(t == p for t, p in zip(true_labels, predictions))
accuracy = correct / len(true_labels)
print(round(accuracy, 2))
A. 0.80
B. 0.60
C. 0.75
D. 0.50
💡 Hint

Count how many predictions match the true labels, then divide by total.
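If you want to check your reasoning after answering, the same count-and-divide pattern can be packaged as a small helper. The example data below is deliberately different from the problem's, so it does not give the answer away:

```python
def accuracy(true_labels, predictions):
    """Share of positions where the prediction equals the true label."""
    assert len(true_labels) == len(predictions), "label lists must align"
    correct = sum(t == p for t, p in zip(true_labels, predictions))
    return correct / len(true_labels)

# Different labels than the problem above: 3 of 4 positions match.
print(accuracy([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.75
```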

Model Choice (advanced)
Best metric for imbalanced classification

You have a dataset where one class is very rare. Which metric is best to evaluate your model's performance?

A. Accuracy
B. BLEU score
C. Mean Squared Error
D. Precision and Recall
💡 Hint

Think about metrics that focus on positive class detection in imbalanced data.
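To see why the hint matters, here is a small sketch (with made-up labels) showing how a model that ignores the rare class can still look good on a naive metric while precision and recall expose the failure:

```python
def precision_recall(true_labels, predictions, positive=1):
    """Precision and recall for the given positive class."""
    pairs = list(zip(true_labels, predictions))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Imbalanced data: 1 positive among 10 samples. A model that always
# predicts 0 gets 90% of samples right, yet never finds the rare class.
true = [0] * 9 + [1]
always_zero = [0] * 10
print(precision_recall(true, always_zero))  # (0.0, 0.0)
```

The always-negative model is 90% "correct" overall but has zero recall on the class you actually care about, which is exactly the blind spot precision and recall reveal.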

Hyperparameter (advanced)
Effect of changing threshold on F1 score

In a binary classifier, what happens to the F1 score if you increase the decision threshold too high?

A. F1 score increases because precision and recall both improve
B. F1 score becomes undefined
C. F1 score decreases because recall drops while precision may increase
D. F1 score stays the same regardless of threshold
💡 Hint

Think about how raising threshold affects false negatives and false positives.
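The hint can also be verified numerically. This sketch (the scores and labels are invented) evaluates F1 at two thresholds over the same predicted scores:

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 score after binarizing predicted scores at a threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p and t for p, t in zip(preds, labels))
    fp = sum(p and not t for p, t in zip(preds, labels))
    fn = sum((not p) and t for p, t in zip(preds, labels))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

scores = [0.2, 0.4, 0.6, 0.8, 0.9]
labels = [0, 1, 1, 1, 1]
for th in (0.5, 0.85):
    print(th, round(f1_at_threshold(scores, labels, th), 2))
```

Raising the threshold from 0.5 to 0.85 turns true positives into false negatives: recall falls sharply, and even though precision stays high, the harmonic mean (F1) drops with it.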

🔧 Debug (expert)
Identify error in metric calculation code

What error does this code raise when calculating the mean squared error (MSE)?

import numpy as np
true = np.array([1, 2, 3])
pred = np.array([1, 2])
mse = np.mean((true - pred) ** 2)
print(mse)
A. TypeError because numpy arrays cannot be subtracted
B. ValueError due to shape mismatch in subtraction
C. ZeroDivisionError when calculating mean
D. No error, outputs 0.5
💡 Hint

Check if the arrays have the same length before subtracting.
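Following the hint, this sketch reproduces the failure and shows one defensive fix. The arrays match the problem; the `mse` helper is an illustration, not a standard API:

```python
import numpy as np

true = np.array([1, 2, 3])
pred = np.array([1, 2])

# Subtracting arrays of incompatible shapes ((3,) vs (2,)) raises
# "ValueError: operands could not be broadcast together".
try:
    print(np.mean((true - pred) ** 2))
except ValueError as exc:
    print("shape mismatch:", exc)

def mse(true, pred):
    """MSE with an explicit shape check before subtraction."""
    true, pred = np.asarray(true), np.asarray(pred)
    if true.shape != pred.shape:
        raise ValueError(f"shape mismatch: {true.shape} vs {pred.shape}")
    return float(np.mean((true - pred) ** 2))

print(mse([1, 2, 3], [1, 2, 4]))  # (0**2 + 0**2 + 1**2) / 3
```

NumPy only subtracts arrays whose shapes are broadcast-compatible; `(3,)` against `(2,)` is not, so validating shapes up front turns a cryptic broadcast error into a clear message.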