Challenge - 5 Problems

🧠 Conceptual

intermediate

Understanding BLEU score purpose

What is the main purpose of the BLEU score in evaluating machine learning models?

A. To measure the similarity between generated text and reference text by comparing overlapping n-grams

B. To calculate the accuracy of classification models by counting correct predictions

C. To evaluate the speed of model training in seconds

D. To measure the memory usage of a model during inference

❓ Predict Output

intermediate

Output of accuracy calculation code

What is the output of the following Python code that calculates accuracy?

true_labels = [1, 0, 1, 1, 0]
predictions = [1, 0, 0, 1, 0]
correct = sum(t == p for t, p in zip(true_labels, predictions))
accuracy = correct / len(true_labels)
print(round(accuracy, 2))

A. 0.80

B. 0.60

C. 0.75

D. 0.50

❓ Model Choice

advanced

Best metric for imbalanced classification

You have a dataset where one class is very rare. Which option is the most appropriate way to evaluate your model's performance?

A. Accuracy

B. BLEU score

C. Mean Squared Error

D. Precision and Recall

❓ Hyperparameter

advanced

Effect of changing threshold on F1 score

In a binary classifier, what happens to the F1 score if the decision threshold is raised too high?

A. F1 score increases because precision and recall both improve

B. F1 score becomes undefined

C. F1 score decreases because recall drops while precision may increase

D. F1 score stays the same regardless of threshold

🔧 Debug

expert

Identify error in metric calculation code

What error does this code raise when calculating the mean squared error (MSE)?

import numpy as np
true = np.array([1, 2, 3])
pred = np.array([1, 2])
mse = np.mean((true - pred) ** 2)
print(mse)

A. TypeError because numpy arrays cannot be subtracted

B. ValueError due to shape mismatch in subtraction

C. ZeroDivisionError when calculating mean

D. No error, outputs 0.5

Practice

(1/5)

1. Which automated evaluation metric is commonly used to measure the accuracy of classification models?

easy

A. Perplexity

B. Mean Squared Error

C. BLEU Score

D. Accuracy

Automated evaluation metrics in Prompt Engineering / GenAI - Practice Problems & Coding Challenges

Solution

Step 1: Understand classification metrics

Step 2: Match metric to task

Final Answer:

Quick Check:
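
As a rough illustration of the two steps above, accuracy for a classifier can be sketched as the fraction of predictions that match the true labels. The function name `accuracy` and the sample labels are assumptions made up for this example; they are not taken from the original problem.

```python
def accuracy(true_labels, predictions):
    """Return the fraction of predictions that equal the true labels."""
    if len(true_labels) != len(predictions):
        raise ValueError("true_labels and predictions must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy on empty input")
    # Booleans count as 0/1, so the sum is the number of correct predictions.
    correct = sum(t == p for t, p in zip(true_labels, predictions))
    return correct / len(true_labels)


# Hypothetical labels, chosen only for illustration.
example_true = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
example_pred = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1]
example_accuracy = accuracy(example_true, example_pred)
print(example_accuracy)  # 9 of 10 predictions match -> 0.9
```

Accuracy fits classification because each prediction is simply right or wrong; it is less informative when one class is very rare.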

Solution

Step 1: Recall scikit-learn function name

Step 2: Check function call syntax

Final Answer:

Quick Check:

Solution

Step 1: Calculate precision and recall

Step 2: Compute F1 score

Step 3: Verify scikit-learn default behavior

Step 4: Check rounding

Final Answer:

Quick Check:
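
The problem this solution belongs to is not shown here, so the following is only a generic sketch of the first two steps (precision and recall, then F1), written in plain Python rather than scikit-learn. The function name `precision_recall_f1`, the `positive` parameter, the sample data, and the choice to return 0.0 when a denominator is zero are all assumptions made for illustration.

```python
def precision_recall_f1(true_labels, predictions, positive=1):
    """Compute precision, recall and F1 for one positive class."""
    pairs = list(zip(true_labels, predictions))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives

    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical data: 3 true positives, 1 false positive, 3 false negatives.
sample_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
sample_pred = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(sample_true, sample_pred)
print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.75 0.5 0.6
```

F1 is the harmonic mean of precision and recall, so it sits closer to the lower of the two values.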

Solution

Step 1: Check imports and variables

Step 2: Understand precision_score behavior

Step 3: Analyze given data

Step 4: Consider label types

Final Answer:

Quick Check:

Solution

Step 1: Identify task type

Step 2: Match metric to task

Step 3: Exclude unrelated metrics

Final Answer:

Quick Check: