What is the main purpose of the BLEU score in evaluating text generation models?
Think about how text generation quality is measured by comparing outputs.
The BLEU score measures the overlap of short word sequences (n-grams) between the generated text and one or more reference texts to estimate generation quality.
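The n-gram overlap at the heart of BLEU can be sketched as below. This is a simplified illustration of BLEU's clipped (modified) n-gram precision only; real BLEU also combines precisions for n = 1..4 with a geometric mean and applies a brevity penalty. The function name `ngram_overlap` and the example sentences are made up for illustration.

```python
from collections import Counter

def ngram_overlap(candidate, reference, n):
    """Fraction of candidate n-grams that also appear in the reference,
    with counts clipped to the reference count (BLEU-style modified precision).
    Illustrative sketch only, not the full BLEU metric."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    overlap = sum(min(count, ref[ng]) for ng, count in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
print(ngram_overlap(candidate, reference, 1))  # 5 of 6 unigrams match -> 0.833...
print(ngram_overlap(candidate, reference, 2))  # 3 of 5 bigrams match  -> 0.6
```

Note how the clipping (the `min`) prevents a candidate from inflating its score by repeating a word that appears only once in the reference.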
What is the output of the following Python code that calculates accuracy?
true_labels = [1, 0, 1, 1, 0]
predictions = [1, 0, 0, 1, 0]
correct = sum(t == p for t, p in zip(true_labels, predictions))
accuracy = correct / len(true_labels)
print(round(accuracy, 2))
Count how many predictions match the true labels, then divide by total.
There are 4 matches out of 5 (only the third prediction is wrong), so accuracy is 4/5 = 0.8, and the code prints 0.8.
You have a dataset where one class is very rare. Which metric is best to evaluate your model's performance?
Think about metrics that focus on positive class detection in imbalanced data.
Precision and recall help measure how well the model detects rare classes, unlike accuracy which can be misleading.
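The point above can be made concrete with a small sketch. The labels below are hypothetical: 95 negatives and 5 rare positives, scored against a degenerate model that always predicts the majority class.

```python
# Hypothetical imbalanced dataset: class 1 is the rare positive class.
true_labels = [0] * 95 + [1] * 5
predictions = [0] * 100  # a model that never predicts the rare class

tp = sum(t == 1 and p == 1 for t, p in zip(true_labels, predictions))
fp = sum(t == 0 and p == 1 for t, p in zip(true_labels, predictions))
fn = sum(t == 1 and p == 0 for t, p in zip(true_labels, predictions))

accuracy = sum(t == p for t, p in zip(true_labels, predictions)) / len(true_labels)
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(accuracy)  # 0.95 -- looks strong despite the model being useless
print(recall)    # 0.0  -- exposes that no rare case is ever detected
```

Accuracy rewards the model for the 95 easy negatives, while recall immediately reveals that every rare positive was missed.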
In a binary classifier, what happens to the F1 score if you increase the decision threshold too high?
Think about how raising threshold affects false negatives and false positives.
Increasing the threshold usually reduces recall (more false negatives) and may increase precision; the collapse in recall typically dominates, lowering the F1 score overall.
What error does this code raise when calculating the mean squared error (MSE)?
import numpy as np
true = np.array([1, 2, 3])
pred = np.array([1, 2])
mse = np.mean((true - pred) ** 2)
print(mse)
Check if the arrays have the same length before subtracting.
Subtracting NumPy arrays with incompatible shapes, here (3,) and (2,), raises a ValueError because the operands cannot be broadcast together.
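One defensive fix, following the hint, is to validate shapes before subtracting. The helper name `safe_mse` is hypothetical; this is just a sketch of the guard.

```python
import numpy as np

def safe_mse(true, pred):
    """Compute MSE after checking that both inputs have the same shape."""
    true, pred = np.asarray(true), np.asarray(pred)
    if true.shape != pred.shape:
        raise ValueError(f"shape mismatch: {true.shape} vs {pred.shape}")
    return np.mean((true - pred) ** 2)

print(safe_mse([1, 2, 3], [1, 2, 4]))  # (0 + 0 + 1) / 3 -> 0.333...
```

The explicit check turns a cryptic broadcasting error into a clear message, and the same ValueError is still raised for mismatched inputs like the ones in the question.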