Complete the code to print the accuracy of the LLM predictions.
accuracy = sum(predictions == labels) / len(labels)
print('Accuracy:', [1])
The accuracy variable holds the ratio of correct predictions to total labels, so printing accuracy shows the model's quality.
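With the blank filled in as accuracy, the completed snippet might look like this; the labels and predictions arrays here are hypothetical toy data, assuming both are NumPy arrays so that == compares element-wise:

```python
import numpy as np

# Hypothetical toy data: true labels and model predictions.
labels = np.array(["pos", "neg", "pos", "neg"])
predictions = np.array(["pos", "neg", "neg", "neg"])

# Element-wise comparison yields a boolean array; sum() counts the True entries.
accuracy = sum(predictions == labels) / len(labels)
print('Accuracy:', accuracy)  # 3 of 4 correct -> 0.75
```

Note that if labels and predictions were plain Python lists, == would compare the whole lists at once and return a single bool, so the array types matter here.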
Complete the code to calculate the F1 score for LLM evaluation.
from sklearn.metrics import f1_score
f1 = f1_score(labels, [1])
print('F1 Score:', f1)
The f1_score function compares true labels with predicted labels to compute the F1 score, so predictions is the correct input.
Fix the error in the code to compute the confusion matrix for LLM outputs.
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(labels, [1])
print(cm)
The confusion matrix compares true labels with predicted labels, so predictions must be passed as the second argument.
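The corrected call, with predictions as the second argument, can be sketched on hypothetical binary data; in scikit-learn's output, rows correspond to true labels and columns to predicted labels:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical binary evaluation results.
labels = [1, 0, 1, 1, 0]
predictions = [1, 0, 0, 1, 1]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(labels, predictions)
print(cm)
# [[1 1]
#  [1 2]]
```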
Fill both blanks to create a dictionary of word counts from LLM output tokens.
word_counts = {word: [1] for word in tokens if word [2] stop_words}
We count how many times each word appears using tokens.count(word). We only include words that are not in the stop words list to focus on meaningful words.
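With the blanks filled in as tokens.count(word) and not in, the comprehension runs as below; the tokens and stop_words values are invented for the example:

```python
# Hypothetical tokenized LLM output and a stop-word set.
tokens = "the model answered the question the model".split()
stop_words = {"the"}

# Map each non-stop word to its frequency in the token list.
word_counts = {word: tokens.count(word) for word in tokens if word not in stop_words}
print(word_counts)  # {'model': 2, 'answered': 1, 'question': 1}
```

Since dictionary keys are unique, repeated words simply overwrite their own entry with the same count, so duplicates in tokens are harmless here (though tokens.count inside the loop is O(n^2) on long token lists).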
Fill all three blanks to filter LLM predictions with confidence above threshold and map to labels.
filtered = [1]: label for label, score in zip(labels, scores) if score [2] threshold and label [3] valid_labels}
We open the comprehension with {score as the key, mapping each score to its label. We keep only scores greater than the threshold using >, and we include only labels that are in the valid labels list using in.
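Putting the three answers together ({score, >, and in) gives the following sketch; the labels, scores, threshold, and valid_labels values are hypothetical:

```python
# Hypothetical predicted labels with confidence scores.
labels = ["cat", "dog", "fish", "bird"]
scores = [0.9, 0.4, 0.8, 0.95]
threshold = 0.5
valid_labels = {"cat", "dog", "bird"}

# Keep only confident predictions with valid labels, keyed by score.
filtered = {score: label
            for label, score in zip(labels, scores)
            if score > threshold and label in valid_labels}
print(filtered)  # {0.9: 'cat', 0.95: 'bird'}
```

One caveat of keying by score: if two predictions share the same score, the later one silently overwrites the earlier entry, so this mapping only works cleanly when scores are distinct.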