Prompt Engineering / GenAI · ~10 mins

Why LLM evaluation ensures quality in Prompt Engineering / GenAI - Test Your Understanding

Practice - 5 Tasks
Answer the questions below
Task 1 - Fill in the blank (easy)

Complete the code to print the accuracy of the LLM predictions.

accuracy = sum(predictions == labels) / len(labels)
print('Accuracy:', [1])
A) accuracy
B) predictions
C) labels
D) len(predictions)
Common Mistakes
Printing predictions or labels instead of accuracy.
Using length of predictions instead of accuracy.
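For reference after attempting the task, here is a runnable sketch of the completed snippet. The labels and predictions arrays are hypothetical sample data chosen for illustration:

```python
import numpy as np

# Hypothetical ground-truth labels and LLM predictions (1 = correct class).
labels = np.array([1, 0, 1, 1, 0])
predictions = np.array([1, 0, 0, 1, 0])

# Element-wise comparison yields a boolean array; summing it counts matches.
accuracy = sum(predictions == labels) / len(labels)
print('Accuracy:', accuracy)  # 4 of 5 match -> 0.8
```

Note that this pattern relies on NumPy arrays: with plain Python lists, `predictions == labels` returns a single boolean rather than an element-wise comparison.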
Task 2 - Fill in the blank (medium)

Complete the code to calculate the F1 score for LLM evaluation.

from sklearn.metrics import f1_score
f1 = f1_score(labels, [1])
print('F1 Score:', f1)
A) labels
B) scores
C) accuracy
D) predictions
Common Mistakes
Passing labels twice instead of predictions.
Passing accuracy or scores, which are not label arrays.
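For reference after attempting the task, a runnable sketch of the completed snippet with made-up labels and predictions. Note scikit-learn's argument order: the true labels come first, the predictions second:

```python
from sklearn.metrics import f1_score

# Hypothetical binary ground truth and LLM predictions.
labels = [1, 0, 1, 1, 0, 1]
predictions = [1, 0, 0, 1, 0, 1]

# f1_score(y_true, y_pred): harmonic mean of precision and recall.
f1 = f1_score(labels, predictions)
print('F1 Score:', f1)  # precision 1.0, recall 0.75 -> 6/7 ~ 0.857
```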
Task 3 - Fill in the blank (hard)

Fix the error in the code to compute the confusion matrix for LLM outputs.

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(labels, [1])
print(cm)
A) labels
B) predictions
C) accuracy
D) scores
Common Mistakes
Passing accuracy or labels instead of predictions.
Confusing scores with predicted labels.
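For reference after attempting the task, a runnable sketch of the completed snippet with hypothetical data. As with f1_score, the true labels go first; rows of the resulting matrix are true classes, columns are predicted classes:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical binary ground truth and LLM predictions.
labels = [1, 0, 1, 1, 0]
predictions = [1, 0, 0, 1, 0]

# Rows = true classes, columns = predicted classes, sorted by label value.
cm = confusion_matrix(labels, predictions)
print(cm)
# [[2 0]
#  [1 2]]
```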
Task 4 - Fill in the blank (hard)

Fill both blanks to create a dictionary of word counts from LLM output tokens.

word_counts = {word: [1] for word in tokens if word [2] stop_words}
A) tokens.count(word)
B) in
C) not in
D) len(word)
Common Mistakes
Using 'in' instead of 'not in' to filter stop words.
Using len(word) instead of count for frequency.
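For reference after attempting the task, a runnable sketch of the completed comprehension with a hypothetical token list and stop-word set:

```python
# Hypothetical tokenized LLM output and stop words to exclude.
tokens = ['the', 'model', 'answered', 'the', 'prompt', 'with', 'the', 'model']
stop_words = {'the', 'with'}

# Count each non-stop-word token; duplicate words simply re-set the same key.
word_counts = {word: tokens.count(word) for word in tokens if word not in stop_words}
print(word_counts)  # {'model': 2, 'answered': 1, 'prompt': 1}
```

Since `tokens.count(word)` rescans the whole list for every token, `collections.Counter` is the more efficient choice on large outputs, but the comprehension form above matches the task.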
Task 5 - Fill in the blank (hard)

Fill all three blanks to filter LLM predictions with confidence above threshold and map to labels.

filtered = {[1]: label for label, score in zip(labels, scores) if score [2] threshold and label [3] valid_labels}
A) label
B) >
C) in
D) score
Common Mistakes
Using label as key instead of score.
Using '<' instead of '>' for filtering.
Checking label not in valid_labels instead of in.
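For reference after attempting the task, a runnable sketch of the completed comprehension with hypothetical labels, confidence scores, and a valid-label set:

```python
# Hypothetical predicted labels with confidence scores.
labels = ['spam', 'ham', 'spam', 'unknown']
scores = [0.91, 0.42, 0.88, 0.95]
threshold = 0.5
valid_labels = {'spam', 'ham'}

# Keep only confident predictions with a recognized label; score keys map to labels.
filtered = {score: label for label, score in zip(labels, scores)
            if score > threshold and label in valid_labels}
print(filtered)  # {0.91: 'spam', 0.88: 'spam'}
```

One caveat worth noting: using scores as dictionary keys means two predictions with identical scores would collide, so in practice an index or ID key is often safer.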