Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)
Complete the code to define a simple human evaluation metric function that returns the average score.
Prompt Engineering / GenAI
def average_score(scores):
    return sum(scores) / [1]
Common Mistakes
Dividing by sum(scores) instead of the count.
Using max or min instead of the count.
To find the average score, divide the sum of scores by the number of scores, which is given by len(scores).
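A minimal sketch of the completed function, with the blank filled by len(scores). The empty-list check and the sample ratings are assumptions added for illustration; they are not part of the exercise.

```python
def average_score(scores):
    """Return the mean of a list of numeric ratings."""
    # Assumption: reject an empty list instead of letting the division fail.
    if not scores:
        raise ValueError("scores must not be empty")
    # Blank [1] is len(scores): the number of ratings, not their sum.
    return sum(scores) / len(scores)


# Hypothetical ratings on a 1-5 scale.
ratings = [4, 5, 3, 4]
print(average_score(ratings))  # 4.0
```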
2. Fill in the blank (medium)
Complete the code to compute the denominator of Cohen's kappa, a measure of inter-rater agreement.
def cohen_kappa_denominator(p0, pe):
    return [1] - pe
Common Mistakes
Using p0 instead of 1 in the denominator.
Computing p0 - pe, which is the numerator, instead of 1 - pe.
The denominator in Cohen's kappa is 1 minus the expected agreement pe.
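To show where the denominator fits, here is a sketch that fills the blank with 1 and then uses it in the full kappa formula, (p0 - pe) / (1 - pe). The cohen_kappa helper, its zero check, and the sample values are assumptions added for illustration.

```python
def cohen_kappa_denominator(p0, pe):
    """Return 1 - pe. p0 is unused but kept to match the exercise signature."""
    return 1 - pe


def cohen_kappa(p0, pe):
    """Hypothetical helper: kappa from observed (p0) and expected (pe) agreement."""
    denominator = cohen_kappa_denominator(p0, pe)
    # Assumption: kappa is undefined when the expected agreement is exactly 1.
    if denominator == 0:
        raise ValueError("kappa is undefined when pe == 1")
    return (p0 - pe) / denominator


# Hypothetical values: raters agree on 75% of items, 50% expected by chance.
print(cohen_kappa(0.75, 0.5))  # 0.5
```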
3. Fill in the blank (hard)
Complete the code to compute the average human evaluation score from a dictionary of scores.
def average_human_score(scores_dict):
    total = sum(scores_dict.values())
    count = len([1])
    return total / count
Common Mistakes
Using len(scores_dict.values()), which gives the same count but is redundant here.
Using len(scores_dict.items()), which also gives the same count but is less direct.
len(scores_dict) gives the number of entries, which is the count of scores.
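The completed function might look like the sketch below, with the blank filled by scores_dict. The rater names and scores are hypothetical sample data.

```python
def average_human_score(scores_dict):
    """Return the mean of the values in a {rater_id: score} dictionary."""
    total = sum(scores_dict.values())
    # Blank [1] is scores_dict: len() of a dict counts its entries.
    count = len(scores_dict)
    return total / count


# Hypothetical scores keyed by rater ID.
scores_dict = {"rater_a": 4, "rater_b": 2, "rater_c": 3}
print(average_human_score(scores_dict))  # 3.0
```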
4. Fill in the blank (hard)
Fill both blanks to create a dictionary comprehension that filters human evaluation scores above 3.
filtered_scores = [1]: score for [2], score in scores.items() if score > 3}
Common Mistakes
Using the score as the key instead of the rater ID.
Mixing variable names inconsistently.
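The key expression in [1] and the loop variable in [2] must use the same name for the rater ID (for example, rater); [1] also supplies the opening brace of the comprehension.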
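One way the completed comprehension could read is sketched below. The variable name rater and the sample scores are assumptions; any name works as long as the key expression and the loop variable match.

```python
# Hypothetical scores keyed by rater ID.
scores = {"rater_a": 4, "rater_b": 2, "rater_c": 5, "rater_d": 3}

# Keep only entries whose score is strictly greater than 3.
# The key expression and the loop variable are both `rater`.
filtered_scores = {rater: score for rater, score in scores.items() if score > 3}

print(filtered_scores)  # {'rater_a': 4, 'rater_c': 5}
```

Because the comparison is strict, a score of exactly 3 is dropped.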
5. Fill in the blank (hard)
Fill all three blanks to compute the weighted average human evaluation score.
def weighted_average(scores, weights):
    total_weighted = sum(scores[i] * [1] for i in range(len(scores)))
    total_weight = sum([2])
    return total_weighted / [3]
Common Mistakes
Using scores[i] instead of weights[i] in multiplication.
Dividing by weights instead of total_weight.
Multiply each score by its weight (weights[i]), sum the weights (sum(weights)), then divide the total weighted sum by total_weight.
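A sketch of the completed function follows, with the blanks filled by weights[i], weights, and total_weight. The two validation checks and the sample data are assumptions added for illustration; the exercise itself does not include them.

```python
def weighted_average(scores, weights):
    """Return the weighted mean of scores, using one weight per score."""
    # Assumption: fail early if the two lists do not line up.
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    # Blank [1]: weights[i], so each score is scaled by its own weight.
    total_weighted = sum(scores[i] * weights[i] for i in range(len(scores)))
    # Blank [2]: weights, summed to get the normalizing constant.
    total_weight = sum(weights)
    # Assumption: a zero total weight has no meaningful average.
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    # Blank [3]: total_weight, not the weights list itself.
    return total_weighted / total_weight


# Hypothetical data: the third rater counts twice as much as the others.
print(weighted_average([4, 2, 5], [1, 1, 2]))  # 4.0
```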