Human evaluation frameworks in Prompt Engineering / GenAI - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank
easy

Complete the code to define a simple human evaluation metric function that returns the average score.

def average_score(scores):
    return sum(scores) / [1]
Options:
A. len(scores)
B. sum(scores)
C. max(scores)
D. min(scores)
Common Mistakes
Dividing by sum(scores) instead of the count.
Using max or min instead of the count.
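For reference, a minimal sketch of the completed function is shown here. The empty-list check and the sample ratings are assumptions added for illustration; they are not part of the exercise.

```python
def average_score(scores):
    """Return the arithmetic mean of a list of human ratings."""
    # Assumption: an empty list is treated as an error rather than returning 0.
    if not scores:
        raise ValueError("scores must not be empty")
    # Divide the total by the number of ratings, not by their sum, max, or min.
    return sum(scores) / len(scores)


# Hypothetical ratings on a 1-5 scale from four raters.
ratings = [4, 5, 3, 4]
print(average_score(ratings))  # 4.0
```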
2. Fill in the blank
medium

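Complete the code to compute the denominator of Cohen's kappa, an inter-rater agreement measure defined as kappa = (p0 - pe) / (1 - pe), where p0 is the observed agreement and pe is the agreement expected by chance.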

def cohen_kappa_denominator(p0, pe):
    return [1] - pe
Options:
A. 1
B. pe
C. p0
D. 0
Common Mistakes
Using p0 instead of 1 in the denominator (p0 - pe is the numerator).
Reversing the subtraction: the denominator is 1 - pe, not pe - 1.
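To show where this denominator fits, here is a rough sketch of the full kappa computation for two raters. The function name `cohen_kappa`, the label lists, and the error handling are illustrative assumptions, not part of the exercise.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(labels_a)

    # p0: observed agreement, the fraction of items given identical labels.
    p0 = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # pe: chance agreement, from each rater's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    pe = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    # Assumption: treat pe == 1 (kappa undefined) as an error.
    if pe == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p0 - pe) / (1 - pe)


# Hypothetical judgments from two raters on four model outputs.
rater_a = ["good", "good", "bad", "bad"]
rater_b = ["good", "bad", "bad", "bad"]
print(cohen_kappa(rater_a, rater_b))  # p0 = 0.75, pe = 0.5, so kappa = 0.5
```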
3. Fill in the blank
hard

Complete the code so that it computes the average human evaluation score from a dictionary of scores.

def average_human_score(scores_dict):
    total = sum(scores_dict.values())
    count = len([1])
    return total / count
Options:
A. scores_dict.keys()
B. scores_dict
C. scores_dict.items()
D. scores_dict.values()
Common Mistakes
Using len(scores_dict.values()), which returns the same count but is redundant here.
Using len(scores_dict.items()), which also returns the same count but is less direct than len(scores_dict).
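A short sketch of the completed function follows, together with a check that the dictionary and its views all report the same length. The rater IDs and the empty-dictionary check are assumptions made for illustration.

```python
def average_human_score(scores_dict):
    """Average the values of a dictionary of scores."""
    # Assumption: an empty dictionary is treated as an error.
    if not scores_dict:
        raise ValueError("scores_dict must not be empty")
    total = sum(scores_dict.values())
    count = len(scores_dict)  # len() of a dict is its number of keys
    return total / count


# Hypothetical rater IDs mapped to 1-5 ratings.
scores = {"rater_1": 4, "rater_2": 2, "rater_3": 3}

# The dict and its key, value, and item views all have the same length.
same_length = (
    len(scores) == len(scores.keys()) == len(scores.values()) == len(scores.items())
)
print(same_length)                  # True
print(average_human_score(scores))  # 3.0
```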
4. Fill in the blank
hard

Fill both blanks to create a dictionary comprehension that filters human evaluation scores above 3.

filtered_scores = {[1]: score for [2], score in scores.items() if score > 3}
Options:
A. rater
B. score
C. rater_id
D. score_value
Common Mistakes
Using the score as the key instead of the rater ID.
Mixing variable names inconsistently.
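As an illustration, the comprehension might look like this once both blanks use the same key name. The sample `scores` dictionary is made up; note that the strict `> 3` comparison drops a score of exactly 3.

```python
# Hypothetical rater IDs mapped to 1-5 ratings.
scores = {"rater_1": 4, "rater_2": 2, "rater_3": 5, "rater_4": 3}

# The key expression and the loop variable must be the same name ("rater" here).
filtered_scores = {rater: score for rater, score in scores.items() if score > 3}

print(filtered_scores)  # {'rater_1': 4, 'rater_3': 5}
```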
5. Fill in the blank
hard

Fill all three blanks to compute the weighted average human evaluation score.

def weighted_average(scores, weights):
    total_weighted = sum(scores[i] * [1] for i in range(len(scores)))
    total_weight = sum([2])
    return total_weighted / [3]
Options:
A. weights[i]
B. weights
C. total_weight
D. scores[i]
Common Mistakes
Using scores[i] instead of weights[i] in multiplication.
Dividing by weights instead of total_weight.
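One possible way to write the same computation is sketched below, using `zip` instead of index access. The length and zero-weight checks, and the sample numbers, are assumptions added for illustration.

```python
def weighted_average(scores, weights):
    """Weighted mean: sum(score * weight) divided by sum(weight)."""
    # Assumption: mismatched lengths and a zero total weight are errors.
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    # Pair each score with its own weight; zip avoids manual indexing.
    total_weighted = sum(s * w for s, w in zip(scores, weights))
    return total_weighted / total_weight


# Hypothetical case: the first rater's score counts twice as much.
print(weighted_average([4, 2, 5], [2, 1, 1]))  # (8 + 2 + 5) / 4 = 3.75
```

With equal weights the result reduces to the plain average from task 1.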