Prompt Engineering / GenAI · ~20 mins

Image understanding and description in Prompt Engineering / GenAI - Practice Problems & Coding Challenges

Challenge - 5 Problems
🎖️
Image Captioning Mastery
Get all challenges correct to earn this badge!
Test your skills under time pressure!
🧠 Conceptual
intermediate
1:30 remaining
What is the primary role of an image captioning model?

Imagine you have a smart assistant that looks at pictures and tells you what it sees in simple sentences. What is the main job of this assistant?

A. To generate a short text description that explains the content of the image.
B. To classify the image into one of many categories without describing it.
C. To enhance the image quality by removing noise and improving colors.
D. To detect faces in the image and blur them for privacy.
Attempts: 2 left
💡 Hint

Think about what it means to 'describe' an image in words.

Predict Output
intermediate
1:30 remaining
What is the output of this image captioning code snippet?

Given the following simplified code that uses a pre-trained image captioning model, what will be printed?

image = load_image('dog_park.jpg')
caption = model.generate_caption(image)
print(caption)
A"A group of dogs playing in a park."
BSyntaxError: missing parentheses in call to 'print'
C"dog_park.jpg"
DNone
Attempts: 2 left
💡 Hint

The model generates a text description of the image, not just the filename.
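The flow in the snippet can be sketched with a minimal mock. Here `load_image` and `MockCaptionModel` are hypothetical stand-ins invented for illustration; a real system would run a pretrained vision-language model, and the caption string below is an assumed example output, not what any particular model would produce.

```python
class MockCaptionModel:
    """Stand-in for a pretrained image captioning model."""

    def generate_caption(self, image):
        # A real model would run vision + language inference here;
        # the key point is that it returns a description of the
        # image content, not the filename.
        return "A group of dogs playing in a park."


def load_image(path):
    # Stand-in loader: a real one would decode pixel data from disk.
    return {"path": path}


model = MockCaptionModel()
image = load_image("dog_park.jpg")
caption = model.generate_caption(image)
print(caption)  # prints the description, not "dog_park.jpg"
```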

Model Choice
advanced
2:00 remaining
Which model architecture is best suited for image captioning tasks?

You want to build a system that looks at images and writes sentences describing them. Which model type is most appropriate?

A. A simple feedforward neural network with no sequence handling.
B. A convolutional neural network (CNN) combined with a recurrent neural network (RNN).
C. A support vector machine (SVM) classifier.
D. A k-means clustering algorithm.
Attempts: 2 left
💡 Hint

Think about how images and sentences are processed differently and how to combine them.
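The classic CNN-encoder + RNN-decoder combination can be sketched at the shape level. This is a toy illustration, not a trainable model: the "CNN" is just a pooling stand-in, the RNN step is a vanilla recurrence, and all dimensions are made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def cnn_encode(image):
    """Stand-in for a CNN encoder: collapse an image to a feature vector."""
    # A real CNN would apply learned convolutions; here we mean-pool
    # over the spatial dimensions to get one vector per image.
    return image.mean(axis=(0, 1))  # shape: (channels,)


def rnn_decode_step(hidden, word_emb, W_h, W_x):
    """One vanilla RNN step: next hidden state from state + word input."""
    return np.tanh(hidden @ W_h + word_emb @ W_x)


image = rng.random((32, 32, 64))   # toy "image" with 64 channels
features = cnn_encode(image)       # CNN output, shape (64,)

hidden = features.copy()           # decoder state initialized from the image
W_h = rng.random((64, 64)) * 0.1   # recurrent weights (untrained)
W_x = rng.random((16, 64)) * 0.1   # input weights for 16-dim word embeddings

word_emb = rng.random(16)          # embedding of the previous word
for _ in range(3):                 # unroll a few decoding steps
    hidden = rnn_decode_step(hidden, word_emb, W_h, W_x)

print(hidden.shape)  # (64,) -- one state per step, mapped to a word in practice
```

The design point the question is after: the CNN handles the fixed-size spatial input, while the RNN produces a variable-length word sequence conditioned on the image features.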

Metrics
advanced
1:30 remaining
Which metric is commonly used to evaluate image captioning quality?

After training an image captioning model, you want to measure how good its descriptions are compared to human-written captions. Which metric should you use?

A. A confusion matrix of detected objects.
B. Mean Squared Error (MSE) between pixel values of images.
C. BLEU score, which compares the overlap of words and phrases between generated and reference captions.
D. Accuracy of classifying images into categories.
Attempts: 2 left
💡 Hint

Think about metrics that compare text similarity.
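The core idea behind BLEU can be shown with a minimal unigram-precision sketch. This simplified version assumes a single reference caption and only counts clipped unigram overlap; full BLEU also combines higher-order n-grams and applies a brevity penalty.

```python
from collections import Counter

def bleu1(candidate, reference):
    """Clipped unigram precision: BLEU-1 against a single reference."""
    cand = candidate.lower().split()
    ref_counts = Counter(reference.lower().split())
    # Each candidate word can match at most as many times as it
    # appears in the reference (the "clipping" in BLEU).
    clipped = sum(min(count, ref_counts[word])
                  for word, count in Counter(cand).items())
    return clipped / len(cand)

generated = "a dog plays in the park"
human = "a dog is playing in the park"
print(round(bleu1(generated, human), 2))  # 0.83 -- 5 of 6 words overlap
```

Note how clipping prevents gaming the metric: repeating a matching word does not raise the score beyond its count in the reference.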

🔧 Debug
expert
2:30 remaining
Why does this image captioning model produce repetitive captions?

Consider this simplified code snippet where the model generates captions but repeats the same word multiple times:

caption = model.generate_caption(image)
print(caption)
# Output: "dog dog dog dog dog"

What is the most likely cause?

A. The print statement is inside a loop printing the same word multiple times.
B. The input image is corrupted and cannot be processed.
C. The model was trained on only one image, so it memorizes that caption.
D. The model's beam search decoding is not implemented correctly, causing it to select the same word repeatedly.
Attempts: 2 left
💡 Hint

Think about how the model chooses words during caption generation.
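The failure mode can be illustrated with a toy decoder. The probabilities below are invented for illustration: a broken decoder that ignores context keeps picking the same top token every step, while even a crude repetition block breaks the loop (real fixes tune the beam search or penalize repeated n-grams).

```python
def next_token_probs(prev):
    # Broken stand-in model: ignores the previous token, so
    # "dog" is always the highest-probability choice.
    return {"dog": 0.6, "park": 0.3, "<end>": 0.1}

def greedy_decode(steps=5):
    """Degenerate decoding: always take the argmax token."""
    out = []
    for _ in range(steps):
        probs = next_token_probs(out[-1] if out else None)
        out.append(max(probs, key=probs.get))
    return " ".join(out)

def decode_no_repeat(steps=5):
    """Crude mitigation: forbid emitting the previous token again."""
    out = []
    for _ in range(steps):
        probs = dict(next_token_probs(out[-1] if out else None))
        if out:
            probs.pop(out[-1], None)  # block an immediate repeat
        out.append(max(probs, key=probs.get))
    return " ".join(out)

print(greedy_decode())     # "dog dog dog dog dog"
print(decode_no_repeat())  # "dog park dog park dog"
```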