
Image understanding and description in Prompt Engineering / GenAI - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
What is the primary role of an image captioning model?

Imagine you have a smart assistant that looks at pictures and tells you what it sees in simple sentences. What is the main job of this assistant?

A. To generate a short text description that explains the content of the image.
B. To classify the image into one of many categories without describing it.
C. To enhance the image quality by removing noise and improving colors.
D. To detect faces in the image and blur them for privacy.
💡 Hint

Think about what it means to 'describe' an image in words.

Predict Output
intermediate
What is the output of this image captioning code snippet?

Given the following simplified code that uses a pre-trained image captioning model, what will be printed?

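# Assumes `load_image` and a pre-trained captioning `model` are already defined.
image = load_image('dog_park.jpg')       # load the photo to describe
caption = model.generate_caption(image)  # ask the model for a caption
print(caption)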
A. "A group of dogs playing in a park."
B. SyntaxError: missing parentheses in call to 'print'
C. "dog_park.jpg"
D. None
💡 Hint

The model generates a text description of the image, not just the filename.
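The snippet relies on `load_image` and `model` without defining them. As a minimal sketch, the stand-ins below are hypothetical (they are not any real library's API): the loader fakes the labels a vision encoder might extract, and the stub captioner turns them into a sentence. Running it shows why the printed value is a sentence rather than the filename.

```python
from dataclasses import dataclass


@dataclass
class Image:
    """Hypothetical image container: a file path plus labels a vision encoder might extract."""
    path: str
    detected: list


def load_image(path):
    # Hypothetical loader: a real one would decode pixels from the file.
    # Here we hard-code what a vision model might recognize in 'dog_park.jpg'.
    return Image(path=path, detected=["dogs", "park"])


class StubCaptioner:
    """Hypothetical stand-in for a pre-trained captioning model."""

    def generate_caption(self, image):
        # A trained model would produce this sentence from learned weights;
        # the stub fills a fixed template from the detected labels.
        subject, place = image.detected
        return f"A group of {subject} playing in a {place}."


model = StubCaptioner()

# The quiz snippet, unchanged:
image = load_image('dog_park.jpg')
caption = model.generate_caption(image)
print(caption)  # A group of dogs playing in a park.
```

The caption is a new string produced by the model, so it never equals `image.path`.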

Model Choice
advanced
Which model architecture is best suited for image captioning tasks?

You want to build a system that looks at images and writes sentences describing them. Which model type is most appropriate?

A. A simple feedforward neural network with no sequence handling.
B. A convolutional neural network (CNN) combined with a recurrent neural network (RNN).
C. A support vector machine (SVM) classifier.
D. A k-means clustering algorithm.
💡 Hint

Think about how images and sentences are processed differently and how to combine them.
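To make the CNN + RNN split concrete, here is a minimal pure-Python sketch with untrained, randomly initialized weights, so the words it emits are arbitrary; only the data flow matters. The encoder applies a 2D convolution, ReLU, and global average pooling to turn a tiny grayscale grid into a feature vector. The decoder is a simple recurrent cell whose hidden state starts from those features and which picks one word per step (greedy decoding). The vocabulary, sizes, kernels, and image are illustrative assumptions.

```python
import math
import random

random.seed(0)  # reproducible, but still untrained, weights

VOCAB = ["<start>", "<end>", "a", "dog", "in", "the", "park"]
HIDDEN = 4  # size of the RNN hidden state (illustrative)


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product for lists of lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def conv2d_relu(image, kernel):
    """Valid 2D convolution (stride 1, no padding) followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [max(0.0, sum(image[i + a][j + b] * kernel[a][b]
                      for a in range(kh) for b in range(kw)))
         for j in range(len(image[0]) - kw + 1)]
        for i in range(len(image) - kh + 1)
    ]


def encode(image, kernels):
    """CNN encoder: one feature per kernel via global average pooling."""
    features = []
    for kernel in kernels:
        cells = [v for row in conv2d_relu(image, kernel) for v in row]
        features.append(sum(cells) / len(cells))
    return features


KERNELS = [
    [[1, -1], [1, -1]],  # fires on vertical edges (bright left, dark right)
    [[1, 1], [-1, -1]],  # fires on horizontal edges (bright top, dark bottom)
]
W_img = rand_matrix(HIDDEN, len(KERNELS))  # image features -> initial hidden state
W_emb = rand_matrix(len(VOCAB), HIDDEN)    # one embedding row per word
W_hh = rand_matrix(HIDDEN, HIDDEN)         # previous hidden -> hidden
W_xh = rand_matrix(HIDDEN, HIDDEN)         # word embedding -> hidden
W_out = rand_matrix(len(VOCAB), HIDDEN)    # hidden -> one score per word


def decode(features, max_len=5):
    """RNN decoder: greedily emits one word per step, conditioned on the image."""
    h = [math.tanh(v) for v in matvec(W_img, features)]
    word, words = "<start>", []
    for _ in range(max_len):
        x = W_emb[VOCAB.index(word)]
        h = [math.tanh(a + b) for a, b in zip(matvec(W_hh, h), matvec(W_xh, x))]
        scores = matvec(W_out, h)
        scores[VOCAB.index("<start>")] = float("-inf")  # never emit <start>
        word = VOCAB[scores.index(max(scores))]
        if word == "<end>":
            break
        words.append(word)
    return " ".join(words)


image = [[9, 9, 0, 0]] * 4  # 4x4 grayscale grid: bright left half, dark right half
features = encode(image, KERNELS)
caption = decode(features)
print(features)  # [6.0, 0.0]
print(caption)   # arbitrary words: the weights are random, not trained
```

Training would adjust every weight matrix so the decoder's word choices match human captions; the structure (encode the image once, then generate a sequence step by step) is what makes the CNN + RNN pairing a fit for the task.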

Metrics
advanced
Which metric is commonly used to evaluate image captioning quality?

After training an image captioning model, you want to measure how good its descriptions are compared to human-written captions. Which metric should you use?

A. Confusion matrix of detected objects.
B. Mean Squared Error (MSE) between pixel values of images.
C. BLEU score, which compares the overlap of words and phrases between generated and reference captions.
D. Accuracy of classifying images into categories.
💡 Hint

Think about metrics that compare text similarity.
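BLEU rewards n-gram overlap with the reference captions, clips each n-gram's count so a repeated word cannot score more often than it appears in any reference, and multiplies by a brevity penalty for short outputs. The minimal sketch below computes a simplified sentence-level BLEU with unigrams and bigrams only (max_n=2), uniform weights, and no smoothing. Published results usually report BLEU-4 over a whole test set, computed with an established implementation such as NLTK's `nltk.translate.bleu_score` module.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=2):
    """Simplified BLEU: clipped n-gram precisions for n = 1..max_n,
    geometric mean with uniform weights, times a brevity penalty."""
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        max_ref = Counter()
        for ref in refs:
            max_ref |= ngrams(ref, n)  # element-wise max count across references
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        if clipped == 0:
            return 0.0  # no smoothing: any zero precision gives BLEU = 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(cand)
    r = min((abs(len(ref) - c), len(ref)) for ref in refs)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


print(round(sentence_bleu("a dog in a park", ["a dog playing in a park"]), 3))  # 0.709
print(sentence_bleu("dog dog dog dog dog", ["a dog plays in the park"]))        # 0.0
```

In the first call every unigram matches and 3 of 4 bigrams match, but the candidate is one word shorter than the reference, so the brevity penalty pulls the score below the raw precision. In the second, clipping lets "dog" count only once and no bigram matches, so the repetitive caption scores zero.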

🔧 Debug
expert
Why does this image captioning model produce repetitive captions?

Consider this simplified code snippet where the model generates captions but repeats the same word multiple times:

# Assumes `image` and a pre-trained captioning `model` are already defined.
caption = model.generate_caption(image)
print(caption)
# Output: "dog dog dog dog dog"

What is the most likely cause?

A. The print statement is inside a loop printing the same word multiple times.
B. The input image is corrupted and cannot be processed.
C. The model was trained on only one image, so it memorizes that caption.
D. The model's beam search decoding is not implemented correctly, causing it to select the same word repeatedly.
💡 Hint

Think about how the model chooses words during caption generation.
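Repeated words are a classic decoding failure: if the decoder keeps taking the highest-scoring next word and the model scores "dog" highest right after "dog", the loop never escapes. The quiz attributes this to a faulty beam search; the sketch below reproduces the same symptom with plain greedy decoding over a hypothetical, hand-written table of next-word scores (not a trained model), then applies a simple repetition penalty that divides the score of any word already generated. Libraries expose similar controls, for example the `repetition_penalty` and `no_repeat_ngram_size` arguments of Hugging Face Transformers' `generate`.

```python
# Hypothetical next-word scores; a real captioner computes these from the
# image features and the words generated so far.
NEXT_WORD_SCORES = {
    "<start>": {"dog": 0.6, "a": 0.4},
    "dog": {"dog": 0.5, "runs": 0.3, "<end>": 0.2},
    "runs": {"in": 0.5, "dog": 0.3, "<end>": 0.2},
    "in": {"the": 0.7, "<end>": 0.3},
    "the": {"park": 0.8, "dog": 0.2},
    "park": {"<end>": 0.9, "dog": 0.1},
    "a": {"dog": 0.9, "<end>": 0.1},
}


def decode(max_len=8, repetition_penalty=1.0):
    """Greedy decoding: always take the highest-scoring next word.
    Scores of words already generated are divided by repetition_penalty."""
    words, prev = [], "<start>"
    for _ in range(max_len):
        scores = {
            w: s / repetition_penalty if w in words else s
            for w, s in NEXT_WORD_SCORES[prev].items()
        }
        prev = max(scores, key=scores.get)
        if prev == "<end>":
            break
        words.append(prev)
    return " ".join(words)


print(decode(max_len=5))               # dog dog dog dog dog
print(decode(repetition_penalty=2.0))  # dog runs in the park
```

With a penalty of 2.0, "dog" drops from 0.5 to 0.25 after its first use, so "runs" (0.3) wins and decoding reaches "<end>" normally.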

Practice

1.

What does image understanding mean in AI?

easy
A. Drawing a new picture from scratch
B. Writing a story about a picture
C. Changing the colors of a picture
D. Recognizing objects and details in a picture

Solution

  1. Step 1: Understand the term 'image understanding'

    Image understanding means the AI looks at a picture and finds what objects or details are inside it.
  2. Step 2: Compare options with the meaning

    Only "Recognizing objects and details in a picture" matches this meaning; the other options describe drawing, writing, or coloring, which are different tasks.
  3. Final Answer:

    Recognizing objects and details in a picture -> Option D
  4. Quick Check:

    Image understanding = Recognizing objects [OK]
Hint: Image understanding means spotting things in a picture [OK]
Common Mistakes:
  • Confusing image understanding with image editing
  • Thinking it means writing about the image
  • Mixing it with creating new images
2.

Which of the following is the correct way to describe an image using AI?

"A cat sitting on a mat."
easy
A. A sentence describing what is in the image
B. A code to change image colors
C. A list of numbers representing pixels
D. A command to delete the image

Solution

  1. Step 1: Understand image description

    Image description means writing a sentence that tells what is seen in the picture.
  2. Step 2: Match options to this meaning

    Option A, a sentence describing what is in the image (such as "A cat sitting on a mat."), is an image description. The other options are about pixel values, color changes, or deletion, which are unrelated.
  3. Final Answer:

    A sentence describing what is in the image -> Option A
  4. Quick Check:

    Image description = Sentence about image [OK]
Hint: Image description is a sentence about the picture [OK]
Common Mistakes:
  • Confusing description with pixel data
  • Thinking description changes the image
  • Mixing description with image deletion
3.

Given this Python code snippet using a simple AI model for image description, what will be the output?

def describe_image(image):
    # Toy rule: `image` is a text label standing in for a real picture
    if 'dog' in image:
        return 'A dog playing in the park.'
    else:
        return 'Unknown image.'

result = describe_image('photo of a dog')
print(result)
medium
A. A dog playing in the park.
B. Unknown image.
C. photo of a dog
D. Error: 'dog' not found

Solution

  1. Step 1: Check the input string for keyword

    The input string is 'photo of a dog', which contains the word 'dog'.
  2. Step 2: Follow the if condition in the function

    Since 'dog' is found, the function returns 'A dog playing in the park.'
  3. Final Answer:

    A dog playing in the park. -> Option A
  4. Quick Check:

    Keyword 'dog' found = Correct description [OK]
Hint: Check if 'dog' is in the input string [OK]
Common Mistakes:
  • Ignoring the if condition and choosing 'Unknown image.'
  • Confusing input string with output
  • Expecting an error when none occurs
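The solution hinges on `'dog' in image` being a substring test. Checking both branches also exposes a side effect of that choice: any text containing the letters "dog", such as "hotdog stand", triggers the dog description. The `describe_image_words` variant below is an illustrative addition, not part of the original question; it matches whole words instead.

```python
def describe_image(image):
    if 'dog' in image:  # substring test
        return 'A dog playing in the park.'
    else:
        return 'Unknown image.'


def describe_image_words(image):
    """Illustrative variant: match 'dog' only as a whole, case-insensitive word."""
    if 'dog' in image.lower().split():
        return 'A dog playing in the park.'
    return 'Unknown image.'


print(describe_image('photo of a dog'))        # A dog playing in the park.
print(describe_image('photo of a cat'))        # Unknown image.
print(describe_image('hotdog stand'))          # A dog playing in the park. (substring match)
print(describe_image_words('hotdog stand'))    # Unknown image.
print(describe_image_words('photo of a Dog'))  # A dog playing in the park.
```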
4.

Find the error in this AI image description function and choose the fix:

def describe(image):
    if image.contains('cat'):
        return 'A cat on the sofa.'
    else:
        return 'No cat found.'
medium
A. Change return to print
B. Add a semicolon at the end of each line
C. Replace image.contains('cat') with 'cat' in image
D. Use image.has('cat') instead

Solution

  1. Step 1: Identify the error in method usage

    Python strings have no contains() method, so image.contains('cat') raises an AttributeError. Substring membership is checked with the in operator.
  2. Step 2: Choose the correct syntax for membership check

    Replacing image.contains('cat') with 'cat' in image fixes the error.
  3. Final Answer:

    Replace image.contains('cat') with 'cat' in image -> Option C
  4. Quick Check:

    Use 'in' for string membership in Python [OK]
Hint: Use 'in' to check if substring is in string [OK]
Common Mistakes:
  • Using non-existent string methods like contains()
  • Thinking print replaces return
  • Adding unnecessary semicolons
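Running both versions side by side confirms the diagnosis: the original raises AttributeError as soon as it is called, while the fixed version (option C) works for inputs with and without "cat". The function names `describe_broken` and `describe` are just labels for this comparison.

```python
def describe_broken(image):
    if image.contains('cat'):  # str has no contains() method
        return 'A cat on the sofa.'
    else:
        return 'No cat found.'


def describe(image):
    # Fixed version: the `in` operator checks for a substring.
    if 'cat' in image:
        return 'A cat on the sofa.'
    else:
        return 'No cat found.'


try:
    describe_broken('a cat photo')
    error = None
except AttributeError as err:
    error = err  # keep a reference; `err` is cleared when the except block ends

print(type(error).__name__)     # AttributeError
print(describe('a cat photo'))  # A cat on the sofa.
print(describe('a dog photo'))  # No cat found.
```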
5.

You want to build an AI that looks at a photo and writes a short sentence describing it. Which approach is best?

hard
A. Manually write descriptions for every photo
B. Train a model to recognize objects and generate sentences about them
C. Use a model that only changes photo colors
D. Train a model to delete photos with no objects

Solution

  1. Step 1: Understand the goal of automatic image description

    The AI should identify objects in the photo and then create a sentence describing what it sees.
  2. Step 2: Evaluate the options for this goal

    Option B trains a model to do both object recognition and sentence generation, which fits the goal best. The other options are manual and slow (A), unrelated to description (C), or destructive (D).
  3. Final Answer:

    Train a model to recognize objects and generate sentences about them -> Option B
  4. Quick Check:

    Recognition + sentence generation = Best approach [OK]
Hint: Combine object recognition with sentence generation [OK]
Common Mistakes:
  • Choosing manual description, which is slow and does not scale
  • Confusing color changes with description
  • Thinking deleting photos helps description
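The recognize-then-describe idea in option B can be sketched as two stages. In this minimal sketch, `detect_objects` is a hypothetical stub returning fixed (label, confidence) pairs where a trained recognizer would run, and `generate_sentence` is a simple template standing in for the learned language model. Real captioning systems usually learn the sentence-generation step from image-caption pairs rather than using templates. The 0.5 confidence threshold is also an assumption.

```python
# Hypothetical detector output for two example photos: (label, confidence) pairs.
FAKE_DETECTIONS = {
    "park.jpg": [("dog", 0.92), ("ball", 0.81), ("kite", 0.30)],
    "empty_room.jpg": [],
}


def detect_objects(photo):
    """Stage 1 (stub): a trained object recognizer would compute this from pixels."""
    return FAKE_DETECTIONS.get(photo, [])


def generate_sentence(labels):
    """Stage 2 (toy template): turn recognized labels into a short sentence."""
    if not labels:
        return "A photo with no recognizable objects."
    if len(labels) == 1:
        return f"A photo of a {labels[0]}."
    return f"A photo of a {', a '.join(labels[:-1])} and a {labels[-1]}."


def describe_photo(photo, threshold=0.5):
    """Keep only confident detections, then describe them."""
    labels = [label for label, conf in detect_objects(photo) if conf >= threshold]
    return generate_sentence(labels)


print(describe_photo("park.jpg"))        # A photo of a dog and a ball.
print(describe_photo("empty_room.jpg"))  # A photo with no recognizable objects.
```

The low-confidence "kite" is dropped at the default threshold; lowering the threshold to 0.2 would include it.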