Image understanding and description in Prompt Engineering / GenAI - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank
easy

Complete the code to load an image using PIL.

from PIL import Image
img = Image.[1]('example.jpg')
A. open
B. import
C. read
D. load
Common Mistakes
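Using 'load' or 'read' instead of 'open' raises an AttributeError, because the PIL Image module defines neither as a function.
Trying to use 'import' as a method; 'import' is a reserved keyword, so Image.import(...) is a syntax error.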
2. Fill in the blank
medium

Complete the code to convert an image to a tensor for model input.

import torchvision.transforms as transforms
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
tensor_img = transform([1])
A. input_data
B. image_path
C. image_tensor
D. img
Common Mistakes
Passing a file path string instead of an image object.
Passing a variable that is already a tensor.
3. Fill in the blank
hard

Fix the error in the code to generate image captions using a pretrained model.

from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer
model = VisionEncoderDecoderModel.from_pretrained('nlpconnect/vit-gpt2-image-captioning')
processor = ViTImageProcessor.from_pretrained('nlpconnect/vit-gpt2-image-captioning')
tokenizer = AutoTokenizer.from_pretrained('nlpconnect/vit-gpt2-image-captioning')

pixel_values = processor(images=img, return_tensors='pt').[1]()
output_ids = model.generate(pixel_values)
caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(caption)
A. squeeze
B. unsqueeze
C. flatten
D. reshape
Common Mistakes
Using squeeze removes dimensions and causes errors.
Using flatten or reshape changes tensor shape incorrectly.
4. Fill in the blank
hard

Fill both blanks to build a dictionary that maps each image ID to its features, skipping empty images.

features = {img_id: [1] for img_id, img in images.items() if len([2]) > 0}
A. processor(images=img, return_tensors='pt').pixel_values
B. img
C. img_id
D. images
Common Mistakes
Using img_id or images instead of the image object.
Not using the processor to get pixel values.
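
The filtered dictionary comprehension in this task can be tried without loading a model. The sketch below is only a stand-in: `extract_features` is a hypothetical helper that takes the place of the processor call, and the "images" are plain lists of pixel values rather than PIL objects, so that `len()` is defined on them.

```python
def extract_features(pixels):
    """Hypothetical stand-in for a real feature extractor: scale 0-255 values to 0-1."""
    return [value / 255 for value in pixels]

# Toy "images": lists of grayscale pixel values; one entry is empty.
images = {
    "img_1": [0, 255],
    "img_2": [],
    "img_3": [51],
}

# Keep only non-empty images and map each ID to its features.
features = {
    img_id: extract_features(img)
    for img_id, img in images.items()
    if len(img) > 0
}

print(features)  # img_2 is skipped because it is empty
```

The condition after `if` runs before the value expression, so the feature extractor is never called on an empty entry.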
5. Fill in the blank
hard

Fill all three blanks to filter captions longer than 5 words and create a summary dictionary.

summary = {img_id: caption for img_id, caption in captions.items() if len(caption.[1]()) > [2] and caption.[3](' ') > 0}
A. split
B. 5
C. count
D. strip
Common Mistakes
Using strip instead of split for word count.
Comparing length to wrong number.
Using count with wrong character.
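
To see why `split` rather than `strip` gives a word count, here is a small sketch with made-up captions. The caption texts and the `MIN_WORDS` name are illustrative assumptions, not part of the exercise.

```python
# Made-up captions for illustration.
captions = {
    "img_1": "A dog runs across a grassy park field",
    "img_2": "A cat",
    "img_3": "Two children fly a kite on the beach",
}

MIN_WORDS = 5  # keep captions with more words than this

# split() returns a list of words, so len() of it is the word count.
summary = {
    img_id: caption
    for img_id, caption in captions.items()
    if len(caption.split()) > MIN_WORDS
}

# strip() returns a string, so len() of it counts characters, not words.
words_in_short_caption = len("A cat".split())   # 2
chars_in_short_caption = len("A cat".strip())   # 5

print(sorted(summary))  # ['img_1', 'img_3']
```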

Practice

1.

What does image understanding mean in AI?

easy
A. Drawing a new picture from scratch
B. Writing a story about a picture
C. Changing the colors of a picture
D. Recognizing objects and details in a picture

Solution

  1. Step 1: Understand the term 'image understanding'

    Image understanding means the AI looks at a picture and finds what objects or details are inside it.
  2. Step 2: Compare options with the meaning

    Only option D, "Recognizing objects and details in a picture", matches this meaning. The others describe drawing, writing, or recoloring, which are different tasks.
  3. Final Answer:

    Recognizing objects and details in a picture -> Option D
  4. Quick Check:

    Image understanding = Recognizing objects [OK]
Hint: Image understanding means spotting things in a picture [OK]
Common Mistakes:
  • Confusing image understanding with image editing
  • Thinking it means writing about the image
  • Mixing it with creating new images
2.

An AI model produces the output below for an image. Which of the following best characterizes this kind of output?

"A cat sitting on a mat."
easy
A. A sentence describing what is in the image
B. A code to change image colors
C. A list of numbers representing pixels
D. A command to delete the image

Solution

  1. Step 1: Understand image description

    Image description means writing a sentence that tells what is seen in the picture.
  2. Step 2: Match options to this meaning

    Option A, "A sentence describing what is in the image", matches this meaning. The others concern color changes, pixel data, or deletion, which are unrelated to description.
  3. Final Answer:

    A sentence describing what is in the image -> Option A
  4. Quick Check:

    Image description = Sentence about image [OK]
Hint: Image description is a sentence about the picture [OK]
Common Mistakes:
  • Confusing description with pixel data
  • Thinking description changes the image
  • Mixing description with image deletion
3.

Given this Python snippet, which uses a simple keyword check as a stand-in for an image description model, what will be the output?

def describe_image(image):
    if 'dog' in image:
        return 'A dog playing in the park.'
    else:
        return 'Unknown image.'

result = describe_image('photo of a dog')
print(result)
medium
A. A dog playing in the park.
B. Unknown image.
C. photo of a dog
D. Error: 'dog' not found

Solution

  1. Step 1: Check the input string for keyword

    The input string is 'photo of a dog', which contains the word 'dog'.
  2. Step 2: Follow the if condition in the function

    Since 'dog' is found, the function returns 'A dog playing in the park.'
  3. Final Answer:

    A dog playing in the park. -> Option A
  4. Quick Check:

    Keyword 'dog' found = Correct description [OK]
Hint: Check if 'dog' is in the input string [OK]
Common Mistakes:
  • Ignoring the if condition and choosing 'Unknown image.'
  • Confusing input string with output
  • Expecting an error when none occurs
4.

Find the error in this AI image description function and choose the fix:

def describe(image):
    if image.contains('cat'):
        return 'A cat on the sofa.'
    else:
        return 'No cat found.'
medium
A. Change return to print
B. Add a semicolon at the end of each line
C. Replace image.contains('cat') with 'cat' in image
D. Use image.has('cat') instead

Solution

  1. Step 1: Identify the error in method usage

    Strings in Python do not have a contains() method; membership is checked with in.
  2. Step 2: Choose the correct syntax for membership check

    Replacing image.contains('cat') with 'cat' in image fixes the error.
  3. Final Answer:

    Replace image.contains('cat') with 'cat' in image -> Option C
  4. Quick Check:

    Use 'in' for string membership in Python [OK]
Hint: Use 'in' to check if substring is in string [OK]
Common Mistakes:
  • Using non-existent string methods like contains()
  • Thinking print replaces return
  • Adding unnecessary semicolons
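
As a quick check of the fix, the sketch below runs the corrected function and also shows what happens when `contains()` is called on a string. The sample inputs are made up for illustration.

```python
def describe(image):
    # 'in' performs the substring membership test on a str.
    if 'cat' in image:
        return 'A cat on the sofa.'
    return 'No cat found.'

print(describe('photo of a cat'))  # A cat on the sofa.
print(describe('photo of a dog'))  # No cat found.

# str defines no contains() method, so the call raises AttributeError.
try:
    'photo of a cat'.contains('cat')
except AttributeError as err:
    error_message = str(err)
    print(error_message)
```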
5.

You want to build an AI that looks at a photo and writes a short sentence describing it. Which approach is best?

hard
A. Manually write descriptions for every photo
B. Train a model to recognize objects and generate sentences about them
C. Use a model that only changes photo colors
D. Train a model to delete photos with no objects

Solution

  1. Step 1: Understand the goal of automatic image description

    The AI should identify objects in the photo and then create a sentence describing what it sees.
  2. Step 2: Evaluate the options for this goal

    Option B trains a model to do both recognition and sentence generation, which fits the goal best. The others are manual, unrelated, or destructive.
  3. Final Answer:

    Train a model to recognize objects and generate sentences about them -> Option B
  4. Quick Check:

    Recognition + sentence generation = Best approach [OK]
Hint: Combine object recognition with sentence generation [OK]
Common Mistakes:
  • Choosing manual description which is slow
  • Confusing color changes with description
  • Thinking deleting photos helps description
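
The two-stage idea behind option B can be sketched as a toy pipeline. Everything here is hypothetical: `recognize_objects` uses a lookup table in place of a trained vision model, and `generate_sentence` uses a template in place of a trained language model. The point is only to show how the recognition output feeds the sentence step.

```python
def recognize_objects(photo_id):
    """Hypothetical recognizer: a lookup table stands in for a trained vision model."""
    detections = {
        "photo_1": ["dog", "ball"],
        "photo_2": [],
    }
    return detections.get(photo_id, [])


def generate_sentence(labels):
    """Hypothetical generator: a template stands in for a trained language model."""
    if not labels:
        return "No objects recognized."
    return "A photo showing " + " and ".join(labels) + "."


def describe_photo(photo_id):
    # Stage 1 finds the objects; stage 2 turns them into a sentence.
    return generate_sentence(recognize_objects(photo_id))


print(describe_photo("photo_1"))  # A photo showing dog and ball.
print(describe_photo("photo_2"))  # No objects recognized.
```

In a real captioning system the two stages are usually learned jointly, for example as an image encoder paired with a text decoder, rather than joined by a fixed template.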