Image understanding and description in Prompt Engineering / GenAI - Full Explanation
Imagine you are telling a friend about a photo you took on a trip. First, you notice the main things in the picture, like a mountain or a river. Then, you remember small details like the bright colors or the people smiling. Next, you think about how these parts fit together, like the sun shining over the lake. Finally, you tell your friend a clear story about the photo.
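As a rough illustration, the four steps in this analogy can be sketched as a toy Python pipeline. Everything in it is an assumption made for illustration: the `Detection` class, the function names, the hard-coded lookup tables that stand in for a real vision model, and the fixed "shining over" relationship. It sketches the flow only and does not show how a real model works internally.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str   # what was recognized, e.g. "lake"
    detail: str  # a small visual detail, e.g. "calm"


def recognize(image_id: str) -> list[str]:
    """Stage 1 (recognition): hypothetical lookup standing in for a vision model."""
    known = {"trip_photo": ["sun", "lake", "mountain"]}
    return known.get(image_id, [])


def extract_features(labels: list[str]) -> list[Detection]:
    """Stage 2 (feature extraction): attach a hand-written detail to each label."""
    details = {"sun": "bright", "lake": "calm", "mountain": "snow-capped"}
    return [Detection(label, details.get(label, "")) for label in labels]


def understand_context(detections: list[Detection]) -> str:
    """Stage 3 (context): relate the first object to the remaining ones."""
    if not detections:
        return ""
    phrases = [f"{d.detail} {d.label}".strip() for d in detections]
    if len(phrases) == 1:
        return f"a {phrases[0]}"
    # The "shining over" relation is hard-coded for this toy example.
    return f"a {phrases[0]} shining over a " + " and a ".join(phrases[1:])


def describe(image_id: str) -> str:
    """Stage 4 (description): turn the scene into one sentence."""
    scene = understand_context(extract_features(recognize(image_id)))
    return f"The photo shows {scene}." if scene else "Unknown image."


print(describe("trip_photo"))
# The photo shows a bright sun shining over a calm lake and a snow-capped mountain.
```

Each function maps to one step of the analogy, so swapping a lookup table for a real model would change only that stage.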
┌──────────────────────────┐
│ Input: Image             │
└────────────┬─────────────┘
             │
             ▼
┌──────────────────────────┐
│ Image Recognition        │
└────────────┬─────────────┘
             │
             ▼
┌──────────────────────────┐
│ Feature Extraction       │
└────────────┬─────────────┘
             │
             ▼
┌──────────────────────────┐
│ Context Understanding    │
└────────────┬─────────────┘
             │
             ▼
┌──────────────────────────┐
│ Generating Description   │
└────────────┬─────────────┘
             │
             ▼
┌──────────────────────────┐
│ Output: Text Description │
└──────────────────────────┘
Practice
What does image understanding mean in AI?
Solution
Step 1: Understand the term 'image understanding'
Image understanding means the AI looks at a picture and identifies the objects and details in it.
Step 2: Compare options with the meaning
Only "Recognizing objects and details in a picture" matches this meaning exactly; the other options describe writing, coloring, or drawing, which are different tasks.
Final Answer:
Recognizing objects and details in a picture -> Option D
Quick Check:
Image understanding = Recognizing objects [OK]
Common Mistakes:
- Confusing image understanding with image editing
- Thinking it means writing about the image
- Mixing it with creating new images
Which of the following is the correct way to describe an image using AI?
"A cat sitting on a mat."
Solution
Step 1: Understand image description
Image description means writing a sentence that tells what is seen in the picture.
Step 2: Match options to this meaning
Only a sentence describing what is in the image matches this meaning; the other options are about pixels, color changes, or deleting, which are unrelated.
Final Answer:
A sentence describing what is in the image -> Option A
Quick Check:
Image description = Sentence about image [OK]
Common Mistakes:
- Confusing description with pixel data
- Thinking description changes the image
- Mixing description with image deletion
Given this Python snippet, which uses a simple keyword check as a stand-in for an AI image description model, what will be the output?
def describe_image(image):
    if 'dog' in image:
        return 'A dog playing in the park.'
    else:
        return 'Unknown image.'

result = describe_image('photo of a dog')
print(result)
Solution
Step 1: Check the input string for keyword
The input string is 'photo of a dog', which contains the word 'dog'.
Step 2: Follow the if condition in the function
Since 'dog' is found, the function returns 'A dog playing in the park.'
Final Answer:
A dog playing in the park. -> Option A
Quick Check:
Keyword 'dog' found = Correct description [OK]
Common Mistakes:
- Ignoring the if condition and choosing 'Unknown image.'
- Confusing input string with output
- Expecting an error when none occurs
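To see both branches of this function, and one limitation of keyword matching, the sketch below runs it on a few extra inputs. The function is the one from the question; the additional input strings are made up for illustration.

```python
def describe_image(image: str) -> str:
    # Same keyword check as in the question: a substring test, not real vision.
    if 'dog' in image:
        return 'A dog playing in the park.'
    else:
        return 'Unknown image.'


# Hypothetical inputs chosen to exercise both branches.
samples = ['photo of a dog', 'photo of a cat', 'photo of a Dog', 'hotdog stand']
for caption in samples:
    print(f"{caption!r} -> {describe_image(caption)}")
```

The `in` operator is case-sensitive and matches substrings anywhere in the text, so 'photo of a Dog' falls through to the else branch while 'hotdog stand' triggers the dog description. A real image description model works on pixels, not on keywords in a string.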
Find the error in this AI image description function and choose the fix:
def describe(image):
    if image.contains('cat'):
        return 'A cat on the sofa.'
    else:
        return 'No cat found.'
Solution
Step 1: Identify the error in method usage
Strings in Python do not have a contains() method; membership is checked with the in operator.
Step 2: Choose the correct syntax for membership check
Replacing image.contains('cat') with 'cat' in image fixes the error.
Final Answer:
Replace image.contains('cat') with 'cat' in image -> Option C
Quick Check:
Use 'in' for string membership in Python [OK]
Common Mistakes:
- Using non-existent string methods like contains()
- Thinking print replaces return
- Adding unnecessary semicolons
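The fix can be checked by running both versions side by side. In this sketch the name `describe_broken` is introduced only to keep the original code next to the corrected one; it is not part of the question.

```python
def describe_broken(image: str) -> str:
    # Original version: str has no .contains() method, so this raises AttributeError.
    if image.contains('cat'):
        return 'A cat on the sofa.'
    else:
        return 'No cat found.'


def describe(image: str) -> str:
    # Fixed version: the `in` operator performs the substring check.
    if 'cat' in image:
        return 'A cat on the sofa.'
    else:
        return 'No cat found.'


try:
    describe_broken('photo of a cat')
except AttributeError as err:
    print(f"Broken version failed: {err}")

print(describe('photo of a cat'))  # A cat on the sofa.
print(describe('photo of a dog'))  # No cat found.
```

The error surfaces only when the function is called, because Python looks up the `contains` attribute at runtime.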
You want to build an AI that looks at a photo and writes a short sentence describing it. Which approach is best?
Solution
Step 1: Understand the goal of automatic image description
The AI should identify objects in the photo and then create a sentence describing what it sees.
Step 2: Evaluate the options for this goal
"Train a model to recognize objects and generate sentences about them" covers both recognition and sentence generation, which fits the goal best. The other options are manual, unrelated, or destructive.
Final Answer:
Train a model to recognize objects and generate sentences about them -> Option B
Quick Check:
Recognition + sentence generation = Best approach [OK]
Common Mistakes:
- Choosing manual description which is slow
- Confusing color changes with description
- Thinking deleting photos helps description
