
Image understanding and description in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate


Experiment - Image understanding and description

Problem: We want to build a model that looks at an image and writes a short sentence describing what it sees. Currently, the model is very good at describing training images but makes many mistakes on new images it has never seen.

Current Metrics: Training accuracy: 95%, Validation accuracy: 65%, Validation loss: 1.2

Issue: The model is overfitting. It performs very well on training images but poorly on validation images, which shows that it does not generalize well.

Your Task

Reduce overfitting so that validation accuracy improves to at least 80% while keeping training accuracy below 90%.

You cannot change the dataset or add more data.

You must keep the same model architecture type (CNN + RNN for image captioning).


Solution


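import tensorflow as tf
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Embedding, LSTM, Dropout, Add
from tensorflow.keras.callbacks import EarlyStopping

# Hypothetical values: in practice both come from the caption tokenizer
vocab_size = 5000          # vocabulary size (index 0 reserved for padding)
max_caption_length = 34    # length of the padded caption prefixes

# Assumed to be prepared beforehand (not shown):
#   train_images / val_images:     arrays of shape (N, 299, 299, 3), preprocessed
#                                  with tf.keras.applications.inception_v3.preprocess_input
#   train_captions / val_captions: padded integer prefixes, shape (N, max_caption_length)
#   train_targets / val_targets:   one-hot next words, shape (N, vocab_size)

# Load pre-trained CNN for image feature extraction
# (layers[-2] is the 2048-d global average pooling output, before the classifier)
base_model = InceptionV3(weights='imagenet')
cnn_model = Model(base_model.input, base_model.layers[-2].output)

# Freeze CNN layers so only the captioning layers are trained
for layer in cnn_model.layers:
    layer.trainable = False

# Define inputs
image_input = Input(shape=(299, 299, 3))
image_features = cnn_model(image_input)
image_features = Dropout(0.5)(image_features)  # Added dropout
image_features = Dense(256)(image_features)  # Project to match LSTM output dim

# Text input for captions (mask_zero=True treats index 0 as padding)
caption_input = Input(shape=(max_caption_length,))
caption_embedding = Embedding(input_dim=vocab_size, output_dim=256, mask_zero=True)(caption_input)
caption_lstm = LSTM(256)(caption_embedding)
caption_lstm = Dropout(0.5)(caption_lstm)  # Added dropout

# Combine image and caption features, then predict the next word
decoder = Add()([image_features, caption_lstm])
outputs = Dense(vocab_size, activation='softmax')(decoder)

# Define model
model = Model(inputs=[image_input, caption_input], outputs=outputs)

# Compile model with lower learning rate (Adam's default is 0.001)
optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001)
# categorical_crossentropy expects one-hot targets; with integer targets,
# use sparse_categorical_crossentropy instead
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])

# Early stopping: halt after 3 epochs without val_loss improvement
# and roll back to the best weights seen
early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

# Train model
model.fit(
    [train_images, train_captions], train_targets,
    epochs=20,
    batch_size=64,
    validation_data=([val_images, val_captions], val_targets),
    callbacks=[early_stop]
)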
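The training code takes train_captions and train_targets as given. Because this model maps one caption prefix (plus the image) to a single softmax over the vocabulary, each caption is typically expanded into several (prefix, next word) pairs. The sketch below shows one way to do that in plain Python, assuming integer token ids with 0 reserved for padding; the helper name make_training_pairs and the example ids are hypothetical. Under this setup, the "accuracy" Keras reports is next-word accuracy over these pairs, not a score for whole captions.

```python
def make_training_pairs(token_ids, max_len, vocab_size):
    """Expand one caption into (padded prefix, one-hot next word) pairs.

    Hypothetical helper. token_ids are the caption's integer ids, including
    start/end tokens. Prefixes are left-padded with 0, the index that
    Embedding(mask_zero=True) treats as padding.
    """
    prefixes, targets = [], []
    for i in range(1, len(token_ids)):
        # Words seen so far; keep the last max_len if too long
        prefix = token_ids[:i][-max_len:]
        # Pad on the left, like pad_sequences' default padding='pre'
        padded = [0] * (max_len - len(prefix)) + prefix
        # One-hot encoding of the word the model must predict next
        one_hot = [0] * vocab_size
        one_hot[token_ids[i]] = 1
        prefixes.append(padded)
        targets.append(one_hot)
    return prefixes, targets


# Hypothetical ids: 1=startseq, 2="a", 3="dog", 4="runs", 5=endseq
prefixes, targets = make_training_pairs([1, 2, 3, 4, 5], max_len=6, vocab_size=6)
print(len(prefixes))        # 4 pairs from a 5-token caption
print(prefixes[1])          # [0, 0, 0, 0, 1, 2]
print(targets[1].index(1))  # 3, i.e. "dog"
```

With a real vocabulary of thousands of words, one-hot targets get large; storing integer targets and switching the loss to sparse_categorical_crossentropy avoids that.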

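Added dropout (rate 0.5) after the image feature extractor and after the LSTM to reduce overfitting.

Lowered the learning rate from 0.001 (Adam's default) to 0.0001 so the weights change more gradually between updates.

Added early stopping, which halts training once validation loss has not improved for 3 epochs and restores the weights from the best epoch.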
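To make the early-stopping rule concrete, here is a simplified pure-Python version of the logic behind EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True). It is a sketch for illustration, not Keras's implementation; the function name and the validation losses in the example are made up.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Training stops once `patience` consecutive epochs pass without a new
    best val_loss; with restore_best_weights=True, the weights from
    best_epoch are the ones kept.
    """
    best_loss = float('inf')
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran out of epochs without triggering the stop
    return len(val_losses) - 1, best_epoch


# Made-up validation losses: best at epoch 2, then three epochs without improvement
print(early_stopping_epoch([1.20, 1.05, 0.98, 1.00, 0.99, 1.02, 1.10]))  # (5, 2)
```

In this example training ends at epoch 5 instead of running all epochs, and the model keeps the epoch-2 weights, which had the lowest validation loss.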

Results Interpretation

Before: Training accuracy was 95%, validation accuracy was 65%, showing overfitting.

After: Training accuracy dropped to 88%, validation accuracy improved to 82%, and validation loss decreased, indicating better generalization.
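One quick way to read these numbers is to check them against the task's thresholds (validation accuracy of at least 80%, training accuracy below 90%) and to look at the train-validation gap. The hypothetical helper below uses only the before/after figures quoted above; the gap is an informal overfitting signal, not a formal metric.

```python
def check_target(train_acc, val_acc, min_val=80.0, max_train=90.0):
    """Return (train-validation gap, whether the task target is met).

    Accuracies are in percent; thresholds come from the task statement.
    """
    gap = train_acc - val_acc
    meets = val_acc >= min_val and train_acc < max_train
    return gap, meets


print(check_target(95, 65))  # (30, False): large gap, target missed
print(check_target(88, 82))  # (6, True): small gap, target met
```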

Adding dropout and early stopping helps the model avoid memorizing training data and improves its ability to describe new images accurately.
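Dropout fights memorization by randomly silencing units during training, so the network cannot rely on any single feature. The pure-Python sketch below follows the behavior documented for Keras Dropout ("inverted" dropout): each value is zeroed with probability rate, survivors are scaled by 1 / (1 - rate), and at inference the input passes through unchanged. The function name and inputs are illustrative.

```python
import random


def inverted_dropout(values, rate, training, rng=random):
    """Zero each value with probability `rate` (0 <= rate < 1) during training.

    Survivors are scaled by 1 / (1 - rate) so each unit's expected value is
    unchanged, which is why no rescaling is needed at inference time.
    """
    if not training or rate == 0.0:
        return list(values)  # inference: identity
    keep_scale = 1.0 / (1.0 - rate)
    return [v * keep_scale if rng.random() >= rate else 0.0 for v in values]


features = [1.0, 2.0, 3.0, 4.0]
print(inverted_dropout(features, 0.5, training=False))  # [1.0, 2.0, 3.0, 4.0]
# During training, each value is either 0.0 or doubled (rate=0.5)
print(inverted_dropout(features, 0.5, training=True, rng=random.Random(0)))
```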

Bonus Experiment

Try using data augmentation on the images to artificially increase dataset variety and see if validation accuracy improves further.

💡 Hint

Use simple image transformations like rotation, flipping, or zooming during training to help the model learn more robust features.
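As a minimal illustration of the idea, the sketch below applies a random horizontal flip to images stored as lists of pixel rows, with an independent draw per image. The function names are hypothetical, and a real pipeline would work on tensors rather than Python lists.

```python
import random


def random_horizontal_flip(image, rng, p=0.5):
    """Return a left-right mirrored copy of `image` with probability p.

    `image` is a list of rows, each row a list of pixel values.
    The input is never modified.
    """
    if rng.random() < p:
        return [list(reversed(row)) for row in image]
    return [list(row) for row in image]


def augment_batch(images, seed=None):
    """Apply an independent random flip to every image in a batch."""
    rng = random.Random(seed)
    return [random_horizontal_flip(img, rng) for img in images]


image = [[1, 2, 3],
         [4, 5, 6]]
print(random_horizontal_flip(image, random.Random(0), p=1.0))  # [[3, 2, 1], [6, 5, 4]]
```

In Keras, preprocessing layers such as tf.keras.layers.RandomFlip("horizontal"), tf.keras.layers.RandomRotation, and tf.keras.layers.RandomZoom do this on tensors and are active only during training. Because the solution passes raw images through the frozen InceptionV3 inside the model, such layers could be placed between image_input and cnn_model; they would have no effect if image features were precomputed once. For captioning, horizontal flips can also contradict captions that mention "left" or "right".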

Practice


What does image understanding mean in AI?

easy

A. Drawing a new picture from scratch

B. Writing a story about a picture

C. Changing the colors of a picture

D. Recognizing objects and details in a picture

Which option best describes the following AI output for an image?

"A cat sitting on a mat."

easy

A. A sentence describing what is in the image

B. A code to change image colors

C. A list of numbers representing pixels

D. A command to delete the image

Given this Python snippet, a toy keyword-based stand-in for an image description model, what will be the output?

def describe_image(image):
    if 'dog' in image:
        return 'A dog playing in the park.'
    else:
        return 'Unknown image.'

result = describe_image('photo of a dog')
print(result)

medium

A. A dog playing in the park.

B. Unknown image.

C. photo of a dog

D. Error: 'dog' not found

Find the error in this AI image description function and choose the fix:

def describe(image):
    if image.contains('cat'):
        return 'A cat on the sofa.'
    else:
        return 'No cat found.'

medium

A. Change return to print

B. Add a semicolon at the end of each line

C. Replace image.contains('cat') with 'cat' in image

D. Use image.has('cat') instead


Solution

Step 1: Understand the term 'image understanding'

Step 2: Compare options with the meaning

Final Answer: D. Recognizing objects and details in a picture

Quick Check:

Solution

Step 1: Understand image description

Step 2: Match options to this meaning

Final Answer: A. A sentence describing what is in the image

Quick Check:

Solution

Step 1: Check the input string for keyword

Step 2: Follow the if condition in the function

Final Answer: A. A dog playing in the park.

Quick Check:

Solution

Step 1: Identify the error in method usage

Step 2: Choose the correct syntax for membership check

Final Answer: C. Replace image.contains('cat') with 'cat' in image
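For reference, the original function fails because Python strings have no contains method, so image.contains('cat') raises AttributeError. A corrected version, assuming image is a text description as in the previous question:

```python
def describe(image):
    # Membership check on a string uses the `in` operator, not .contains()
    if 'cat' in image:
        return 'A cat on the sofa.'
    else:
        return 'No cat found.'


print(describe('photo of a cat'))  # A cat on the sofa.
print(describe('photo of a dog'))  # No cat found.
```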

Quick Check:

Solution

Step 1: Understand the goal of automatic image description

Step 2: Evaluate the options for this goal

Final Answer:

Quick Check: