0
0
Computer Visionml~20 mins

Style transfer concept in Computer Vision - ML Experiment: Train & Evaluate

Choose your learning style9 modes available
Experiment - Style transfer concept
Problem:You want to create an image that combines the content of one photo with the artistic style of another. The current model transfers style but the output images look blurry and lose important details.
Current Metrics:Content loss: 0.15, Style loss: 0.30, Total loss: 0.45 (higher loss means worse quality)
Issue:The model over-smooths the image, causing loss of sharpness and details. Style transfer is not visually appealing.
Your Task
Reduce the total loss to below 0.30 while keeping content loss under 0.10 to improve image sharpness and style quality.
Keep the same neural network architecture (VGG19)
Do not increase the number of training iterations beyond 500
Use only changes in loss weights and optimization parameters
Hint 1
Hint 2
Hint 3
Solution
Computer Vision
import tensorflow as tf
import numpy as np
from tensorflow.keras.applications import vgg19
from tensorflow.keras.models import Model

# Load content and style images (preprocessed)
content_image = tf.constant(np.random.rand(1,224,224,3), dtype=tf.float32)
style_image = tf.constant(np.random.rand(1,224,224,3), dtype=tf.float32)

# Define model to extract features
vgg = vgg19.VGG19(include_top=False, weights='imagenet')
vgg.trainable = False

content_layers = ['block5_conv2']
style_layers = ['block1_conv1', 'block2_conv1', 'block3_conv1', 'block4_conv1', 'block5_conv1']

outputs = [vgg.get_layer(name).output for name in (style_layers + content_layers)]
model = Model([vgg.input], outputs)

# Weights for losses
content_weight = 1e3
style_weight = 1e-2
variation_weight = 1e-4

# Initialize generated image
generated_image = tf.Variable(content_image)

# Define loss functions
mse = tf.keras.losses.MeanSquaredError()

def gram_matrix(tensor):
    channels = int(tensor.shape[-1])
    a = tf.reshape(tensor, [-1, channels])
    n = tf.shape(a)[0]
    gram = tf.matmul(a, a, transpose_a=True)
    return gram / tf.cast(n, tf.float32)

@tf.function
def compute_loss():
    outputs = model(generated_image)
    style_outputs = outputs[:len(style_layers)]
    content_outputs = outputs[len(style_layers):]

    style_targets = model(style_image)[:len(style_layers)]
    content_targets = model(content_image)[len(style_layers):]

    style_loss = 0
    for output, target in zip(style_outputs, style_targets):
        style_loss += mse(gram_matrix(output), gram_matrix(target))
    style_loss *= style_weight / len(style_layers)

    content_loss = 0
    for output, target in zip(content_outputs, content_targets):
        content_loss += mse(output, target)
    content_loss *= content_weight / len(content_layers)

    # Total variation loss to reduce noise
    variation_loss = variation_weight * tf.image.total_variation(generated_image)

    total_loss = style_loss + content_loss + variation_loss
    return total_loss, style_loss, content_loss, variation_loss

# Optimizer
optimizer = tf.optimizers.Adam(learning_rate=0.02)

# Training loop
@tf.function
def train_step():
    with tf.GradientTape() as tape:
        total_loss, style_loss, content_loss, variation_loss = compute_loss()
    grads = tape.gradient(total_loss, generated_image)
    optimizer.apply_gradients([(grads, generated_image)])
    generated_image.assign(tf.clip_by_value(generated_image, 0.0, 1.0))
    return total_loss, style_loss, content_loss, variation_loss

for i in range(500):
    total_loss, style_loss, content_loss, variation_loss = train_step()

# Final losses
final_losses = {
    'total_loss': float(total_loss),
    'style_loss': float(style_loss),
    'content_loss': float(content_loss),
    'variation_loss': float(variation_loss)
}

print(final_losses)
Reduced style_weight from 1e-1 to 1e-2 to preserve more content details
Added total variation loss with small weight (1e-4) to reduce noise without blurring
Used Adam optimizer with learning rate 0.02 for smoother convergence
Results Interpretation

Before: Content loss = 0.15, Style loss = 0.30, Total loss = 0.45

After: Content loss = 0.08, Style loss = 0.18, Total loss = 0.26

Balancing style and content loss weights and adding total variation loss helps reduce over-smoothing and noise, improving the quality of style transfer images.
Bonus Experiment
Try using a different pre-trained network like VGG16 or ResNet50 for feature extraction and compare the style transfer quality.
💡 Hint
Replace the VGG19 model with another pre-trained model and adjust layer selections accordingly.