TensorFlow · ~20 mins

Learning rate for fine-tuning in TensorFlow - Practice Problems & Coding Challenges

Challenge - 5 Problems
🎖️
Fine-Tuning Learning Rate Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
🧠 Conceptual
intermediate
Why use a smaller learning rate for fine-tuning?

When fine-tuning a pre-trained neural network, why is it common to use a smaller learning rate compared to training from scratch?

A. Because a smaller learning rate helps preserve the learned features and avoids large updates that could destroy useful information.
B. Because a smaller learning rate prevents the model from converging too quickly to a good solution.
C. Because a smaller learning rate speeds up training by making bigger jumps in the parameter space.
D. Because a smaller learning rate increases the chance of overfitting the training data.
💡 Hint

Think about what happens if you change the pre-trained weights too much.
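The hint can be made concrete with a toy calculation (the weight and gradient values below are invented for illustration): a single update w ← w − lr·g moves a pre-trained weight much further under a from-scratch learning rate than under a typical fine-tuning one.

```python
# Toy illustration (invented numbers): one gradient step w -> w - lr * g.
pretrained_w = 0.80   # a learned weight worth preserving
grad = 2.0            # gradient on the new task

for lr in (0.1, 0.0001):              # from-scratch LR vs fine-tuning LR
    updated = pretrained_w - lr * grad
    print(f"lr={lr}: {pretrained_w} -> {updated}")
```

With lr = 0.1 the weight drops from 0.80 to about 0.60, a 25% change from one batch; with lr = 0.0001 it barely moves, which is what "preserving learned features" means in practice.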

Predict Output
intermediate
Output of learning rate schedule during fine-tuning

Consider this TensorFlow code snippet that sets a learning rate schedule for fine-tuning:

import tensorflow as tf

initial_lr = 0.001
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=initial_lr,
    decay_steps=1000,
    decay_rate=0.96,
    staircase=True)

optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)

lrs = [lr_schedule(step).numpy() for step in [0, 1000, 2000, 3000]]
print(lrs)

What is the output printed?

A. [0.001, 0.00096, 0.00096, 0.00096]
B. [0.001, 0.00096, 0.0009216, 0.0008847360000000001]
C. [0.001, 0.00096, 0.0009216, 0.0009216]
D. [0.001, 0.00096, 0.0009216, 0.000884736]
💡 Hint

Recall that with staircase=True, the learning rate decays in steps at multiples of decay_steps.
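To check a prediction like this by hand, note that the staircase schedule reduces to lr(step) = initial_lr · decay_rate^⌊step/decay_steps⌋. A small standalone reimplementation (assuming the same constants as the snippet, no TensorFlow required) reproduces it:

```python
import math

# Staircase exponential decay, reimplemented from its definition:
# lr(step) = initial_lr * decay_rate ** floor(step / decay_steps)
initial_lr, decay_rate, decay_steps = 0.001, 0.96, 1000

def staircase_lr(step):
    return initial_lr * decay_rate ** math.floor(step / decay_steps)

print([staircase_lr(s) for s in (0, 500, 1000, 2000, 3000)])
```

The extra step 500 shows why it is called a staircase: the rate stays flat between multiples of decay_steps and only drops at each boundary.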

Hyperparameter
advanced
Choosing learning rate for fine-tuning a large pre-trained model

You are fine-tuning a large pre-trained image classification model on a small dataset. Which learning rate choice is most appropriate to avoid overfitting and preserve learned features?

A. Use a very high learning rate like 0.1 to quickly adapt to the new data.
B. Use a moderate learning rate like 0.01 to balance adaptation and stability.
C. Use no learning rate (0) to freeze all layers and not update weights.
D. Use a low learning rate like 0.0001 to make small updates and avoid destroying pre-trained weights.
💡 Hint

Think about the size of the dataset and the risk of losing pre-trained knowledge.
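As a sketch of the conservative setup (the 1e-4 value is illustrative, not a universal prescription), fine-tuning in TensorFlow typically pairs a low optimizer learning rate with a partially frozen backbone:

```python
import tensorflow as tf

# Low LR for fine-tuning; freezing early layers on the pre-trained backbone
# (layer.trainable = False) is a common companion tactic on small datasets.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)
print(float(optimizer.learning_rate.numpy()))
```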

Metrics
advanced
Effect of learning rate on training and validation loss during fine-tuning

During fine-tuning, you observe the following behavior:

  • Training loss decreases steadily.
  • Validation loss starts increasing after a few epochs.

What does this suggest about the learning rate and model behavior?

A. The learning rate is too high, causing the model to overfit and validation loss to increase.
B. The learning rate is too low, causing underfitting and poor validation performance.
C. The learning rate is appropriate; the model is learning well on both training and validation data.
D. The learning rate is irrelevant; this behavior is caused by data imbalance.
💡 Hint

Think about what causes validation loss to increase while training loss decreases.
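The pattern is easy to spot numerically (the loss values below are invented for illustration): validation loss bottoms out and then climbs while training loss keeps falling, which is exactly what callbacks like tf.keras.callbacks.EarlyStopping and ReduceLROnPlateau monitor via val_loss.

```python
# Invented loss curves showing the overfitting signature described above.
train_loss = [1.00, 0.70, 0.50, 0.38, 0.30, 0.24]   # keeps improving
val_loss   = [1.05, 0.80, 0.62, 0.60, 0.66, 0.74]   # turns upward

# Early stopping would restore the weights from the best-validation epoch.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
print(best_epoch)
```

Here training loss improves every epoch, but epoch 3 is where generalization peaks; continuing past it only memorizes the training set.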

🔧 Debug
expert
Identifying the cause of unstable training during fine-tuning

You fine-tune a pre-trained model with this optimizer setup:

optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

After a few batches, the training loss becomes NaN and the model stops learning. What is the most likely cause?

A. The learning rate is too low, causing the model to stop updating weights.
B. The optimizer Adam does not support learning rates above 0.001.
C. The learning rate is too high, causing gradient explosion and NaN loss values.
D. The batch size is too large, causing memory overflow.
💡 Hint

Consider what happens when the learning rate is set too high in gradient-based optimization.
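The mechanics behind the hint show up even in one dimension (toy function, illustrative step sizes): gradient descent on f(w) = w² multiplies w by (1 − 2·lr) each step, so any lr above 1.0 makes |w| grow without bound, the same runaway that ends in NaN losses. In TensorFlow the usual remedies are lowering learning_rate and/or passing clipnorm to the optimizer.

```python
# Toy 1-D gradient descent on f(w) = w**2, whose gradient is 2*w.
def run(lr, steps=20, w=1.0):
    for _ in range(steps):
        w -= lr * 2 * w          # each step multiplies w by (1 - 2*lr)
    return w

print(abs(run(0.01)))   # small step: shrinks toward the minimum at 0
print(abs(run(1.5)))    # oversized step: explodes, the precursor to NaN
```

With lr = 0.01 the weight decays geometrically toward the minimum; with lr = 1.5 it doubles in magnitude every step and exceeds a million within 20 steps.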