TensorFlow ML · ~20 mins

Optimizers (SGD, Adam, RMSprop) in TensorFlow - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Understanding the role of learning rate in optimizers

Which statement best describes the effect of a very high learning rate when using the Adam optimizer?

A. The model always converges, but more slowly than with a low learning rate.
B. The model converges quickly to the best solution without overshooting.
C. The model ignores the learning rate and uses default values internally.
D. The model may fail to converge, and the loss can oscillate or diverge.
💡 Hint

Think about what happens if you take steps that are too big while trying to find the lowest point on a hill.
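The hill analogy can be made concrete with a toy gradient descent on f(w) = w² (a pure-Python sketch of plain gradient descent, not the actual Adam update; the learning-rate values are arbitrary): a small learning rate shrinks w toward the minimum each step, while a learning rate above 1.0 makes every step overshoot further and the iterate diverges.

```python
def gradient_descent(lr, w=1.0, steps=10):
    """Plain gradient descent on f(w) = w**2, whose gradient is 2*w."""
    for _ in range(steps):
        w -= lr * 2 * w  # step downhill; too large a step overshoots the minimum
    return w

small = gradient_descent(lr=0.1)   # |w| shrinks by 0.8x per step
large = gradient_descent(lr=1.5)   # |w| doubles (with sign flips) per step: divergence
print(abs(small), abs(large))
```

Adam rescales gradients per parameter, but its effective step size is still proportional to the learning rate, so the same overshoot behaviour applies when the learning rate is very high.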

Predict Output
intermediate
Output of training loss with different optimizers

Given the following code snippet, which trains a simple model on dummy data, what loss value will be printed after one training step with the RMSprop optimizer?

TensorFlow
import tensorflow as tf
import numpy as np

x = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([[2.0], [4.0], [6.0], [8.0]])

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.01)

loss_fn = tf.keras.losses.MeanSquaredError()

with tf.GradientTape() as tape:
    predictions = model(x, training=True)
    loss = loss_fn(y, predictions)
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))

print(round(float(loss), 3))
A. 15.0
B. 20.0
C. 10.0
D. 5.0
💡 Hint

Initial weights are random, so loss will be relatively high but not extremely large.
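To get a feel for the scale of the loss, compute the mean squared error by hand for the hypothetical case where the freshly initialised model predicts 0 for every input (a back-of-the-envelope sketch; the real random initialisation will not produce exactly zero predictions):

```python
y = [2.0, 4.0, 6.0, 8.0]
preds = [0.0, 0.0, 0.0, 0.0]  # hypothetical all-zero predictions

# Mean squared error: average of (target - prediction)**2
mse = sum((t - p) ** 2 for t, p in zip(y, preds)) / len(y)
print(mse)  # 30.0
```

Because the Dense layer's weights are randomly initialised, the actual printed loss varies from run to run; seeding with tf.random.set_seed is the usual way to make such a snippet reproducible.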

Model Choice
advanced
Choosing the best optimizer for sparse gradients

You are training a neural network with very sparse gradients (many zeros). Which optimizer is generally the best choice to handle sparse updates efficiently?

A. RMSprop optimizer
B. Adam optimizer
C. Stochastic Gradient Descent (SGD) without momentum
D. Batch Gradient Descent
💡 Hint

Consider which optimizer adapts learning rates per parameter and handles sparse gradients well.
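One way to see why per-parameter adaptive methods suit rarely updated parameters is to implement a single Adam step in pure Python (a sketch of the textbook update rule, using the standard default hyperparameters as assumptions) and note that the step size is roughly the learning rate regardless of the raw gradient's magnitude:

```python
import math

def adam_step(w, g, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update from zero-initialised moment estimates (t = 1)."""
    m = (1 - b1) * g           # first-moment (mean) estimate
    v = (1 - b2) * g ** 2      # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1)       # bias correction at t = 1
    v_hat = v / (1 - b2)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)

# Step sizes for a large and a tiny gradient are nearly identical:
big  = abs(adam_step(0.0, 1.0))
tiny = abs(adam_step(0.0, 0.001))
print(big, tiny)  # both ~0.001 (= lr)
```

A parameter that only occasionally receives a (possibly tiny) gradient still takes a meaningful step, which is why adaptive per-parameter methods are the usual recommendation for sparse gradients, while plain SGD's step shrinks in proportion to the gradient.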

Hyperparameter
advanced
Effect of momentum parameter in SGD

What is the effect of increasing the momentum parameter in SGD optimizer during training?

A. It helps accelerate training by smoothing updates and avoiding local minima.
B. It decreases the learning rate automatically over time.
C. It slows down training by reducing step size.
D. It causes the optimizer to ignore gradients and update randomly.
💡 Hint

Think about how momentum in physics helps keep an object moving smoothly.
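Momentum's accelerating effect can be sketched in pure Python on f(w) = w² (illustrative only; the learning rate and momentum values are arbitrary): with a small learning rate, the accumulated velocity carries the iterate toward the minimum faster than plain SGD does.

```python
def sgd(lr=0.01, w=1.0, steps=50, mu=0.0):
    """SGD on f(w) = w**2 (gradient 2*w), with optional momentum mu."""
    v = 0.0
    for _ in range(steps):
        v = mu * v - lr * 2 * w  # velocity accumulates past gradients
        w += v
    return w

plain    = sgd(mu=0.0)  # |w| ~ 0.36 after 50 small steps
momentum = sgd(mu=0.9)  # velocity builds up; noticeably closer to the minimum
print(abs(plain), abs(momentum))
```

The velocity term averages recent gradients, which both speeds progress along consistent directions and damps zig-zagging across noisy ones.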

🔧 Debug
expert
Identifying the cause of exploding gradients with Adam optimizer

Consider this training loop using Adam optimizer. The loss suddenly becomes NaN after several epochs. What is the most likely cause?

TensorFlow
import tensorflow as tf
import numpy as np

# Dummy regression data, as in the earlier snippet
x = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([[2.0], [4.0], [6.0], [8.0]])

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(1)
])
optimizer = tf.keras.optimizers.Adam(learning_rate=1.0)

for epoch in range(10):
    with tf.GradientTape() as tape:
        predictions = model(x)
        loss = tf.reduce_mean(tf.square(y - predictions))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    print(f"Epoch {epoch} Loss: {loss.numpy()}")
A. The learning rate is too high, causing unstable updates and exploding gradients.
B. The model architecture is incorrect and causes NaN values.
C. The loss function is incompatible with the Adam optimizer.
D. The input data x contains NaN values, causing the loss to become NaN.
💡 Hint

Check the learning rate value and its effect on training stability.
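The instability can be reproduced without TensorFlow: on a slightly steeper quadratic f(w) = 5w², a learning rate of 1.0 multiplies the error by -9 every step, while a small learning rate (or gradient clipping, e.g. the clipnorm argument on Keras optimizers) keeps updates stable. A minimal pure-Python sketch, with arbitrary toy values:

```python
def train(lr, w=1.0, steps=10, clip=None):
    """Gradient descent on f(w) = 5 * w**2, whose gradient is 10*w."""
    for _ in range(steps):
        g = 10 * w
        if clip is not None:                    # crude gradient clipping
            g = max(-clip, min(clip, g))
        w -= lr * g
    return w

print(abs(train(lr=1.0)))              # explodes: ~3.5e9 after 10 steps
print(abs(train(lr=0.01)))             # stable: decays toward 0
print(abs(train(lr=1.0, clip=0.05)))   # clipping caps the damage
```

In float32 training, this kind of runaway growth quickly overflows to inf and then NaN, which is exactly the symptom described in the problem.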