Hint: Use tf.optimizers.Adam with named learning_rate [OK]
Common Mistakes:
Using old tf.AdamOptimizer from TF1.x
Passing learning rate as positional argument
Using non-existent tf.optimizers.AdamOptimizer
3. What will the loss be after one training step with the RMSprop optimizer (learning rate 0.01) on a simple linear model y = w*x + b trained on the data x=[1,2], y=[2,4]? Assume the initial weight and bias are zero, the loss is mean squared error, and the loss is evaluated after the update.
medium
A. 0.5
B. 9.5
C. 1.0
D. 4.0
Solution
Step 1: Calculate initial prediction and loss
With zero initial weight and bias, every prediction is 0. The squared errors are (2-0)^2 = 4 and (4-0)^2 = 16, so the loss is mean([4,16]) = 10.
Step 2: Perform one RMSprop update step
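RMSprop divides each gradient by the root of a running average of its square. Assuming the Keras defaults (rho = 0.9, running average starting at zero), the first-step average is 0.1*g^2, so its root is ≈ 0.32*|g| and every update has magnitude ≈ 0.01/0.32 ≈ 0.032, whatever the size of the gradient. Gradients ≈ [-10,-6] for [w,b], so updates ≈ [+0.032,+0.032]. New preds ≈ [0.063,0.095], new loss ≈ 9.5.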
Final Answer:
9.5 -> Option B
Quick Check:
Loss after step ≈ 9.5 [OK]
Hint: RMSprop first step small due to scaling, loss ~9.5 [OK]
Common Mistakes:
Expecting a sharp drop in loss after a single step
Confusing the effect of the learning rate with the size of the gradient
Ignoring the impact of zero initial weights
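The arithmetic in the solution above can be checked with a few lines of plain Python. This is a minimal sketch, not TensorFlow's implementation: the values rho = 0.9 and epsilon = 1e-7 and the zero starting running average are assumptions based on common Keras defaults, and the helper names (mse, gradients, rmsprop_step) are made up for illustration.

```python
import math

# Data and learning rate taken from the question.
xs = [1.0, 2.0]
ys = [2.0, 4.0]
learning_rate = 0.01

# Assumed RMSprop settings (not stated in the question).
rho = 0.9
epsilon = 1e-7


def mse(w, b):
    """Mean squared error of the linear model w * x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradients(w, b):
    """Gradients of the MSE loss with respect to w and b."""
    n = len(xs)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return grad_w, grad_b


def rmsprop_step(param, grad, avg_sq):
    """One RMSprop update; returns the new parameter and running average."""
    avg_sq = rho * avg_sq + (1 - rho) * grad ** 2
    param = param - learning_rate * grad / (math.sqrt(avg_sq) + epsilon)
    return param, avg_sq


w, b = 0.0, 0.0
loss_before = mse(w, b)
grad_w, grad_b = gradients(w, b)

# The running averages start at zero for both parameters.
w, _ = rmsprop_step(w, grad_w, 0.0)
b, _ = rmsprop_step(b, grad_b, 0.0)
loss_after = mse(w, b)

print(f"loss before: {loss_before:.2f}")
print(f"gradients:   [{grad_w:.1f}, {grad_b:.1f}]")
print(f"w, b after:  [{w:.4f}, {b:.4f}]")
print(f"loss after:  {loss_after:.2f}")
```

Both parameters move by about 0.0316 even though their gradients differ, which is why the loss only falls from 10 to roughly 9.5.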
4. You wrote this code to use Adam optimizer but get an error:
A. Model.compile does not accept optimizer objects
B. Adam optimizer does not accept float arguments
C. Loss function 'mse' is invalid
D. Learning rate must be named as learning_rate=0.01
Solution
Step 1: Check Adam optimizer argument requirements
TF2.x Adam expects the keyword 'learning_rate=', not the older Keras-style 'lr='.
Step 2: Identify error cause in code
Using lr=0.01 is rejected as an unrecognized keyword argument in recent releases (some older TF 2.x releases only warned that it was deprecated). Correct: tf.optimizers.Adam(learning_rate=0.01).
Final Answer:
Learning rate must be named as learning_rate=0.01 -> Option D
Quick Check:
Named argument needed [OK]
Hint: Always name learning_rate in Adam optimizer [OK]
Common Mistakes:
Using the older 'lr=0.01' keyword
Assuming 'mse' is invalid loss
Thinking optimizer object can't be passed
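A minimal sketch of the corrected pattern is shown below. It is not the code from the question: the one-layer model is hypothetical and only there so that compile has something to work on. It uses tf.keras.optimizers.Adam, which TF 2.x also exposes under the tf.optimizers namespace used in this quiz.

```python
import tensorflow as tf

# Hypothetical one-layer model, used only to have something to compile.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(1),
])

# Pass the learning rate with the TF 2.x keyword `learning_rate`.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

# compile() accepts an optimizer object, and 'mse' is a valid loss name.
model.compile(optimizer=optimizer, loss='mse')

# Read the configured learning rate back from the optimizer.
print(float(optimizer.learning_rate.numpy()))
```

The same call also shows why options A and C are wrong: the optimizer object and the 'mse' string are both accepted by compile.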
5. You want to train a model on noisy data that changes over time. Which optimizer is best suited to adapt learning rates per parameter and handle this noise effectively?
hard
A. Adam
B. Gradient Descent with fixed learning rate
C. RMSprop
D. SGD without momentum
Solution
Step 1: Understand optimizer strengths for noisy data
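Adam adapts the learning rate of each parameter and combines the ideas of momentum (a running average of gradients) and RMSprop (a running average of squared gradients), so it handles noisy gradients well.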
Step 2: Compare with other optimizers
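SGD without momentum and gradient descent with a fixed learning rate apply one global step size to every parameter, so they struggle with noise. RMSprop adapts per-parameter rates, but Adam also adds momentum, which smooths the updates for better stability.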
Final Answer:
Adam -> Option A
Quick Check:
Best for noisy data = Adam [OK]
Hint: Adam adapts learning rates and handles noise best [OK]
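To make "momentum plus RMSprop" concrete, here is a minimal sketch of the textbook Adam update in plain Python. It is not TensorFlow's implementation; the function name adam_step, the default hyperparameters and the toy noisy gradients are assumptions chosen for illustration.

```python
import math
import random


def adam_step(param, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-7):
    """One textbook Adam update for a single parameter at step t (t >= 1)."""
    m = beta1 * m + (1 - beta1) * grad        # momentum: average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # RMSprop idea: average of squares
    m_hat = m / (1 - beta1 ** t)              # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# The first step has size ~lr whatever the scale of the gradient,
# which is the per-parameter adaptation the question refers to.
big, _, _ = adam_step(1.0, 100.0, 0.0, 0.0, t=1)
small, _, _ = adam_step(1.0, 0.01, 0.0, 0.0, t=1)
print(1.0 - big, 1.0 - small)

# Toy run: two parameters whose noisy gradients differ in scale by 10,000x.
random.seed(0)
params = [1.0, 1.0]
scales = [100.0, 0.01]
ms = [0.0, 0.0]
vs = [0.0, 0.0]
for t in range(1, 201):
    for i in range(2):
        noisy_grad = scales[i] * (params[i] + random.gauss(0.0, 1.0))
        params[i], ms[i], vs[i] = adam_step(params[i], noisy_grad, ms[i], vs[i], t)
print(params)
```

Dividing by the root of the squared-gradient average removes the gradient scale, while the gradient average damps step-to-step noise; plain SGD has neither mechanism.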