Which of the following pseudocode snippets correctly updates a self-improving agent's policy based on a learning rate and observed gradient?

A. policy = policy - learning_rate / gradient
B. policy = policy + learning_rate * gradient
C. policy = learning_rate + policy * gradient
D. policy = gradient - learning_rate * policy
Step-by-Step Solution:
  1. Step 1: Understand policy update

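    The policy parameters are updated by taking a step in the direction of the gradient of the objective being maximized (for example, expected reward), with the step size scaled by the learning rate. This is gradient ascent.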
  2. Step 2: Analyze options

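    policy = policy + learning_rate * gradient correctly adds the product of the learning rate and the gradient to the current policy. Option A divides by the gradient instead of multiplying by it, and options C and D swap the roles of the three terms, so the result is no longer a small step away from the current policy.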
  3. Final Answer:

    policy = policy + learning_rate * gradient -> Option B
  4. Quick Check:

    The update rule matches standard gradient ascent.
Quick Trick: The policy update adds learning_rate times gradient to the current policy.
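
As a minimal sketch of the update in option B, the loop below applies it repeatedly to a toy objective. The objective J(policy) = -(policy - 3)^2, the `gradient_of_objective` helper, the starting value, and the learning rate of 0.1 are all assumptions chosen for illustration; a real agent would estimate the gradient from observed rewards.

```python
def gradient_of_objective(policy: float) -> float:
    """Gradient of the hypothetical toy objective J(policy) = -(policy - 3)**2."""
    return -2.0 * (policy - 3.0)


def gradient_ascent(policy: float, learning_rate: float, steps: int) -> float:
    """Repeatedly apply option B: policy = policy + learning_rate * gradient."""
    for _ in range(steps):
        gradient = gradient_of_objective(policy)
        policy = policy + learning_rate * gradient
    return policy


final_policy = gradient_ascent(policy=0.0, learning_rate=0.1, steps=100)
print(final_policy)  # approaches 3.0, the maximizer of the toy objective
```

For this toy objective, each step shrinks the distance to the maximizer by a factor of 1 - 2 * learning_rate (0.8 here), which is why a small positive learning rate converges.
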
Common Mistakes:
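  • Subtracting instead of adding the gradient (subtraction is gradient descent, which minimizes a loss rather than maximizing an objective)
  • Dividing the learning rate by the gradient instead of multiplying
  • Multiplying the policy by the gradient or the learning rate instead of adding a step to it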
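
To see how these mistakes play out, the sketch below runs all four update rules from the question for a few steps on a toy objective J(policy) = -(policy - 3)^2 and reports how far each ends up from the maximizer at 3. The objective, the starting policy of 5.0, the learning rate, and the step count are illustrative assumptions, not part of the quiz.

```python
TARGET = 3.0  # maximizer of the hypothetical objective J(p) = -(p - TARGET)**2
START = 5.0   # assumed starting policy, at distance 2.0 from TARGET


def gradient(policy: float) -> float:
    """Gradient of the toy objective with respect to the policy."""
    return -2.0 * (policy - TARGET)


# The four candidate update rules: (policy, learning_rate, gradient) -> new policy.
RULES = {
    "A": lambda p, lr, g: p - lr / g,
    "B": lambda p, lr, g: p + lr * g,
    "C": lambda p, lr, g: lr + p * g,
    "D": lambda p, lr, g: g - lr * p,
}


def distance_after(rule, policy=START, learning_rate=0.1, steps=5):
    """Apply one rule for a few steps and return the distance to TARGET."""
    for _ in range(steps):
        policy = rule(policy, learning_rate, gradient(policy))
    return abs(policy - TARGET)


distances = {name: distance_after(rule) for name, rule in RULES.items()}
for name, dist in distances.items():
    print(name, dist)
```

Under these assumptions, only rule B ends closer to the maximizer than it started; the other three drift away or diverge.
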