Think about how an agent first learns about its surroundings.
The 'Observe' step is when the agent gathers information from its environment using sensors or inputs. This data is essential before the agent can think or act.
environment_data = {'temperature': 30, 'light': 'bright'}
# Agent thinks based on temperature
if environment_data['temperature'] > 25:
    action = 'turn_on_cooler'
else:
    action = 'do_nothing'
print(action)

Check the temperature value and the condition in the if statement.
The temperature is 30, which is greater than 25, so the agent decides to 'turn_on_cooler'.
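The full observe-think-act cycle can be sketched as three small functions. This is a minimal illustration, not a real agent: the observe() function and its hard-coded sensor values are hypothetical stand-ins for actual sensor reads.

```python
def observe():
    # Hypothetical sensor reading; a real agent would query hardware or an API.
    return {'temperature': 30, 'light': 'bright'}

def think(data):
    # Decide on an action based on the observed temperature.
    return 'turn_on_cooler' if data['temperature'] > 25 else 'do_nothing'

def act(action):
    # Carry out the chosen action (here, just report it).
    print(action)

# One pass through the observe-think-act loop:
act(think(observe()))  # temperature is 30 > 25, so prints 'turn_on_cooler'
```

In a running agent these three steps would repeat in a loop, with each observation reflecting the effects of the previous action.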
Consider models that learn to make decisions based on rewards.
Reinforcement learning policy networks are designed to select actions in environments to maximize rewards, making them ideal for the 'Act' step.
Think about the parameter controlling randomness in action choice.
The exploration rate (epsilon) controls how often the agent tries new actions versus exploiting known ones, directly governing this balance.
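The exploration-exploitation trade-off is commonly implemented with an epsilon-greedy rule. A minimal sketch, assuming the agent keeps a dictionary of estimated action values (the action names here are illustrative):

```python
import random

def choose_action(q_values, epsilon=0.1):
    """Epsilon-greedy selection: with probability epsilon, explore a
    random action; otherwise exploit the action with the highest value."""
    if random.random() < epsilon:
        return random.choice(list(q_values))   # explore
    return max(q_values, key=q_values.get)     # exploit

q = {'turn_on_cooler': 1.2, 'do_nothing': 0.4}
print(choose_action(q, epsilon=0.0))  # epsilon=0 always exploits: 'turn_on_cooler'
```

A higher epsilon makes the agent explore more; annealing epsilon toward zero over time shifts it from exploration to exploitation as its value estimates improve.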
environment_data = {'temperature': 30}
# Agent thinks and decides action
if environment_data['temperature'] < 25:
    action = 'turn_on_cooler'
else:
    action = 'do_nothing'
print(action)

Check the logic condition comparing temperature to 25.
The condition uses '< 25', so the cooler turns on only when the temperature is below 25. Since the temperature is 30, the agent chooses 'do_nothing'. Reversing the comparison to '> 25' fixes this.
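With the comparison reversed, the snippet behaves as intended:

```python
environment_data = {'temperature': 30}
# Corrected: the cooler turns on when temperature EXCEEDS the threshold
if environment_data['temperature'] > 25:
    action = 'turn_on_cooler'
else:
    action = 'do_nothing'
print(action)  # temperature is 30 > 25, so prints 'turn_on_cooler'
```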