What is Reinforcement Learning: Simple Explanation and Example
Reinforcement learning is a type of machine learning where an agent learns to make decisions by trying actions and receiving rewards or penalties. It learns the best actions to take in a situation to maximize its total reward over time.

How It Works
Imagine teaching a dog new tricks. You give it a treat when it does something right and ignore or gently correct it when it does something wrong. Over time, the dog learns which actions get rewards and tries to do those more often. Reinforcement learning works the same way but with computer programs called agents.
The agent interacts with an environment by taking actions. After each action, it gets feedback in the form of a reward (good) or penalty (bad). The agent's goal is to learn a strategy, called a policy, that tells it the best action to take in each situation to get the most reward in the long run.
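The interaction loop described above can be sketched in a few lines of Python. The action names, reward values, and the random "policy" here are all made up purely to illustrate the action–feedback cycle, not taken from any particular RL library:

```python
import random

def environment_step(action):
    # A toy environment: fixed, hypothetical rewards for two actions
    rewards = {"left": 1, "right": 3}  # "right" happens to be better
    return rewards[action]

def policy(state):
    # A trivial starting policy: act at random
    return random.choice(["left", "right"])

total_reward = 0
for step in range(5):
    action = policy(state=None)         # agent picks an action
    reward = environment_step(action)   # environment sends back feedback
    total_reward += reward              # the agent's goal: maximize this
    print(f"Step {step+1}: action={action}, reward={reward}")

print("Total reward:", total_reward)
```

A real agent would use the rewards to improve its policy over time; this sketch only shows the loop itself.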
This trial-and-error process lets the agent improve by exploring different actions and remembering which ones produced the best results.
Example
This example shows a simple agent learning to choose between two actions to get the highest reward.
```python
import random

class SimpleAgent:
    def __init__(self, epsilon=0.1):
        self.values = {"A": 0.0, "B": 0.0}  # Estimated value of each action
        self.counts = {"A": 0, "B": 0}      # How many times each action was taken
        self.epsilon = epsilon              # Chance of exploring a random action

    def choose_action(self):
        # Explore: occasionally try a random action so both get sampled
        # (without this, a purely greedy agent can lock onto whichever
        # action happened to be rewarded first)
        if random.random() < self.epsilon:
            return random.choice(["A", "B"])
        # Exploit: choose the action with the highest estimated value
        if self.values["A"] > self.values["B"]:
            return "A"
        elif self.values["B"] > self.values["A"]:
            return "B"
        else:
            return random.choice(["A", "B"])

    def update(self, action, reward):
        self.counts[action] += 1
        n = self.counts[action]
        value = self.values[action]
        # Update the estimated value using an incremental average
        self.values[action] = value + (reward - value) / n

# Fixed rewards for each action; B is the better choice
rewards = {"A": 1, "B": 2}

agent = SimpleAgent()
for episode in range(10):
    action = agent.choose_action()
    reward = rewards[action]
    agent.update(action, reward)
    print(f"Episode {episode+1}: Action={action}, Reward={reward}, "
          f"Estimated Values={agent.values}")
```
When to Use
Use reinforcement learning when you want a system to learn how to make a sequence of decisions by itself, especially when the best choice depends on past actions and future rewards. It is great for problems where you can't easily tell the system the right answer but can give feedback on how good its actions are.
Real-world examples include teaching robots to walk, training game-playing AI like chess or Go, optimizing recommendations, and controlling self-driving cars.
Key Points
- Agent learns by trial and error using rewards as feedback.
- Goal is to maximize total reward over time.
- Useful for decision-making problems with delayed rewards.
- Requires exploration of actions and learning from results.
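The "total reward over time" in the points above is usually formalized as the return: the sum of all rewards, often discounted so that sooner rewards count more than later ones. A minimal sketch, with a made-up reward sequence and discount factor chosen only for illustration:

```python
def discounted_return(rewards, gamma=0.9):
    """Sum the rewards, weighting each later reward by an extra factor of gamma."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total

# A delayed reward: nothing until the final step
rewards = [0, 0, 0, 10]
print(discounted_return(rewards))  # 0.9**3 * 10, approximately 7.29
```

This is why delayed rewards matter: the reward of 10 arrives three steps late, so it is worth only about 7.29 to the agent now, and actions that pay off sooner are preferred when totals are otherwise equal.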