0
0
Ai-awarenessConceptBeginner · 4 min read

What is Reinforcement Learning: Simple Explanation and Example

Reinforcement learning is a type of machine learning where an agent learns to make decisions by trying actions and receiving rewards or penalties. It learns the best actions to take in a situation to maximize its total reward over time.
⚙️

How It Works

Imagine teaching a dog new tricks. You give it a treat when it does something right and ignore or gently correct it when it does something wrong. Over time, the dog learns which actions get rewards and tries to do those more often. Reinforcement learning works the same way but with computer programs called agents.

The agent interacts with an environment by taking actions. After each action, it gets feedback in the form of a reward (good) or penalty (bad). The agent's goal is to learn a strategy, called a policy, that tells it the best action to take in each situation to get the most reward in the long run.

This trial-and-error learning helps the agent improve by exploring different actions and remembering which ones worked best before.

💻

Example

This example shows a simple agent learning to choose between two actions to get the highest reward.

python
import random

class SimpleAgent:
    def __init__(self):
        self.values = {"A": 0, "B": 0}  # Estimated value of actions
        self.counts = {"A": 0, "B": 0}  # How many times each action was taken

    def choose_action(self):
        # Choose action with highest estimated value (greedy)
        if self.values["A"] > self.values["B"]:
            return "A"
        elif self.values["B"] > self.values["A"]:
            return "B"
        else:
            return random.choice(["A", "B"])

    def update(self, action, reward):
        self.counts[action] += 1
        n = self.counts[action]
        value = self.values[action]
        # Update estimated value using incremental average
        self.values[action] = value + (reward - value) / n

# Rewards for actions
rewards = {"A": 1, "B": 2}
agent = SimpleAgent()

for episode in range(10):
    action = agent.choose_action()
    reward = rewards[action]
    agent.update(action, reward)
    print(f"Episode {episode+1}: Action={action}, Reward={reward}, Estimated Values={agent.values}")
Output
Episode 1: Action=A, Reward=1, Estimated Values={'A': 1.0, 'B': 0} Episode 2: Action=B, Reward=2, Estimated Values={'A': 1.0, 'B': 2.0} Episode 3: Action=B, Reward=2, Estimated Values={'A': 1.0, 'B': 2.0} Episode 4: Action=B, Reward=2, Estimated Values={'A': 1.0, 'B': 2.0} Episode 5: Action=B, Reward=2, Estimated Values={'A': 1.0, 'B': 2.0} Episode 6: Action=B, Reward=2, Estimated Values={'A': 1.0, 'B': 2.0} Episode 7: Action=B, Reward=2, Estimated Values={'A': 1.0, 'B': 2.0} Episode 8: Action=B, Reward=2, Estimated Values={'A': 1.0, 'B': 2.0} Episode 9: Action=B, Reward=2, Estimated Values={'A': 1.0, 'B': 2.0} Episode 10: Action=B, Reward=2, Estimated Values={'A': 1.0, 'B': 2.0}
🎯

When to Use

Use reinforcement learning when you want a system to learn how to make a sequence of decisions by itself, especially when the best choice depends on past actions and future rewards. It is great for problems where you can't easily tell the system the right answer but can give feedback on how good its actions are.

Real-world examples include teaching robots to walk, training game-playing AI like chess or Go, optimizing recommendations, and controlling self-driving cars.

Key Points

  • Agent learns by trial and error using rewards as feedback.
  • Goal is to maximize total reward over time.
  • Useful for decision-making problems with delayed rewards.
  • Requires exploration of actions and learning from results.

Key Takeaways

Reinforcement learning teaches an agent to make decisions by rewarding good actions and penalizing bad ones.
The agent learns a policy to choose actions that maximize long-term rewards.
It is ideal for problems where feedback is delayed or the best action depends on future outcomes.
Exploration and learning from experience are essential parts of reinforcement learning.