
What is Reinforcement Learning in Python: Simple Explanation and Example

Reinforcement learning is a way to teach computers to make decisions by rewarding good actions and penalizing bad ones. Unlike supervised learning, which learns from labeled examples, it learns by trial and error, improving over time based on feedback. In Python, it can be implemented with plain code or with existing libraries.

⚙️

How It Works

Imagine teaching a dog new tricks by giving it treats when it does something right and ignoring or gently correcting it when it does something wrong. Reinforcement learning works similarly for computers. The computer tries different actions in an environment and learns which actions lead to the best results by receiving rewards or penalties.

In Python, this process involves an agent (the learner) interacting with an environment. The agent takes an action, the environment responds with a new state and a reward, and the agent uses this feedback to improve its future actions. Over time, the agent learns the best strategy to maximize its total reward.
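The loop described above can be sketched in a few lines. This is only an illustration, not a standard API: the `CorridorEnv` class, its `reset` and `step` methods, and the reward of 1 at the last cell are all made up for this example, and the agent here acts randomly instead of learning.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk right along a corridor to reach the goal."""

    def __init__(self, length=4):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 0 = stay, action 1 = move one cell to the right
        self.state = min(self.state + action, self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def run_episode(env, max_steps=20, seed=0):
    """Run one episode with a random agent and return the total reward collected."""
    rng = random.Random(seed)
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = rng.choice([0, 1])             # the agent picks an action
        state, reward, done = env.step(action)  # the environment responds
        total_reward += reward                  # feedback an agent could learn from
        if done:
            break
    return total_reward


print(run_episode(CorridorEnv()))
```

A learning agent would replace the random choice with a rule that uses the rewards seen so far, which is what the Q-learning example in the next section does.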

💻

Example

This example sets up a tiny environment with three states and two actions (stay or move). The agent learns which action to take in each state using basic Q-learning, a method that keeps an estimated value for every state-action pair in a table.

python

import numpy as np

# Define the environment
states = [0, 1, 2]
actions = [0, 1]  # 0: stay, 1: move

# Initialize Q-table with zeros
Q = np.zeros((len(states), len(actions)))

# Define rewards: one row per state, one column per action (rewards[state, action])
rewards = np.array([[0, 1], [0, 0], [1, 0]])

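# Learning parameters
alpha = 0.5    # learning rate
gamma = 0.9    # discount factor
epsilon = 0.2  # exploration rate: chance of picking a random action

episodes = 500
rng = np.random.default_rng(0)  # seeded so repeated runs behave the same

for episode in range(episodes):
    state = 0  # start state
    for _ in range(5):  # limit steps per episode
        # Epsilon-greedy: explore occasionally, otherwise take the best-known action.
        # A purely greedy agent would choose "stay" forever here, because np.argmax
        # returns index 0 when all Q-values are tied at zero.
        if rng.random() < epsilon:
            action = int(rng.integers(len(actions)))
        else:
            action = int(np.argmax(Q[state]))
        # Get reward
        reward = rewards[state, action]
        # Next state: "stay" keeps the state, "move" advances to the next one (wrapping around)
        next_state = (state + action) % len(states)
        # Q-learning update: nudge Q toward reward + discounted best future value
        td_target = reward + gamma * np.max(Q[next_state])
        Q[state, action] += alpha * (td_target - Q[state, action])
        state = next_state

print("Learned Q-table:")
print(np.round(Q, 2))

Output

Exploration is random, so the exact numbers depend on the seed and the number of episodes. With enough training the table approaches the optimal values for this environment, approximately:

Learned Q-table:
[[ 8.19  9.1 ]
 [ 8.1   9.  ]
 [ 8.19 10.  ]]

Reading off the largest value in each row, the agent learns to move from states 0 and 1 and to stay in state 2, where it collects a reward of 1 on every step.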
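To see what a single update does, the sketch below applies the same update rule twice by hand. The helper name `q_update` is introduced here for illustration only, and plain Python lists stand in for the NumPy array; the defaults match the example's `alpha = 0.5` and `gamma = 0.9`.

```python
def q_update(Q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update in place and return the new value."""
    td_target = reward + gamma * max(Q[next_state])
    Q[state][action] += alpha * (td_target - Q[state][action])
    return Q[state][action]


# Three states, two actions, all estimates start at zero.
Q = [[0.0, 0.0] for _ in range(3)]

# From state 0, take action 1 ("move"), receive reward 1, and land in state 1.
first = q_update(Q, state=0, action=1, reward=1, next_state=1)
# 0 + 0.5 * (1 + 0.9 * 0 - 0) = 0.5

second = q_update(Q, state=0, action=1, reward=1, next_state=1)
# 0.5 + 0.5 * (1 + 0.9 * 0 - 0.5) = 0.75

print(first, second)
```

Each update moves the estimate part of the way toward the target, so repeated visits are needed before the value settles.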

🎯

When to Use

Use reinforcement learning when you want a system to learn how to make decisions by itself through trial and error, especially when the best action depends on a sequence of steps. It is useful in games, robotics, recommendation systems, and any task where feedback is delayed or uncertain.

For example, tasks such as teaching a robot to walk, training an AI to play chess, or optimizing delivery routes can all benefit from reinforcement learning.
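Delayed feedback is usually handled by discounting: a reward received later counts for less than the same reward received now. A small sketch of this idea follows; the function name `discounted_return` and the reward sequences are chosen purely for illustration.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum rewards, weighting the reward at step t by gamma ** t."""
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (gamma ** t) * reward
    return total


immediate = discounted_return([1, 0, 0])  # reward arrives at step 0: worth 1.0
delayed = discounted_return([0, 0, 1])    # same reward two steps later: worth 0.9 ** 2
print(immediate, delayed)
```

A discount factor closer to 1 makes the agent care more about long-term outcomes; a smaller one makes it favor immediate rewards.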

✅

Key Points

Reinforcement learning learns from rewards and penalties rather than from labeled examples.
It involves an agent interacting with an environment over time.
It can be implemented in Python with libraries or with custom code.
It is best for problems where decisions affect future outcomes.

✅

Key Takeaways

Reinforcement learning teaches agents to make decisions by maximizing rewards through trial and error.

It is ideal for tasks where actions have long-term effects and feedback is delayed.

Reinforcement learning can be implemented in Python with a few lines of custom code or with existing libraries.

The agent learns a policy to choose the best action in each state to maximize total reward.