What is Reinforcement Learning in Python: Simple Explanation and Example
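Reinforcement learning is a type of machine learning in which a program learns to make decisions through rewards and actions. It differs from supervised learning, which learns from labeled examples, because it learns by trial and error and improves over time based on feedback.
How It Works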
Imagine teaching a dog new tricks by giving it treats when it does something right and ignoring or gently correcting it when it does something wrong. Reinforcement learning works similarly for computers. The computer tries different actions in an environment and learns which actions lead to the best results by receiving rewards or penalties.
In Python, this process involves an agent (the learner) interacting with an environment. The agent takes an action, the environment responds with a new state and a reward, and the agent uses this feedback to improve its future actions. Over time, the agent learns the best strategy to maximize its total reward.
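The interaction loop described above can be sketched in a few lines. This is a minimal illustration rather than a real library API: the `ToyEnv` class, its `reset` and `step` methods, and the reward rule are hypothetical names invented for this sketch, and the agent here only picks random actions instead of learning.

```python
import random


class ToyEnv:
    """Hypothetical 3-state environment, invented for illustration only."""

    def __init__(self):
        self.n_states = 3
        self.state = 0

    def reset(self):
        # Start every episode in state 0
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves to the next state (wrapping around); action 0 stays put
        next_state = (self.state + action) % self.n_states
        # Assumed reward rule: +1 for moving into the last state, otherwise 0
        reward = 1 if action == 1 and next_state == self.n_states - 1 else 0
        self.state = next_state
        return next_state, reward


def run_episode(env, n_steps=5, seed=0):
    """Run one episode with a random agent and return the total reward."""
    rng = random.Random(seed)
    state = env.reset()
    total_reward = 0
    for _ in range(n_steps):
        action = rng.choice([0, 1])       # agent takes an action
        state, reward = env.step(action)  # environment returns new state and reward
        total_reward += reward            # feedback a learning agent would use
    return total_reward


total = run_episode(ToyEnv())
print("Total reward:", total)
```

A learning agent would replace the random choice with a rule that uses past rewards, which is what the Q-learning example in the next section does.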
Example
This example shows a simple reinforcement learning setup where an agent learns to choose actions to maximize rewards using a basic Q-learning approach.
```python
import numpy as np

# Define the environment
states = [0, 1, 2]
actions = [0, 1]  # 0: stay, 1: move

# Initialize Q-table with zeros
Q = np.zeros((len(states), len(actions)))

# Define rewards for state-action pairs (rows: states, columns: actions)
rewards = np.array([[0, 1], [0, 0], [1, 0]])

# Learning parameters
alpha = 0.5    # learning rate
gamma = 0.9    # discount factor
epsilon = 0.2  # exploration rate; a purely greedy agent never leaves the all-zero Q-table
episodes = 10

rng = np.random.default_rng(0)  # seeded so runs are reproducible

for episode in range(episodes):
    state = 0  # start state
    for _ in range(5):  # limit steps per episode
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit
        if rng.random() < epsilon:
            action = int(rng.integers(len(actions)))
        else:
            action = int(np.argmax(Q[state]))
        # Get reward
        reward = rewards[state, action]
        # Next state (simple transition): "move" advances one state, wrapping around
        next_state = (state + action) % len(states)
        # Q-learning update
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state

print("Learned Q-table:")
print(Q)
```
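To make the update step in the example above concrete, the sketch below applies one Q-learning update by hand. The helper name `q_update` is introduced here for illustration only; the values of `alpha` and `gamma` and the reward of 1 for moving out of state 0 are taken from the example.

```python
import numpy as np


def q_update(Q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update in place and return the new value."""
    # Target: immediate reward plus the discounted best value of the next state
    target = reward + gamma * np.max(Q[next_state])
    # Move the old estimate a fraction alpha of the way toward the target
    Q[state, action] += alpha * (target - Q[state, action])
    return float(Q[state, action])


Q = np.zeros((3, 2))
# From state 0, action 1 (move) earns reward 1 and leads to state 1
first = q_update(Q, state=0, action=1, reward=1, next_state=1)
# Repeating the same transition moves the estimate closer to the target of 1
second = q_update(Q, state=0, action=1, reward=1, next_state=1)
print(first, second)  # 0.5 0.75
```

Each update closes half of the remaining gap to the target because `alpha` is 0.5; a smaller learning rate would approach it more slowly.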
When to Use
Use reinforcement learning when you want a system to learn how to make decisions by itself through trial and error, especially when the best action depends on a sequence of steps. It is useful in games, robotics, recommendation systems, and any task where feedback is delayed or uncertain.
For example, teaching a robot to walk, training an AI to play chess, or optimizing delivery routes can benefit from reinforcement learning.
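One way to see why delayed feedback matters is the discounted return, the quantity an agent typically tries to maximize. The sketch below is illustrative only: the function name `discounted_return` and the two reward sequences are made up for this example, and `gamma = 0.9` is an assumed discount factor.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum rewards, weighting a reward received t steps later by gamma**t."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future
    for r in reversed(rewards):
        total = r + gamma * total
    return total


immediate = discounted_return([1, 0, 0])  # reward arrives right away
delayed = discounted_return([0, 0, 1])    # same reward, two steps later
print(immediate, delayed)
```

The delayed reward counts for less than the immediate one, so the agent has to learn which early actions lead to rewards that only show up several steps later.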
Key Points
- Reinforcement learning learns from rewards and penalties, not from labeled examples.
- It involves an agent interacting with an environment over time.
- In Python, reinforcement learning can be implemented with dedicated libraries or with custom code, such as the NumPy example above.
- It is best for problems where decisions affect future outcomes.