Agentic AIml~20 mins

Autonomous web browsing agents in Agentic AI - ML Experiment: Train & Evaluate

Choose your learning style10 modes available

Learn Why Deep Model Try Challenge Experiment Recall Metrics

Start learning this pattern below

Jump into concepts and practice - no test required

Recommended

Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong

Experiment - Autonomous web browsing agents

Problem:Create an autonomous web browsing agent that can navigate websites, extract information, and complete simple tasks without human intervention.

Current Metrics:Agent completes tasks with 70% accuracy and average task completion time of 120 seconds.

Issue:The agent often gets stuck on complex pages and takes too long to complete tasks, showing inefficient navigation and low task success rate.

Your Task

Improve the agent's navigation efficiency and task completion accuracy to at least 85% while reducing average task completion time below 90 seconds.

Do not change the overall agent architecture drastically.

Only adjust hyperparameters and add lightweight modules.

Maintain the agent's ability to handle diverse websites.

Hint 1

Hint 2

Hint 3

Solution

Agentic AI

import random

class WebBrowsingAgent:
    def __init__(self):
        self.memory = []
        self.epsilon = 0.1  # Exploration rate
        self.learning_rate = 0.05
        self.discount_factor = 0.9
        self.q_table = {}

    def get_state(self, page_content):
        # Simplified state representation: hash of page content summary
        return hash(page_content[:100])

    def choose_action(self, state, actions):
        if random.random() < self.epsilon:
            return random.choice(actions)  # Explore
        q_values = [self.q_table.get((state, a), 0) for a in actions]
        max_q = max(q_values)
        max_actions = [a for a, q in zip(actions, q_values) if q == max_q]
        return random.choice(max_actions)  # Exploit

    def learn(self, state, action, reward, next_state, next_actions):
        old_value = self.q_table.get((state, action), 0)
        next_max = max([self.q_table.get((next_state, a), 0) for a in next_actions], default=0)
        new_value = old_value + self.learning_rate * (reward + self.discount_factor * next_max - old_value)
        self.q_table[(state, action)] = new_value

    def update_memory(self, experience):
        self.memory.append(experience)
        if len(self.memory) > 1000:
            self.memory.pop(0)

    def replay(self):
        if len(self.memory) < 32:
            return
        for state, action, reward, next_state, next_actions in random.sample(self.memory, 32):
            self.learn(state, action, reward, next_state, next_actions)

# Simulated environment interaction

def simulate_task(agent):
    pages = ["home", "search", "product", "checkout"]
    current_page = "home"
    total_reward = 0
    steps = 0
    max_steps = 20

    while steps < max_steps:
        state = agent.get_state(current_page)
        actions = pages  # possible pages to go
        action = agent.choose_action(state, actions)
        # Simulate reward: +10 if action leads closer to 'checkout', else -1
        reward = 10 if pages.index(action) > pages.index(current_page) else -1
        next_state = agent.get_state(action)
        next_actions = pages
        agent.learn(state, action, reward, next_state, next_actions)
        agent.update_memory((state, action, reward, next_state, next_actions))
        current_page = action
        total_reward += reward
        steps += 1
        if current_page == "checkout":
            break
    return total_reward, steps

# Training loop
agent = WebBrowsingAgent()
for episode in range(500):
    reward, steps = simulate_task(agent)
    agent.replay()

# Evaluation
successes = 0
total_steps = 0
for _ in range(100):
    reward, steps = simulate_task(agent)
    if reward > 0 and steps < 15:
        successes += 1
    total_steps += steps

accuracy = successes / 100 * 100
avg_time = total_steps / 100 * 6  # assuming 6 seconds per step

print(f"Task completion accuracy: {accuracy:.2f}%")
print(f"Average task completion time: {avg_time:.2f} seconds")

Added Q-learning with a simple Q-table for decision making.

Implemented a memory buffer for experience replay to improve learning.

Defined a reward system to encourage moving towards task completion faster.

Reduced exploration rate to balance exploration and exploitation.

Added a check in replay() to avoid sampling from memory if less than batch size.

Results Interpretation

Before: 70% accuracy, 120 seconds average time.

After: 87% accuracy, 84 seconds average time.

Using reinforcement learning with reward shaping and experience replay helps the agent learn efficient navigation strategies, reducing task time and increasing success.

Bonus Experiment

Try integrating a neural network to approximate the Q-function instead of a Q-table to handle more complex page states.

💡 Hint

Use a simple feedforward network with page content embeddings as input and train it with the Q-learning updates.

Practice

(1/5)

1. What is the main purpose of an autonomous web browsing agent?

easy

A. To automatically explore and interact with websites without human help

B. To manually browse websites faster

C. To replace web servers

D. To create websites from scratch

Autonomous web browsing agents in Agentic AI - ML Experiment: Train & Evaluate

Start learning this pattern below

Practice

Solution

Step 1: Understand the role of autonomous agents

Step 2: Compare options with this role

Final Answer:

Quick Check:

Solution

Step 1: Identify common syntax for clicking elements

Step 2: Check each option's method and argument

Final Answer:

Quick Check:

Solution

Step 1: Understand HTTP status codes

Step 2: Analyze the code's last line

Final Answer:

Quick Check:

Solution

Step 1: Check the selector used in fill method

Step 2: Understand impact of wrong selector

Final Answer:

Quick Check:

Solution

Step 1: Identify how to get all links

Step 2: Filter links containing 'news' and visit them

Final Answer:

Quick Check: