Agent architecture (observe, think, act) in Prompt Engineering / GenAI - Model Metrics & Evaluation

Metrics & Evaluation - Agent architecture (observe, think, act)
Which metric matters for Agent Architecture and WHY

For agent architectures that observe, think, and act, the key metrics depend on the task the agent performs. Common metrics include accuracy for classification tasks, reward or return in reinforcement learning, and response time for real-time actions. These metrics show how well the agent understands its environment (observe), makes decisions (think), and executes actions (act).

For example, in a navigation agent, success rate (reaching the goal) and steps taken matter. In a chatbot agent, response relevance and user satisfaction are important. Choosing the right metric helps us know if the agent is learning and acting effectively.
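As a rough illustration of the navigation example, success rate and steps taken could be computed from an episode log along these lines. The `Episode` record, the `summarize` helper, and the log values are all hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    reached_goal: bool  # did the agent reach the goal in this episode?
    steps: int          # number of actions the agent took


def summarize(episodes):
    """Return (success_rate, mean_steps_on_success) for a list of episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    successes = [e for e in episodes if e.reached_goal]
    success_rate = len(successes) / len(episodes)
    # Mean steps is only meaningful over episodes that actually succeeded.
    mean_steps = (
        sum(e.steps for e in successes) / len(successes) if successes else None
    )
    return success_rate, mean_steps


# Made-up episode log, for illustration only.
log = [Episode(True, 12), Episode(True, 18), Episode(False, 50), Episode(True, 15)]
success_rate, mean_steps = summarize(log)
print(f"success rate: {success_rate:.2f}, mean steps on success: {mean_steps:.1f}")
```

Reporting steps only over successful episodes is one possible design choice; averaging over all episodes would let failed runs that hit a step limit dominate the number.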

Confusion Matrix or Equivalent Visualization

When the agent's task is classification, a confusion matrix helps us see how well it predicts classes:

      |                 | Predicted Positive  | Predicted Negative  |
      |-----------------|---------------------|---------------------|
      | Actual Positive | True Positive (TP)  | False Negative (FN) |
      | Actual Negative | False Positive (FP) | True Negative (TN)  |
    

For example, if an agent detects obstacles, TP means correctly spotting obstacles, FP means false alarms, FN means missed obstacles, and TN means correctly ignoring safe areas.
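A minimal sketch of how the four counts could be tallied for the obstacle example, assuming each observation has a ground-truth label and a prediction (`True` meaning "obstacle"). The `confusion_counts` helper and the sample data are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, FN, TN for paired boolean labels and predictions."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1      # obstacle present and detected
        elif not a and p:
            fp += 1      # false alarm
        elif a and not p:
            fn += 1      # missed obstacle
        else:
            tn += 1      # safe area correctly ignored
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Made-up data: True means "obstacle".
actual = [True, True, False, False, True, False, False, False]
predicted = [True, False, False, True, True, False, False, False]
counts = confusion_counts(actual, predicted)
print(counts)
```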

For other tasks like reinforcement learning, we visualize reward over time or policy improvement graphs instead.
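Per-episode reward is usually noisy, so a smoothed curve is easier to read than the raw values. The sketch below computes a trailing moving average that could then be plotted; the `moving_average` helper, the window size, and the reward values are illustrative assumptions.

```python
def moving_average(values, window):
    """Trailing moving average; early points average over what is available."""
    if window <= 0:
        raise ValueError("window must be positive")
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


# Made-up per-episode rewards, for illustration only.
rewards = [1, 3, 2, 6, 4, 8]
smoothed = moving_average(rewards, window=3)
print(smoothed)
```

A smoothed curve that trends upward suggests the policy is improving; one that stays flat or falls is the "stuck or decreasing" pattern worth investigating.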

Precision vs Recall Tradeoff with Concrete Examples

Precision and recall show different strengths of the agent's decisions:

  • Precision = Of the actions the agent chose, what fraction were correct? (TP / (TP + FP))
  • Recall = Of the cases that called for the action, what fraction did the agent catch? (TP / (TP + FN))
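The two formulas above translate directly into code. This is a small sketch with made-up counts; returning 0.0 when the denominator is zero is a convention chosen here, not something the formulas dictate.

```python
def precision(tp, fp):
    """TP / (TP + FP); returns 0.0 if the agent made no positive predictions."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """TP / (TP + FN); returns 0.0 if there were no actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up counts, for illustration only.
tp, fp, fn = 8, 2, 8
p = precision(tp, fp)
r = recall(tp, fn)
print(f"precision={p:.2f}, recall={r:.2f}")
```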

Example 1: A security agent that detects intruders should have high recall to catch all threats, even if it means some false alarms (lower precision).

Example 2: A customer support chatbot should have high precision to avoid giving wrong answers, even if it misses some questions (lower recall).

Balancing precision and recall depends on what mistakes cost more in the agent's task.
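One common way this trade-off shows up is through a decision threshold: if the agent produces a confidence score and acts only above a cutoff, lowering the cutoff tends to raise recall and lower precision. The sketch below assumes such a scoring agent; the scores, labels, and thresholds are invented for illustration.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when every score >= threshold counts as positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y)
    fp = sum(1 for s, y in pairs if s >= threshold and not y)
    fn = sum(1 for s, y in pairs if s < threshold and y)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up confidence scores and ground-truth labels.
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
labels = [True, True, False, True, False, False]

low = precision_recall_at(scores, labels, threshold=0.35)   # permissive cutoff
high = precision_recall_at(scores, labels, threshold=0.90)  # strict cutoff
print("low threshold :", low)
print("high threshold:", high)
```

With this data the permissive cutoff catches every positive at the cost of one false alarm, while the strict cutoff makes no false alarms but misses two of the three positives. The first resembles the security agent, the second the support chatbot.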

What "Good" vs "Bad" Metric Values Look Like for Agent Architecture

Good metrics:

  • High accuracy or success rate (e.g., >90%) showing the agent acts correctly most of the time.
  • Balanced precision and recall, avoiding too many false alarms or misses.
  • Consistent improvement in reward or task completion over training.
  • Low response time for real-time actions.

Bad metrics:

  • Low accuracy or success rate (e.g., <50%) meaning the agent often fails.
  • Very high precision but very low recall, or vice versa, indicating poor balance.
  • Reward or performance stuck or decreasing during training.
  • Slow or delayed actions causing poor user experience.
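Response time can be measured by timing one full observe-think-act pass. The sketch below uses `time.perf_counter` from the standard library; the `timed_step` helper and the three stand-in callables are hypothetical placeholders for a real agent's stages.

```python
import time


def timed_step(observe, think, act, raw_input):
    """Run one observe-think-act pass and return (outcome, elapsed_seconds)."""
    start = time.perf_counter()
    observation = observe(raw_input)
    decision = think(observation)
    outcome = act(decision)
    elapsed = time.perf_counter() - start
    return outcome, elapsed


# Stand-in stages; a real agent would call sensors, a model, and actuators.
outcome, elapsed = timed_step(
    observe=lambda x: x,
    think=lambda obs: obs * 2,
    act=lambda decision: f"Action: {decision}",
    raw_input=5,
)
print(outcome, f"({elapsed * 1000:.3f} ms)")
```

A single measurement is noisy; in practice one would typically collect many passes and look at a high percentile rather than the mean.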

Common Metrics Pitfalls

  • Accuracy paradox: High accuracy can be misleading if data is imbalanced. For example, if 95% of observations are safe, an agent that always predicts "safe" gets 95% accuracy but misses every danger.
  • Data leakage: Using future information in training inflates metrics during evaluation, but the agent then fails in real use.
  • Overfitting indicators: Very high training metrics but poor test metrics mean the agent memorizes instead of learning.
  • Ignoring latency: Good decisions are useless if the agent acts too slowly.
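The accuracy paradox is easy to reproduce with the numbers from the first bullet. This sketch builds 100 made-up observations, 5 of them dangerous, and scores an agent that always predicts "safe".

```python
# True means "danger"; 5 dangerous and 95 safe observations (made-up data).
labels = [True] * 5 + [False] * 95

# An agent that always predicts "safe", i.e. never flags danger.
predictions = [False] * len(labels)

correct = sum(1 for y, p in zip(labels, predictions) if y == p)
accuracy = correct / len(labels)

caught = sum(1 for y, p in zip(labels, predictions) if y and p)
recall = caught / sum(labels)  # sum(labels) counts the dangerous cases

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")
```

Accuracy looks strong while recall shows that none of the dangerous cases were caught, which is why recall should be reported alongside accuracy on imbalanced data.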

Self-Check Question

Your agent has 98% accuracy but only 12% recall on detecting fraud. Is it good for production? Why or why not?

Answer: No, it is not good. The agent misses 88% of fraud cases (low recall), which is dangerous. High accuracy is misleading because fraud is rare. The agent needs better recall to catch more fraud.
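To see how 98% accuracy and 12% recall can coexist, here is one set of counts that reproduces both figures. The counts are hypothetical (10,000 transactions with a 2% fraud rate) and were chosen only to match the numbers in the question.

```python
# Hypothetical counts chosen to match the question's 98% accuracy, 12% recall.
tp = 24     # fraud cases caught
fn = 176    # fraud cases missed
fp = 24     # legitimate transactions wrongly flagged
tn = 9776   # legitimate transactions correctly passed

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
recall = tp / (tp + fn)
missed_fraction = fn / (tp + fn)

print(f"total={total}, accuracy={accuracy:.2%}, recall={recall:.2%}")
print(f"fraud cases missed: {missed_fraction:.0%}")
```

Almost all of the accuracy comes from the 9,776 true negatives, which is why the headline number says little about how well fraud is detected.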

Key Result
For agent architectures, balanced precision and recall with task-specific success rates best show effective observe-think-act performance.

Practice

1. Which of the following best describes the observe step in an agent architecture?
easy
A. Collecting information from the environment
B. Making decisions based on data
C. Performing actions to change the environment
D. Storing past experiences for learning

Solution

  1. Step 1: Understand the role of observation

    The observe step is about gathering data or signals from the environment around the agent.
  2. Step 2: Differentiate from other steps

    Thinking is about decision-making, and acting is about doing something. Observation is just about sensing.
  3. Final Answer:

    Collecting information from the environment -> Option A
  4. Quick Check:

    Observe = Collect data [OK]
Hint: Observe means sensing or collecting data first [OK]
Common Mistakes:
  • Confusing observe with think or act
  • Thinking observe means acting
  • Mixing observe with storing data
2. Which of the following is the correct order of steps in a simple agent architecture?
easy
A. Act, Think, Observe
B. Think, Observe, Act
C. Think, Act, Observe
D. Observe, Think, Act

Solution

  1. Step 1: Recall the agent cycle

    The agent first observes the environment, then thinks (decides), and finally acts.
  2. Step 2: Match the sequence

    Only the sequence Observe, Think, Act matches the correct order of operations.
  3. Final Answer:

    Observe, Think, Act -> Option D
  4. Quick Check:

    Order = Observe, Think, Act [OK]
Hint: Remember: Sense first, then decide, then do [OK]
Common Mistakes:
  • Mixing up the order of steps
  • Starting with act before observe
  • Confusing think and observe order
3. Consider this simple Python agent code snippet:
class Agent:
    def observe(self, data):
        self.data = data
    def think(self):
        return self.data * 2
    def act(self, result):
        print(f"Action: {result}")

agent = Agent()
agent.observe(5)
result = agent.think()
agent.act(result)

What will be printed when this code runs?
medium
A. No output, error occurs
B. Action: 10
C. Action: 25
D. Action: 5

Solution

  1. Step 1: Follow the observe method

    The agent observes the value 5 and stores it in self.data.
  2. Step 2: Follow the think method

    The think method returns self.data * 2, which is 5 * 2 = 10.
  3. Step 3: Follow the act method

    The act method prints "Action: 10" using the result from think.
  4. Final Answer:

    Action: 10 -> Option B
  5. Quick Check:

    5 * 2 = 10 printed [OK]
Hint: Multiply observed data by 2, then print [OK]
Common Mistakes:
  • Confusing observe data with result
  • Forgetting to multiply by 2
  • Expecting no output or error
4. This agent code has a bug:
class Agent:
    def observe(self, data):
        self.data = data
    def think(self):
        return self.data + 1
    def act(self, result):
        print(f"Action: {result}")

agent = Agent()
result = agent.think()
agent.act(result)

What is the error, and how do you fix it?
medium
A. Error: self.data not set before think; fix by calling observe first
B. Error: act method missing return; fix by adding return statement
C. Error: observe method has wrong parameter; fix by renaming parameter
D. No error; code runs fine

Solution

  1. Step 1: Identify missing observe call

    The code calls think without ever calling observe, so self.data is never set.
  2. Step 2: Understand consequence

    Calling think tries to read self.data, which does not exist, so Python raises an AttributeError.
  3. Step 3: Fix by calling observe first

    Call agent.observe(some_value) before think to set self.data properly.
  4. Final Answer:

    Error: self.data not set before think; fix by calling observe first -> Option A
  5. Quick Check:

    Observe must run before think [OK]
Hint: Always observe before think to set data [OK]
Common Mistakes:
  • Ignoring the missing observe call
  • Thinking act needs return
  • Confusing parameter names
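Beyond fixing the call order, the class itself can be made more defensive. The variant below is one possible sketch, not part of the original question: it initializes `self.data` to `None` and raises a descriptive error if `think` runs before `observe`. Using `None` as a "nothing observed yet" sentinel is an assumption that only works if `None` is never a valid observation.

```python
class Agent:
    def __init__(self):
        # None marks "nothing observed yet" (assumes None is never real data).
        self.data = None

    def observe(self, data):
        self.data = data

    def think(self):
        if self.data is None:
            raise RuntimeError("think() called before observe()")
        return self.data + 1

    def act(self, result):
        print(f"Action: {result}")


agent = Agent()
try:
    agent.think()  # wrong order: nothing has been observed yet
except RuntimeError as err:
    message = str(err)
    print(f"Caught: {message}")

agent.observe(5)   # correct order: observe, then think, then act
result = agent.think()
agent.act(result)
```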
5. You want to build an agent that observes temperature, decides whether it is too hot (>30°C), and acts by turning on a fan. Which code snippet correctly implements the think method?
hard
A. def think(self): return self.data == 30
B. def think(self): if self.data < 30: return True else: return False
C. def think(self): return self.data > 30
D. def think(self): return self.data * 30

Solution

  1. Step 1: Understand the condition for action

    The agent should act if temperature is greater than 30°C, so think returns True if data > 30.
  2. Step 2: Check each option

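    def think(self): return self.data > 30 returns True if data > 30, matching the requirement. The others fail: Option A is True only at exactly 30, Option B is True when the temperature is below 30, and Option D returns a number instead of a yes/no decision.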
  3. Final Answer:

    def think(self): return self.data > 30 -> Option C
  4. Quick Check:

    Think returns True if hot (>30) [OK]
Hint: Think returns True if temperature > 30 [OK]
Common Mistakes:
  • Using wrong comparison operators
  • Returning True for less than 30
  • Multiplying data instead of comparing
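Putting the correct `think` method into a full observe-think-act cycle might look like the sketch below. The `FanAgent` class, the `step` helper, and the temperature readings are hypothetical; only the `self.data > 30` comparison comes from the question.

```python
class FanAgent:
    THRESHOLD_C = 30  # "too hot" means strictly above this value

    def __init__(self):
        self.data = None
        self.fan_on = False

    def observe(self, temperature_c):
        self.data = temperature_c

    def think(self):
        # Same check as Option C: True only when the reading is above 30.
        return self.data > self.THRESHOLD_C

    def act(self, too_hot):
        self.fan_on = too_hot
        return "fan on" if too_hot else "fan off"


def step(agent, temperature_c):
    """Run one observe-think-act cycle and return the action taken."""
    agent.observe(temperature_c)
    return agent.act(agent.think())


agent = FanAgent()
readings = [28, 30, 31]  # made-up temperatures in °C
actions = [step(agent, t) for t in readings]
print(list(zip(readings, actions)))
```

Note that a reading of exactly 30 leaves the fan off, because the requirement is "greater than 30", not "at least 30".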