
Building custom tools in Agentic AI - Model Metrics & Evaluation

Metrics & Evaluation - Building custom tools
Which metric matters for building custom tools, and why

When building custom AI tools, the key metric depends on the tool's goal. For example, if the tool classifies text, accuracy shows overall correctness. But if the tool detects rare events, recall is vital to catch as many true cases as possible. If the tool must avoid false alarms, precision matters more. Choosing the right metric helps you know if your tool works well for its purpose.

Confusion matrix example
      |                 | Predicted Positive     | Predicted Negative      |
      |-----------------|------------------------|-------------------------|
      | Actual Positive | True Positive (TP): 40 | False Negative (FN): 10 |
      | Actual Negative | False Positive (FP): 5 | True Negative (TN): 45  |

      Total samples = 40 + 10 + 5 + 45 = 100

      Precision = TP / (TP + FP) = 40 / (40 + 5) ≈ 0.89
      Recall = TP / (TP + FN) = 40 / (40 + 10) = 0.80
      Accuracy = (TP + TN) / Total = (40 + 45) / 100 = 0.85
      F1 Score = 2 * (Precision * Recall) / (Precision + Recall) = 2 * (0.89 * 0.80) / (0.89 + 0.80) ≈ 0.84
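
The arithmetic above can be reproduced with a small helper. This is a minimal sketch: the function name `classification_metrics` and its dictionary return format are illustrative choices, not part of any particular library.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute precision, recall, accuracy, and F1 from confusion-matrix counts."""
    total = tp + fn + fp + tn
    # Guard each division so an empty class does not raise ZeroDivisionError.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


# Counts taken from the confusion matrix above.
metrics = classification_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, the printed values match the hand calculation: 0.89, 0.80, 0.85, and 0.84.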
    
Precision vs Recall tradeoff with examples

Imagine building a custom tool to detect spam emails:

  • High Precision: The tool marks emails as spam only when very sure. Few good emails get wrongly marked (few false alarms). But it might miss some spam (lower recall).
  • High Recall: The tool catches almost all spam emails. But it might mark some good emails as spam (more false alarms).

Choosing between precision and recall depends on what is worse: missing spam or wrongly blocking good emails.
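
One way to see this tradeoff is to move the decision threshold of a spam filter. The sketch below uses made-up scores and labels (they are assumptions for illustration, not real data), and `precision_recall_at` is a hypothetical helper name.

```python
# Hypothetical spam scores (higher = more spam-like) and true labels (1 = spam).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [1, 1, 1, 0, 1, 0, 0, 0]


def precision_recall_at(threshold):
    """Mark an email as spam when its score is at or above the threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


strict = precision_recall_at(0.75)   # only flags emails it is very sure about
lenient = precision_recall_at(0.50)  # flags more emails
print("strict  (precision, recall):", strict)
print("lenient (precision, recall):", lenient)
```

With this toy data, the strict threshold gives perfect precision but misses one spam email, while the lenient threshold catches every spam email at the cost of one false alarm.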

What "good" vs "bad" metric values look like for Building custom tools

Good metrics (rough rules of thumb; the right threshold depends on the task):

  • Accuracy above 85% for balanced tasks
  • Precision and recall both above 80% for critical detection tools
  • F1 score close to 1 means balanced and strong performance

Bad metrics:

  • Accuracy near 50% on balanced data means guessing
  • Precision very low (e.g., 30%) means many false alarms
  • Recall very low (e.g., 20%) means many misses
  • A big gap between precision and recall means the tool is skewed toward one kind of error: either many false alarms or many misses
Common pitfalls when evaluating custom tools
  • Accuracy paradox: High accuracy can be misleading if data is imbalanced (e.g., 95% accuracy but only detecting the majority class).
  • Data leakage: Using future or test data during training inflates metrics falsely.
  • Overfitting: Very high training accuracy but low test accuracy means the tool learned noise, not real patterns.
  • Ignoring metric tradeoffs: Focusing only on accuracy without considering precision or recall can hide problems.
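
The accuracy paradox in the first pitfall can be shown with a few lines. This is a sketch on a made-up dataset (95 negatives, 5 positives) and a deliberately useless "tool" that always predicts the majority class.

```python
# Hypothetical imbalanced dataset: 95 negatives (0) and 5 positives (1).
labels = [0] * 95 + [1] * 5

# A baseline that ignores its input and always predicts the majority class.
predictions = [0] * len(labels)

correct = sum(1 for p, y in zip(predictions, labels) if p == y)
accuracy = correct / len(labels)

true_positives = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
recall = true_positives / sum(labels)

print(f"accuracy: {accuracy:.2f}")  # looks strong
print(f"recall:   {recall:.2f}")    # the positive class is never detected
```

Comparing a tool against this kind of trivial baseline is a quick check on whether a high accuracy figure means anything.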
Self-check question

Your custom tool has 98% accuracy but only 12% recall on detecting fraud cases. Is it good for production? Why or why not?

Answer: No, it is not good. The tool misses 88% of fraud cases (low recall), which is dangerous. High accuracy likely comes from many non-fraud cases being correctly identified, but the tool fails at its main goal: catching fraud.
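
To make the answer concrete, here is one set of counts that is consistent with 98% accuracy and 12% recall. The counts are assumptions chosen to fit the question, not data from a real system.

```python
# Hypothetical counts for 10,000 transactions, 200 of them fraudulent.
tp, fn = 24, 176     # fraud caught vs. fraud missed
fp, tn = 24, 9776    # legitimate flagged vs. legitimate passed

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
recall = tp / (tp + fn)
missed_fraud_rate = fn / (tp + fn)

print(f"accuracy: {accuracy:.2%}")
print(f"recall: {recall:.2%}")
print(f"fraud missed: {missed_fraud_rate:.2%}")
```

Almost all of the accuracy comes from the 9,776 legitimate transactions, while 176 of the 200 fraud cases pass through undetected.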

Key Result
Choosing the right metric like precision or recall is key to knowing if your custom AI tool works well for its specific task.

Practice

1. What is the main purpose of building custom tools for an AI agent?
easy
A. To change the AI's language automatically
B. To add special skills that help the AI perform specific tasks
C. To reduce the size of the AI model
D. To make the AI run faster on any computer

Solution

  1. Step 1: Understand what custom tools do

    Custom tools add new abilities or skills to an AI, making it better at certain jobs.
  2. Step 2: Compare options to the purpose

    Only option B, "To add special skills that help the AI perform specific tasks", describes adding new skills, which matches the purpose of custom tools.
  3. Final Answer:

    To add special skills that help the AI perform specific tasks -> Option B
  4. Quick Check:

    Custom tools = add special skills [OK]
Hint: Custom tools add new skills to AI for tasks [OK]
Common Mistakes:
  • Thinking custom tools speed up AI generally
  • Confusing tool purpose with model size
  • Assuming tools change AI language automatically
2. Which of the following is the correct way to define a custom tool in Python for an AI agent?
easy
A. tool = Tool(name='search', func=search_function)
B. tool = Tool('search', func=search_function)
C. tool = Tool(description='Find info', func=search_function)
D. tool = Tool(name='search', description='Find info', func=search_function)

Solution

  1. Step 1: Recall required fields for a custom tool

    A custom tool needs three things to work properly: a name that identifies it, a description that tells the agent when to use it, and a function that does the work.
  2. Step 2: Check which option includes all three

    Only option D, tool = Tool(name='search', description='Find info', func=search_function), sets all three: name, description, and func.
  3. Final Answer:

    tool = Tool(name='search', description='Find info', func=search_function) -> Option D
  4. Quick Check:

    Tool needs name, description, and func [OK]
Hint: Include name, description, and func when defining tools [OK]
Common Mistakes:
  • Omitting description or name
  • Passing parameters in wrong order
  • Using wrong parameter names
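
The quiz does not say which library `Tool` comes from. To experiment with the examples, a minimal stand-in can be sketched as a dataclass. The `Tool` class and `search_function` below are hypothetical and written only for illustration; a real agent framework's tool class may accept different parameters.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    """Hypothetical stand-in for an agent framework's tool class."""
    name: str
    description: str
    func: Callable[..., object]


def search_function(query: str) -> str:
    # Placeholder body for illustration; a real tool would perform a search.
    return f"results for {query}"


tool = Tool(name="search", description="Find info", func=search_function)
print(tool.func("agents"))

# Because all three fields are required, leaving one out fails immediately.
try:
    Tool(name="search", func=search_function)  # description omitted
except TypeError as error:
    print("Rejected:", error)
```

Making every field required mirrors the rule in the solution: a tool without a description gives the agent nothing to decide on.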
3. Given this Python code for a custom tool, what will be the output when calling tool.func('hello')?
def shout(text):
    return text.upper() + '!!!'
tool = Tool(name='shout', description='Make text loud', func=shout)
medium
A. 'HELLO!!!'
B. 'hello!!!'
C. 'hello'
D. Error: func is not callable

Solution

  1. Step 1: Understand the function behavior

    The function shout converts text to uppercase and adds three exclamation marks.
  2. Step 2: Apply the function to 'hello'

    Calling shout('hello') returns 'HELLO!!!'. Since tool.func points to shout, tool.func('hello') does the same.
  3. Final Answer:

    'HELLO!!!' -> Option A
  4. Quick Check:

    shout('hello') = 'HELLO!!!' [OK]
Hint: Check function logic and apply input to predict output [OK]
Common Mistakes:
  • Ignoring uppercase conversion
  • Missing exclamation marks
  • Assuming func is not callable
4. You wrote this custom tool but get an error when using it. What is the likely problem?
def add_numbers(a, b):
    return a + b
tool = Tool(name='adder', description='Add two numbers', func=add_numbers)
result = tool.func(5)
medium
A. Tool name must be unique
B. Function add_numbers should not return a value
C. Missing one argument when calling tool.func
D. Description is too short

Solution

  1. Step 1: Check function parameters

    add_numbers requires two inputs: a and b.
  2. Step 2: Check how tool.func is called

    tool.func(5) provides only one argument, so Python raises a TypeError for the missing second argument, b.
  3. Final Answer:

    Missing one argument when calling tool.func -> Option C
  4. Quick Check:

    Function needs 2 args, only 1 given [OK]
Hint: Match function parameters with call arguments [OK]
Common Mistakes:
  • Ignoring function argument count
  • Thinking description length causes error
  • Assuming tool name uniqueness causes runtime error
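
A mismatch like this can be caught before the function runs by binding the arguments against the function's signature. The sketch below uses the standard library's `inspect.signature`; the wrapper name `call_with_check` is a hypothetical helper, not part of any tool framework.

```python
import inspect


def add_numbers(a, b):
    return a + b


def call_with_check(func, *args):
    """Check the arguments against func's signature, then call it."""
    try:
        inspect.signature(func).bind(*args)
    except TypeError as error:
        # Report the problem instead of letting the call crash.
        return f"bad call: {error}"
    return func(*args)


print(call_with_check(add_numbers, 5))     # reports the missing argument
print(call_with_check(add_numbers, 5, 7))  # 12
```

A check like this is one option when an agent generates tool arguments and a readable error message is more useful than an unhandled exception.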
5. You want to build a custom tool that summarizes text by returning the first 10 words. Which code correctly defines this tool's function?
hard
A. def summarize(text): return ' '.join(text.split()[:10])
B. def summarize(text): return text[:10]
C. def summarize(text): return text.split()[-10:]
D. def summarize(text): return len(text.split())

Solution

  1. Step 1: Understand the goal of the function

    The function should return the first 10 words, not characters or last words.
  2. Step 2: Analyze each option

    Option A splits the text into words and joins the first 10 back into a string, which is correct. Option B returns the first 10 characters, not words. Option C returns a list of the last 10 words, not the first 10 as a string. Option D returns the word count, not a summary.
  3. Final Answer:

    def summarize(text): return ' '.join(text.split()[:10]) -> Option A
  4. Quick Check:

    First 10 words = ' '.join(text.split()[:10]) [OK]
Hint: Split text and join first 10 words for summary [OK]
Common Mistakes:
  • Returning characters instead of words
  • Taking last words instead of first
  • Returning word count instead of summary
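
Running the four options on the same input makes the differences visible. In this sketch the `max_words` parameter and the sample sentence are additions for illustration; the quiz version hard-codes 10.

```python
def summarize(text, max_words=10):
    """Return the first max_words whitespace-separated words as one string."""
    return " ".join(text.split()[:max_words])


sample = "one two three four five six seven eight nine ten eleven twelve"

print(summarize(sample))      # option A: first ten words as a string
print(sample[:10])            # option B: first ten characters
print(sample.split()[-10:])   # option C: list of the last ten words
print(len(sample.split()))    # option D: word count
```

Only the first line produces a shortened version of the text that starts at the beginning and stays a string.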