When using tools or functions in AI, the key metric is accuracy of the function call results. This means how often the tool returns the correct or expected output. We also care about response time because slow calls can hurt user experience. For some tools, precision and recall matter if the tool filters or selects information, ensuring relevant results without missing important data.
Tool usage (function calling) in Prompt Engineering / GenAI - Model Metrics & Evaluation
| | Predicted Success | Predicted Failure |
|----------------|--------------------|--------------------|
| Actual Success | True Positive (TP) | False Negative (FN) |
| Actual Failure | False Positive (FP) | True Negative (TN) |
TP: Function call succeeded and output was correct
FP: Function call succeeded but output was wrong
FN: Function call failed but should have succeeded
TN: Function call failed and was expected to fail
Metrics like precision = TP / (TP + FP) and recall = TP / (TP + FN) help measure how well the tool works.
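As a minimal sketch of these two formulas, the snippet below computes precision and recall from confusion-matrix counts. The function name `precision_recall` and the counts are made up for illustration.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts."""
    # Guard against division by zero when a denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts from an evaluation run of a tool.
precision, recall = precision_recall(tp=90, fp=10, fn=30)
print(f"precision={precision:.2f}, recall={recall:.2f}")  # precision=0.90, recall=0.75
```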
If a tool is very precise, it means when it returns a result, it is usually correct. But it might miss some correct results (low recall). For example, a search tool that only returns very sure answers might miss some relevant info.
If a tool has high recall, it finds most correct results but might include wrong ones (low precision). For example, a tool that returns many possible answers will cover most of the correct ones but also include some wrong ones.
Choosing between precision and recall depends on the use case. For critical tasks, high recall avoids missing important info. For user-facing tools, high precision avoids confusing users with wrong results.
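The trade-off can be illustrated with a confidence threshold. The sketch below assumes a hypothetical search tool that scores each candidate answer; the scores and relevance labels are invented data, not measurements.

```python
# Hypothetical scored candidates from a search tool: (confidence, is_relevant).
candidates = [
    (0.95, True), (0.90, True), (0.80, True), (0.70, False),
    (0.60, True), (0.40, False), (0.30, True), (0.20, False),
]


def evaluate(threshold):
    """Return (precision, recall) when only candidates at or above threshold are returned."""
    returned = [relevant for score, relevant in candidates if score >= threshold]
    tp = sum(returned)                 # relevant items that were returned
    fp = len(returned) - tp            # irrelevant items that were returned
    fn = sum(relevant for _, relevant in candidates) - tp  # relevant items missed
    precision = tp / (tp + fp) if returned else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


strict = evaluate(0.85)   # only very sure answers: high precision, low recall
lenient = evaluate(0.25)  # many possible answers: high recall, lower precision
print(strict)
print(lenient)
```

With this data, the strict threshold returns only correct answers but finds 2 of the 5 relevant ones, while the lenient threshold finds all 5 but also returns 2 wrong ones.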
Good: Precision and recall above 90%, low error rate, fast response time under 1 second.
Bad: Precision or recall below 50%, many wrong outputs, slow or failed calls.
Example: A tool with 95% precision but 40% recall misses many correct outputs, which may be bad for completeness.
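These rough thresholds could be encoded as a simple check. This is only a sketch: the function name `rate_tool` and the "borderline" label for values between the two bands are assumptions added for illustration.

```python
def rate_tool(precision, recall, latency_s):
    """Rate a tool using the rough thresholds above.

    "borderline" is an added label for anything between the two bands.
    """
    if precision < 0.5 or recall < 0.5:
        return "bad"
    if precision > 0.9 and recall > 0.9 and latency_s < 1.0:
        return "good"
    return "borderline"


print(rate_tool(0.95, 0.40, 0.3))  # bad: recall is below 50%
print(rate_tool(0.96, 0.93, 0.4))  # good
print(rate_tool(0.96, 0.93, 2.5))  # borderline: accurate but slow
```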
- Accuracy paradox: High overall accuracy can hide poor performance on rare but important cases.
- Data leakage: Testing the tool on data it has already seen inflates metrics.
- Overfitting: The tool works well on test data but fails in real use.
- Ignoring latency: Fast but inaccurate tools or slow but accurate tools may both be problematic.
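To keep latency in view alongside correctness, each call can be timed. The sketch below uses a made-up `fake_tool` as a stand-in for a real tool; `timed_call` is likewise a hypothetical helper.

```python
import time


def fake_tool(query):
    """Hypothetical stand-in for a real tool call."""
    time.sleep(0.01)  # simulate a small amount of work
    return query.upper()


def timed_call(tool, *args):
    """Call the tool and return (output, elapsed seconds)."""
    start = time.perf_counter()
    output = tool(*args)
    elapsed = time.perf_counter() - start
    return output, elapsed


output, elapsed = timed_call(fake_tool, "weather in paris")
print(output, f"{elapsed:.3f}s")
```

Recording the elapsed time next to each correctness label makes it possible to report both kinds of metric from the same evaluation run.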
Your tool usage model has 98% accuracy but only 12% recall on important function calls. Is it good for production? Why or why not?
Answer: No, it is not good. The low recall means the tool misses most important calls, even if overall accuracy looks high. This can cause failures in critical tasks.
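One set of counts consistent with the figures in this question shows how both numbers can hold at once. The counts are invented for illustration, assuming 10,000 evaluated calls of which 200 are important.

```python
# Hypothetical counts: 200 important calls (positives) out of 10,000.
tp, fn = 24, 176    # important calls handled vs. missed
fp, tn = 24, 9776   # unimportant calls wrongly flagged vs. correctly ignored

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.2%}, recall={recall:.2%}")  # accuracy=98.00%, recall=12.00%
```

Because the important calls are rare, the large true-negative count dominates accuracy and hides the 176 missed calls.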
Practice
Solution
Step 1: Understand function role in AI tools
Functions are blocks of code designed to perform specific tasks when called with inputs.
Step 2: Identify the purpose of calling functions
Calling a function means using it to do a job, usually with parameters to guide the task.
Final Answer:
To perform a specific task using given inputs -> Option A
Quick Check:
Function call = perform task with inputs [OK]
Common mistakes:
- Thinking functions create data without inputs
- Confusing function calls with data storage
- Assuming functions only display info
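As an illustration of "perform a task with inputs" in a tool-usage setting, the sketch below dispatches a tool call to a Python function. The `get_weather` tool, the `TOOLS` registry, and the JSON shape with `name` and `arguments` keys are all assumptions made for this example, not a specific vendor's format.

```python
import json


def get_weather(city):
    """Hypothetical tool: returns a canned forecast."""
    return f"Sunny in {city}"


# Registry mapping tool names to the functions that implement them.
TOOLS = {"get_weather": get_weather}

# A tool call as a model might emit it; this JSON shape is an assumption.
raw_call = '{"name": "get_weather", "arguments": {"city": "Paris"}}'

call = json.loads(raw_call)
result = TOOLS[call["name"]](**call["arguments"])
print(result)  # Sunny in Paris
```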
Which is the correct way to call a function `generate_text` with a parameter `prompt` in Python?
Solution
Step 1: Recall Python function call syntax
In Python, functions are called by writing the function name followed by parentheses enclosing parameters.
Step 2: Match syntax with options
`generate_text(prompt)` uses the correct syntax: function name followed by parentheses with parameter inside.
Final Answer:
`generate_text(prompt)` -> Option D
Quick Check:
Python function call = name(params) [OK]
Common mistakes:
- Using assignment (=) instead of call
- Adding extra keywords like 'call'
- Using wrong symbols like '->'
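A small sketch of the call syntax follows. The body of `generate_text` is a hypothetical stand-in, since the quiz does not show an implementation; a real one would presumably call a model.

```python
def generate_text(prompt):
    """Hypothetical stand-in; a real implementation would call a model."""
    return f"Generated text for: {prompt}"


prompt = "Write a haiku about autumn"
text = generate_text(prompt)  # call: name followed by parentheses with the argument
reference = generate_text     # no parentheses: refers to the function without calling it

print(text)                 # Generated text for: Write a haiku about autumn
print(callable(reference))  # True
```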
def add_numbers(a, b):
    return a + b

result = add_numbers(3, 5)
print(result)
What will be printed?
Solution
Step 1: Understand function behavior
The function `add_numbers` takes two inputs and returns their sum.
Step 2: Calculate the sum of inputs 3 and 5
3 + 5 equals 8, so the function returns 8, which is stored in `result`.
Final Answer:
8 -> Option B
Quick Check:
3 + 5 = 8 [OK]
Common mistakes:
- Concatenating numbers as strings (35)
- Expecting error due to parameters
- Ignoring return value (None)
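The wrong answers listed here correspond to different code, which the sketch below makes concrete. `add_without_return` is a hypothetical variant added only to show where `None` would come from.

```python
def add_numbers(a, b):
    return a + b


def add_without_return(a, b):
    a + b  # the sum is computed and then discarded


print(add_numbers(3, 5))         # 8
print(add_numbers("3", "5"))     # 35 (string concatenation, not addition)
print(add_without_return(3, 5))  # None (no return statement)
```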
def translate(text, language):
    return f"{text} in {language}"

result = translate("Hello")
print(result)
Solution
Step 1: Check function definition parameters
The function `translate` requires two parameters: `text` and `language`.
Step 2: Check function call arguments
The call `translate("Hello")` provides only one argument, missing the second required parameter.
Final Answer:
Missing second argument 'language' in function call -> Option C
Quick Check:
All parameters must be given when calling [OK]
Common mistakes:
- Ignoring missing arguments
- Assuming default values without definition
- Misreading function name or print syntax
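The sketch below reproduces the error and shows two possible fixes: passing both arguments, or defining a default. `translate_with_default` and its default value `"French"` are assumptions for illustration, not part of the original exercise.

```python
def translate(text, language):
    return f"{text} in {language}"


# Calling with one argument raises TypeError because 'language' is required.
try:
    translate("Hello")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Fix 1: pass both arguments.
print(translate("Hello", "Spanish"))  # Hello in Spanish


# Fix 2 (hypothetical variant): give 'language' a default value.
def translate_with_default(text, language="French"):
    return f"{text} in {language}"


print(translate_with_default("Hello"))  # Hello in French
```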
Consider a function `summarize_text` that takes a long text and a number `max_length` to limit summary size. Which call correctly uses this function to summarize `article` with a max length of 100?
Solution
Step 1: Understand function parameters and calling conventions
The function expects two inputs: the text and the `max_length` number. Keyword arguments can be used to specify parameters by name.
Step 2: Identify correct syntax for positional and keyword arguments
`summary = summarize_text(article, max_length=100)` correctly passes `article` as the first argument and uses `max_length=100` as a keyword argument.
Final Answer:
`summary = summarize_text(article, max_length=100)` -> Option A
Quick Check:
Positional first, then keyword args [OK]
Common mistakes:
- Placing keyword argument before positional
- Missing commas between arguments
- Adding extra unexpected arguments
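A short sketch of the positional-then-keyword rule follows. The body of `summarize_text` is a hypothetical placeholder that simply truncates the text, and the parameter name `text` is assumed; the quiz does not show the real definition.

```python
def summarize_text(text, max_length):
    """Hypothetical placeholder: truncates instead of truly summarizing."""
    return text[:max_length]


article = "word " * 50  # 250 characters of sample text

summary = summarize_text(article, max_length=100)           # positional, then keyword
same = summarize_text(text=article, max_length=100)         # all keywords also works

# A keyword argument placed before a positional one is rejected at compile time.
try:
    compile("summarize_text(max_length=100, article)", "<example>", "eval")
    keyword_first_ok = True
except SyntaxError:
    keyword_first_ok = False

print(len(summary), summary == same, keyword_first_ok)  # 100 True False
```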
