
Latency and cost benchmarking in Agentic AI - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank
easy

Complete the code to measure the latency of a function call.

import time
start = time.[1]()
result = my_function()
end = time.perf_counter()
latency = end - start
print(f"Latency: {latency} seconds")
A. perf_counter
B. sleep
C. time
D. process_time
Common Mistakes:
  • Using time.sleep(), which pauses execution instead of reading a clock.
  • Using time.time(), which follows the adjustable system clock and can have lower resolution than time.perf_counter().
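The timing pattern in this task can be wrapped in a small reusable helper. This is a minimal sketch, not a definitive implementation: the names measure_latency and sample_workload are hypothetical, and sample_workload stands in for my_function.

```python
import time

def measure_latency(fn, *args, **kwargs):
    """Call fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()           # monotonic, high-resolution clock
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

def sample_workload(n):
    # Hypothetical stand-in for my_function
    return sum(range(n))

result, latency = measure_latency(sample_workload, 100_000)
print(f"Latency: {latency:.6f} seconds")
```

Returning the result alongside the elapsed time keeps the measured call usable, so timing does not require running the function twice.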
2. Fill in the blank
medium

Complete the code to calculate the average cost per API call given total cost and number of calls.

total_cost = 50.0  # dollars
num_calls = 200
average_cost = total_cost [1] num_calls
print(f"Average cost per call: ${average_cost:.4f}")
A. /
B. *
C. +
D. -
Common Mistakes:
  • Multiplying total cost by number of calls.
  • Adding or subtracting instead of dividing.
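The same calculation can be written as a function that also guards against an empty benchmark. This is a sketch under one assumption: the function name average_cost_per_call is hypothetical, and raising ValueError for a non-positive call count is an illustrative choice.

```python
def average_cost_per_call(total_cost, num_calls):
    """Return the mean cost of one call, in the same currency as total_cost."""
    if num_calls <= 0:
        # Illustrative choice: fail loudly rather than divide by zero
        raise ValueError("num_calls must be positive")
    return total_cost / num_calls

avg = average_cost_per_call(50.0, 200)
print(f"Average cost per call: ${avg:.4f}")  # prints "Average cost per call: $0.2500"
```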
3. Fill in the blank
hard

Complete the code so that it records the latency of each run and computes the average latency.

import time
latencies = []
for _ in range(5):
    start = time.perf_counter()
    run_task()
    end = time.[1]()
    latencies.append(end - start)
avg_latency = sum(latencies) / len(latencies)
print(f"Average latency: {avg_latency:.5f} seconds")
A. sleep
B. time
C. process_time
D. perf_counter
Common Mistakes:
  • Using time.time() for end but perf_counter() for start. The two clocks have different reference points, so their difference is meaningless.
  • Using sleep instead of a timer.
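A mean over a handful of runs can hide outliers, so it often helps to report the median and maximum as well. The sketch below uses the standard statistics module. The benchmark function is a hypothetical helper, and run_task here is a stand-in that only sleeps for about 10 ms.

```python
import statistics
import time

def benchmark(task, runs=5):
    """Time `task` over several runs and summarize the latencies in seconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        latencies.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(latencies),
        "median": statistics.median(latencies),
        "max": max(latencies),
    }

def run_task():
    # Hypothetical stand-in workload
    time.sleep(0.01)

stats = benchmark(run_task, runs=5)
print(f"Average latency: {stats['mean']:.5f} seconds")
print(f"Median latency: {stats['median']:.5f} seconds")
print(f"Slowest run: {stats['max']:.5f} seconds")
```

If the first run is much slower than the rest (for example because of caching or connection setup), the median will sit noticeably below the mean.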
4. Fill in the blank
hard

Fill both blanks to create a dictionary of API call latencies that keeps only the calls that took longer than 0.1 seconds.

latency_data = {call_id: [1] for call_id, [2] in api_calls.items() if latency > 0.1}
A. latency
B. duration
Common Mistakes:
  • Using a name for the loop variable that differs from the one used in the value expression and the filter.
  • Using undefined variable names.
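A filtering dictionary comprehension can be checked against a small data set. The sketch below uses made-up sample data: the call IDs and latencies in api_calls are hypothetical. Note that the value expression, the loop variable, and the filter all refer to the same name.

```python
# Hypothetical sample data: call ID -> latency in seconds
api_calls = {"call_1": 0.05, "call_2": 0.25, "call_3": 0.12, "call_4": 0.10}

# Keep only the calls that took strictly longer than 0.1 seconds
slow_calls = {call_id: latency for call_id, latency in api_calls.items() if latency > 0.1}

print(slow_calls)  # {'call_2': 0.25, 'call_3': 0.12}
```

The comparison is strict, so a call that took exactly 0.10 seconds is excluded.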
5. Fill in the blank
hard

Fill all three blanks to create a filtered dictionary of call costs where cost is above 0.05 dollars and keys are uppercase.

filtered_costs = [1](([2], cost) for [3], cost in call_costs.items() if cost > 0.05)
A. dict
B. k.upper()
C. k
D. v
Common Mistakes:
  • Not converting keys to uppercase.
  • Using wrong variable names in comprehension.
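Python's dict() constructor accepts an iterable of (key, value) pairs, so a generator expression that yields tuples produces the same result as a dictionary comprehension. The sketch below shows both forms side by side. The operation names and costs in call_costs are hypothetical sample data.

```python
# Hypothetical sample data: operation name -> cost in dollars
call_costs = {"search": 0.02, "summarize": 0.08, "plan": 0.06}

# dict() built from a generator of (key, value) tuples
via_dict = dict((name.upper(), cost) for name, cost in call_costs.items() if cost > 0.05)

# Equivalent dictionary comprehension
via_comprehension = {name.upper(): cost for name, cost in call_costs.items() if cost > 0.05}

print(via_dict)  # {'SUMMARIZE': 0.08, 'PLAN': 0.06}
```

The key: value syntax is only valid inside braces. Inside dict(...) each item must be written as a tuple.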

Practice

1. What does latency measure when benchmarking an AI model?
easy
A. The cost to train the model
B. The amount of memory the model uses
C. The accuracy of the model's predictions
D. The time it takes for the model to respond

Solution

  1. Step 1: Understand latency in AI benchmarking

    Latency refers to how long a model takes to give an answer after receiving input.
  2. Step 2: Differentiate latency from other metrics

    Memory usage, accuracy, and training cost are different metrics; latency is about response time.
  3. Final Answer:

    The time it takes for the model to respond -> Option D
  4. Quick Check:

    Latency = response time [OK]
Hint: Latency means response speed, not memory or cost [OK]
Common Mistakes:
  • Confusing latency with accuracy
  • Thinking latency measures memory use
  • Mixing latency with training cost
2. Which Python code snippet correctly measures latency of a model's prediction function model.predict()?
easy
A. start = time.time(); model.predict(); end = time.time(); latency = end - start
B. latency = model.predict().time()
C. latency = time.predict(model)
D. latency = model.time() - predict.time()

Solution

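  1. Step 1: Identify correct timing method in Python

    Reading time.time() before and after calling model.predict() and subtracting gives the elapsed time. For very short intervals, time.perf_counter() is the more precise choice, but the pattern is the same.
  2. Step 2: Rule out the incorrect options

    Options B, C, and D call methods or names that do not exist (such as model.predict().time() or time.predict()), so they raise errors instead of measuring anything.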
  3. Final Answer:

    start = time.time(); model.predict(); end = time.time(); latency = end - start -> Option A
  4. Quick Check:

    Use time.time() before and after call [OK]
Hint: Use time.time() before and after prediction call [OK]
Common Mistakes:
  • Calling non-existent methods like predict.time()
  • Subtracting wrong attributes
  • Not capturing time before and after prediction
3. Given this code measuring latency and cost, what is the printed output?
import time

start = time.time()
model_response = model.predict(input_data)
end = time.time()
latency = end - start
cost = latency * 0.05  # cost per second
print(round(latency, 2), round(cost, 3))
Assume model.predict() takes exactly 0.24 seconds.
medium
A. 0.24 0.012
B. 0.24 0.12
C. 0.24 0.0012
D. 0.24 0.024

Solution

  1. Step 1: Calculate latency and cost

    Latency is 0.24 seconds. Cost = latency * 0.05 = 0.24 * 0.05 = 0.012.
  2. Step 2: Round values as printed

    Latency rounded to 2 decimals is 0.24. Cost rounded to 3 decimals is 0.012.
  3. Final Answer:

    0.24 0.012 -> Option A
  4. Quick Check:

    Cost = latency * 0.05 = 0.012 [OK]
Hint: Multiply latency by cost rate, then round [OK]
Common Mistakes:
  • Misplacing the decimal point, so the cost is off by a factor of 10 or 100
  • Rounding cost incorrectly
  • Confusing latency and cost values
4. This code is meant to measure latency. What, if anything, is the bug?
import time
start = time.time()
model.predict(input_data)
latency = time.time() - start
print('Latency:', latency)
medium
A. The model.predict call is missing parentheses
B. The code does not import the model
C. Latency is measured correctly; no bug
D. Latency should be measured before calling model.predict

Solution

  1. Step 1: Check timing logic

    The code records the time before model.predict(input_data), reads the clock again after the call returns, and subtracts to get latency.
  2. Step 2: Verify correctness of measurement

    This is the correct way to measure latency: the call has its parentheses, and the end time is read after the call completes.
  3. Final Answer:

    Latency is measured correctly; no bug -> Option C
  4. Quick Check:

    Start time before, end time after call [OK]
Hint: Latency = end time minus start time around call [OK]
Common Mistakes:
  • Measuring time before call only
  • Forgetting parentheses on function call
  • Measuring latency after print statement
5. You want to compare two AI models for latency and cost. Model A takes 0.3 seconds per prediction and costs $0.04 per second. Model B takes 0.25 seconds but costs $0.06 per second. Which model is cheaper per prediction and which is faster?
hard
A. Model A is cheaper and faster; Model B is slower and more expensive
B. Model A is cheaper and slower; Model B is faster and more expensive
C. Model B is cheaper and slower; Model A is faster and more expensive
D. Model B is cheaper and faster; Model A is slower and more expensive

Solution

  1. Step 1: Calculate cost per prediction for each model

    Model A cost = 0.3 * 0.04 = $0.012; Model B cost = 0.25 * 0.06 = $0.015.
  2. Step 2: Compare latency and cost

    Model A is cheaper ($0.012 < $0.015) but slower (0.3s > 0.25s). Model B is faster but more expensive.
  3. Final Answer:

    Model A is cheaper and slower; Model B is faster and more expensive -> Option B
  4. Quick Check:

    Cost = latency * rate; compare values [OK]
Hint: Multiply latency by cost rate to compare total cost [OK]
Common Mistakes:
  • Ignoring cost per second rate
  • Mixing up which model is faster
  • Calculating cost incorrectly
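The comparison in question 5 can be reproduced in a few lines. This is a sketch that assumes cost scales linearly with latency, as the question states. The ModelProfile class and its field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ModelProfile:
    name: str
    latency_s: float    # seconds per prediction
    rate_per_s: float   # dollars per second of compute

    @property
    def cost_per_prediction(self):
        # Cost = latency * rate, as in the worked solution
        return self.latency_s * self.rate_per_s

models = [
    ModelProfile("Model A", 0.30, 0.04),
    ModelProfile("Model B", 0.25, 0.06),
]

cheapest = min(models, key=lambda m: m.cost_per_prediction)
fastest = min(models, key=lambda m: m.latency_s)

for m in models:
    print(f"{m.name}: {m.latency_s:.2f} s, ${m.cost_per_prediction:.3f} per prediction")
print(f"Cheapest: {cheapest.name}; fastest: {fastest.name}")
# last line prints "Cheapest: Model A; fastest: Model B"
```

Selecting on each metric separately makes the trade-off explicit: no single model wins on both latency and cost here.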