Practice


1. What does latency measure when benchmarking an AI model?

easy

A. The cost to train the model

B. The amount of memory the model uses

C. The accuracy of the model's predictions

D. The time it takes for the model to respond

Solution

Step 1: Understand latency in AI benchmarking
Latency refers to how long a model takes to give an answer after receiving input.
Step 2: Differentiate latency from other metrics
Memory usage, accuracy, and training cost are different metrics; latency is about response time.
Final Answer:
The time it takes for the model to respond -> Option D
Quick Check:
Latency = response time [OK]

Hint: Latency means response speed, not memory or cost

Common Mistakes:

Confusing latency with accuracy
Thinking latency measures memory use
Mixing latency with training cost

2. Which Python code snippet correctly measures the latency of a model's prediction function, model.predict()?

easy

A. start = time.time(); model.predict(); end = time.time(); latency = end - start

B. latency = model.predict().time()

C. latency = time.predict(model)

D. latency = model.time() - predict.time()

Solution

Step 1: Identify correct timing method in Python
Using time.time() before and after calling model.predict() measures elapsed time correctly.
Step 2: Rule out the incorrect options
Options B, C, and D call methods that do not exist (such as time.predict() or model.time()), so they raise errors instead of measuring anything.
Final Answer:
start = time.time(); model.predict(); end = time.time(); latency = end - start -> Option A
Quick Check:
Use time.time() before and after call [OK]

Hint: Use time.time() before and after prediction call

Common Mistakes:

Calling non-existent methods like predict.time()
Subtracting wrong attributes
Not capturing time before and after prediction
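
The timing pattern from Option A can be tried without a real model. The sketch below is only an illustration: StubModel and measure_latency are hypothetical names that do not come from the question, and the 0.05-second sleep is an assumed stand-in for inference time.

```python
import time


class StubModel:
    """Hypothetical stand-in for a real model; sleeps to simulate inference."""

    def predict(self, input_data):
        time.sleep(0.05)  # assumed inference delay, for illustration only
        return f"echo: {input_data}"


def measure_latency(model, input_data):
    """Return (response, elapsed_seconds) for a single predict() call."""
    start = time.time()                   # timestamp before the call
    response = model.predict(input_data)  # the work being timed
    end = time.time()                     # timestamp after the call
    return response, end - start


response, latency = measure_latency(StubModel(), "hello")
print(f"Latency: {latency:.3f} s")
```

The printed value varies from run to run, but it should be close to the simulated delay.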

3. Given this code measuring latency and cost, what is the printed output?

import time

start = time.time()
model_response = model.predict(input_data)
end = time.time()
latency = end - start
cost = latency * 0.05  # cost per second
print(round(latency, 2), round(cost, 3))

Assume model.predict() takes 0.24 seconds. What is printed?

medium

A. 0.24 0.012

B. 0.24 0.12

C. 0.24 0.0012

D. 0.24 0.024

Solution

Step 1: Calculate latency and cost
Latency is 0.24 seconds. Cost = latency * 0.05 = 0.24 * 0.05 = 0.012.
Step 2: Round values as printed
Latency rounded to 2 decimals is 0.24. Cost rounded to 3 decimals is 0.012.
Final Answer:
0.24 0.012 -> Option A
Quick Check:
Cost = latency * 0.05 = 0.012 [OK]

Hint: Multiply latency by cost rate, then round

Common Mistakes:

Multiplying cost by 10 or 100 by mistake
Rounding cost incorrectly
Confusing latency and cost values
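
The arithmetic in the solution can be checked directly. This is a minimal sketch that fixes the latency at the 0.24 seconds given in the question instead of timing a real call; cost_for and COST_PER_SECOND are illustrative names, not part of the original code.

```python
COST_PER_SECOND = 0.05  # rate from the question, in dollars per second


def cost_for(latency_seconds, rate=COST_PER_SECOND):
    """Cost of one call billed by elapsed time."""
    return latency_seconds * rate


latency = 0.24  # value stated in the question, not a measurement
cost = cost_for(latency)
print(round(latency, 2), round(cost, 3))  # prints: 0.24 0.012
```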

4. This code is meant to measure latency. Does it contain a bug, and if so, what is it?

import time
start = time.time()
model.predict(input_data)
latency = time.time() - start
print('Latency:', latency)

medium

A. The model.predict call is missing parentheses

B. The code does not import the model

C. Latency is measured correctly; no bug

D. Latency should be measured before calling model.predict

Solution

Step 1: Check timing logic
The code records time before and after model.predict(input_data), then subtracts to get latency.
Step 2: Verify correctness of measurement
This is the correct way to measure latency: the call has its parentheses, and the end time is taken after the call returns.
Final Answer:
Latency is measured correctly; no bug -> Option C
Quick Check:
Start time before, end time after call [OK]

Hint: Latency = end time minus start time around call

Common Mistakes:

Measuring time before call only
Forgetting parentheses on function call
Measuring latency after print statement
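
To see why the placement of the timestamps matters, the sketch below contrasts the correct pattern with the first mistake listed above, where both timestamps are taken before the call. slow_predict is a hypothetical stand-in that sleeps for an assumed 0.05 seconds.

```python
import time


def slow_predict(input_data):
    """Hypothetical stand-in for model.predict(); sleeps to simulate work."""
    time.sleep(0.05)  # assumed inference delay, for illustration only
    return input_data


# Correct: start before the call, end after it returns.
start = time.time()
slow_predict("x")
correct_latency = time.time() - start

# Mistake: the end time is read before the call runs, so almost nothing is timed.
start = time.time()
wrong_latency = time.time() - start
slow_predict("x")

print("Correct:", round(correct_latency, 3))
print("Wrong:  ", round(wrong_latency, 3))
```

The exact numbers depend on the machine, but the mistaken measurement should come out near zero while the correct one reflects the simulated delay.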

5. You want to compare two AI models for latency and cost. Model A takes 0.3 seconds per prediction and costs $0.04 per second. Model B takes 0.25 seconds but costs $0.06 per second. Which model is cheaper per prediction and which is faster?

hard

A. Model A is cheaper and faster; Model B is slower and more expensive

B. Model A is cheaper and slower; Model B is faster and more expensive

C. Model B is cheaper and slower; Model A is faster and more expensive

D. Model B is cheaper and faster; Model A is slower and more expensive

Solution

Step 1: Calculate cost per prediction for each model
Model A cost = 0.3 * 0.04 = $0.012; Model B cost = 0.25 * 0.06 = $0.015.
Step 2: Compare latency and cost
Model A is cheaper ($0.012 < $0.015) but slower (0.3s > 0.25s). Model B is faster but more expensive.
Final Answer:
Model A is cheaper and slower; Model B is faster and more expensive -> Option B
Quick Check:
Cost = latency * rate; compare values [OK]

Hint: Multiply latency by cost rate to compare total cost

Common Mistakes:

Ignoring cost per second rate
Mixing up which model is faster
Calculating cost incorrectly
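
The same comparison can be written as a short script. This is a sketch using only the figures given in the question; the dictionary layout and key names are illustrative choices.

```python
# Figures from the question: seconds per prediction and dollars per second.
models = {
    "Model A": {"latency_s": 0.3, "rate_per_s": 0.04},
    "Model B": {"latency_s": 0.25, "rate_per_s": 0.06},
}

# Cost per prediction = latency * rate.
cost_per_prediction = {
    name: spec["latency_s"] * spec["rate_per_s"] for name, spec in models.items()
}

cheapest = min(cost_per_prediction, key=cost_per_prediction.get)
fastest = min(models, key=lambda name: models[name]["latency_s"])

for name, cost in cost_per_prediction.items():
    print(f"{name}: {models[name]['latency_s']} s, ${cost:.3f} per prediction")
print("Cheapest:", cheapest)  # Model A
print("Fastest:", fastest)    # Model B
```

Neither model wins on both axes, so the choice depends on whether response time or spend matters more for the workload.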
