Model Pipeline - Latency and cost benchmarking
This pipeline measures how fast an AI model runs and how much it costs to run. It reports the time delay per prediction (latency) and the cost of using the model for predictions.
Latency and cost benchmarking does not involve training, so no loss curve is shown.
| Epoch | Loss ↓ | Accuracy ↑ | Observation |
|---|---|---|---|
| 1 | N/A | N/A | No training; benchmarking measures inference only |
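Calling time.time() immediately before and after model.predict() measures the elapsed time correctly. The example below also estimates cost by multiplying the latency by a rate of 0.05 per second:

    import time

    start = time.time()
    model_response = model.predict(input_data)
    end = time.time()
    latency = end - start
    cost = latency * 0.05  # cost per second
    print(round(latency, 2), round(cost, 3))

If model.predict() takes 0.24 seconds, what prints? The output is 0.24 0.012, because 0.24 × 0.05 = 0.012.

A shorter version that reports latency only:

    import time

    start = time.time()
    model.predict(input_data)
    latency = time.time() - start
    print('Latency:', latency)

This code records the time before calling model.predict(input_data), then subtracts it from the time afterwards to get the latency.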
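A single timed call can be noisy, so a benchmark often repeats the measurement and summarizes the results. The following is a minimal sketch of that idea, not a definitive implementation. `DummyModel`, `benchmark`, and the `runs`, `warmup`, and `cost_per_second` parameters are hypothetical names introduced for illustration; the 0.05 rate is carried over from the example above, and `time.perf_counter()` is used in place of `time.time()` because it is a monotonic clock intended for measuring short intervals.

```python
import statistics
import time


class DummyModel:
    """Hypothetical stand-in for a real model; predict() just sleeps briefly."""

    def predict(self, input_data):
        time.sleep(0.01)  # simulate inference work
        return [0 for _ in input_data]


def benchmark(model, input_data, runs=5, warmup=1, cost_per_second=0.05):
    """Time model.predict() over several runs and estimate cost per call."""
    for _ in range(warmup):
        model.predict(input_data)  # warm-up calls are not timed

    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        model.predict(input_data)
        latencies.append(time.perf_counter() - start)

    median_latency = statistics.median(latencies)
    return {
        "median_latency_s": median_latency,
        "max_latency_s": max(latencies),
        # Assumed pricing model: cost scales linearly with elapsed seconds.
        "cost_per_call": median_latency * cost_per_second,
    }


result = benchmark(DummyModel(), [1, 2, 3])
print(result)
```

Reporting the median alongside the maximum makes it easier to see whether one slow call is skewing the picture. To benchmark a real model, pass it in place of `DummyModel()`, provided it exposes a `predict()` method.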