Introduction
Latency and cost benchmarking measures how long a machine learning model or system takes to respond and how much that compute time costs. With both numbers in hand, we can choose the option that best fits our speed and budget needs.
The two helper functions have the following signatures:

measure_latency(model, input_data)
measure_cost(latency_seconds, cost_per_second)

# Benchmark a generic model at $0.05 per second of compute
latency = measure_latency(my_model, test_input)
cost = measure_cost(latency, cost_per_second=0.05)

# Benchmark a chatbot model at $0.10 per second of compute
latency = measure_latency(chatbot_model, user_message)
cost = measure_cost(latency, cost_per_second=0.10)

A complete example:

import time

# Simulate a simple model function
def simple_model(x):
    time.sleep(0.2)  # Simulate processing delay
    return x * 2

# Function to measure latency
def measure_latency(model, input_data):
    start = time.time()
    _ = model(input_data)
    end = time.time()
    return end - start

# Function to estimate cost (simple example: cost per second of compute)
def measure_cost(latency_seconds, cost_per_second=0.05):
    return latency_seconds * cost_per_second

# Test input
input_value = 10

# Measure latency
latency = measure_latency(simple_model, input_value)

# Estimate cost
cost = measure_cost(latency)

print(f"Latency: {latency:.3f} seconds")
print(f"Estimated cost: ${cost:.4f}")
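
A single timed call can be noisy, so it may help to repeat the measurement and summarize the results. The sketch below is one possible way to do that, not part of the original example: `benchmark_latency`, its `runs` and `warmup` parameters, and the shortened 0.01 second delay are all assumptions made for illustration. It also swaps in `time.perf_counter()`, a monotonic clock intended for timing short durations, in place of `time.time()`.

```python
import statistics
import time


def simple_model(x):
    """Stand-in model: sleeps briefly, then doubles the input."""
    time.sleep(0.01)  # assumed short delay so the benchmark finishes quickly
    return x * 2


def benchmark_latency(model, input_data, runs=20, warmup=2):
    """Hypothetical helper: return a list of per-call latencies in seconds."""
    for _ in range(warmup):
        model(input_data)  # warm-up calls are executed but not timed
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        model(input_data)
        latencies.append(time.perf_counter() - start)
    return latencies


latencies = benchmark_latency(simple_model, 10)
median_latency = statistics.median(latencies)
# Approximate 95th percentile using a simple nearest-rank index
p95_latency = sorted(latencies)[int(0.95 * (len(latencies) - 1))]
cost_per_call = median_latency * 0.05  # assumed rate of $0.05 per second

print(f"Median latency: {median_latency:.4f} seconds")
print(f"p95 latency:    {p95_latency:.4f} seconds")
print(f"Estimated cost per call: ${cost_per_call:.6f}")
```

Reporting the median alongside a high percentile shows both the typical latency and the slow tail, which a single measurement cannot distinguish.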

Calling time.time() before and after model.predict() measures the elapsed wall-clock time correctly.

import time

start = time.time()
model_response = model.predict(input_data)
end = time.time()
latency = end - start
cost = latency * 0.05  # 0.05 is the cost per second
print(round(latency, 2), round(cost, 3))

If model.predict() takes 0.24 seconds, what prints? The output is 0.24 0.012, because the cost is 0.24 × 0.05 = 0.012.

import time
start = time.time()
model.predict(input_data)
latency = time.time() - start
print('Latency:', latency)

This records the start time, calls model.predict(input_data), then subtracts the start time from the current time to get the latency.
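
The timing pattern above can be wrapped so that the prediction, its latency, and its estimated cost are returned together. This is a minimal sketch under stated assumptions: `EchoModel`, `TimedResult`, and `timed_predict` are hypothetical names invented for illustration, the 0.05 second delay is arbitrary, and `time.perf_counter()` is used instead of `time.time()` because it is monotonic.

```python
import time
from dataclasses import dataclass


@dataclass
class TimedResult:
    """Hypothetical container for a prediction and its measurements."""
    output: object
    latency_seconds: float
    cost: float


class EchoModel:
    """Hypothetical stand-in for a real model that exposes predict()."""

    def predict(self, input_data):
        time.sleep(0.05)  # simulated processing delay
        return input_data.upper()


def timed_predict(model, input_data, cost_per_second=0.05):
    """Call model.predict() once and record how long the call took."""
    start = time.perf_counter()
    output = model.predict(input_data)
    latency = time.perf_counter() - start
    return TimedResult(output, latency, latency * cost_per_second)


result = timed_predict(EchoModel(), "hello")
print("Output:", result.output)
print("Latency:", round(result.latency_seconds, 2), "seconds")
print("Cost:", round(result.cost, 4))
```

Keeping the model output next to the measurements makes it easier to check that a faster or cheaper configuration still returns the expected result.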