Complete the code to select the serving architecture type.
serving_architecture = "[1]"
The streaming architecture is commonly used for real-time serving, which affects latency and cost.
Complete the code to set the latency threshold for serving.
latency_threshold_ms = [1]100 milliseconds is a common low latency target for real-time serving.
Fix the error in the code to calculate cost per request.
cost_per_request = total_cost / [1]Cost per request is calculated by dividing total cost by total number of requests served.
Fill both blanks to create a dictionary mapping architecture to latency.
latency_map = {"batch": [1], "streaming": [2]Batch serving typically has latency around 1000 ms, streaming around 100 ms.
Fill all three blanks to create a cost dictionary with architecture, latency, and cost per request.
cost_info = {"architecture": "[1]", "latency_ms": [2], "cost_per_request_usd": [3]Streaming architecture with 100 ms latency and cost per request of 0.0005 USD is typical for low-latency serving.