What is the primary purpose of the torch-model-archiver tool in TorchServe?
Think about how TorchServe loads models for inference.
The torch-model-archiver packages the model and related files into a single archive (.mar) that TorchServe can load for serving predictions.
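For example, a model could be packaged like this (file and model names are placeholders; the flags shown are standard torch-model-archiver options):

```shell
# Package the weights and handler into mymodel.mar inside model_store/.
# model.pth and handler.py are hypothetical file names.
torch-model-archiver \
  --model-name mymodel \
  --version 1.0 \
  --serialized-file model.pth \
  --handler handler.py \
  --export-path model_store
```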
What will be the output message when running this command to register a model with TorchServe?
torchserve --start --model-store model_store --models mymodel=mymodel.mar
Assume the model store and .mar file exist and are correct.
If the model store directory and .mar file exist and are valid, TorchServe starts, loads mymodel.mar from the model store, and logs that the model was registered and its workers started.
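Once the server is up, registration can be confirmed through the management API (8081 is the default management port):

```shell
# List registered models; a successful registration shows mymodel here.
curl http://localhost:8081/models
```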
You have a PyTorch model that takes two inputs and returns a dictionary of outputs. Which handler type should you use in TorchServe to serve this model?
Default handlers expect specific input/output formats.
When the model input/output does not match default handlers, a custom handler subclassing BaseHandler is needed to process inputs and outputs properly.
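A minimal sketch of such a handler, assuming the two inputs arrive as JSON fields (the field names `input_a`/`input_b` and the arithmetic "model" are hypothetical). In a real deployment you would subclass `ts.torch_handler.base_handler.BaseHandler`; a small stub stands in here so the sketch is self-contained:

```python
import json

class BaseHandler:
    """Stand-in for ts.torch_handler.base_handler.BaseHandler."""
    def __init__(self):
        self.model = None
        self.initialized = False

class TwoInputHandler(BaseHandler):
    def initialize(self, context):
        # In TorchServe, context.system_properties points at the model dir;
        # here we only flag the handler as ready.
        self.initialized = True

    def preprocess(self, data):
        # TorchServe passes a list of requests; each body holds both inputs.
        batch_a, batch_b = [], []
        for row in data:
            body = row.get("body") or row.get("data")
            if isinstance(body, (bytes, bytearray)):
                body = json.loads(body)
            batch_a.append(body["input_a"])
            batch_b.append(body["input_b"])
        return batch_a, batch_b

    def inference(self, inputs):
        # Placeholder for self.model(a, b); returns a dict of outputs.
        a, b = inputs
        return {"sum": [x + y for x, y in zip(a, b)],
                "product": [x * y for x, y in zip(a, b)]}

    def postprocess(self, outputs):
        # Return one JSON-serializable result per request in the batch.
        n = len(outputs["sum"])
        return [{k: v[i] for k, v in outputs.items()} for i in range(n)]

    def handle(self, data, context):
        if not self.initialized:
            self.initialize(context)
        return self.postprocess(self.inference(self.preprocess(data)))
```

The key point is that preprocess/inference/postprocess are overridden so the dict-of-outputs is flattened into one JSON result per request in the batch.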
Which configuration file and parameter should you modify to change the batch size for inference requests in TorchServe?
Batch size is usually set per model in a YAML config.
The model-config.yaml file defines model-specific parameters; its batchSize key sets the maximum inference batch size, and maxBatchDelay sets how long (in milliseconds) TorchServe waits to fill a batch before running it.
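A minimal model-config.yaml sketch (the values are illustrative; batchSize and maxBatchDelay are the standard TorchServe model-config keys):

```yaml
# Per-model settings packaged with the .mar or applied at registration.
minWorkers: 1
maxWorkers: 2
batchSize: 8        # max requests aggregated into one inference batch
maxBatchDelay: 100  # ms to wait for a full batch before running anyway
```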
After running TorchServe with metrics enabled, you see this output snippet:
{"model_name": "mymodel", "inference_count": 1000, "average_latency": 25.3}
What does the average_latency value represent?
Latency usually measures processing time per request.
The average_latency metric shows the average time in milliseconds TorchServe takes to process each inference request for the model.
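Metrics like this can also be pulled from the metrics API (8082 is the default metrics port, serving Prometheus text format):

```shell
# Scrape current TorchServe metrics, including per-model latency counters.
curl http://localhost:8082/metrics
```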