How to Use TorchServe for Serving PyTorch Models
To use TorchServe, first package your PyTorch model into a .mar archive using torch-model-archiver. Then start the server with torchserve and deploy the model for inference via REST API calls.
Syntax
Using TorchServe involves three main commands:
- torch-model-archiver: Packages your PyTorch model and handler into a .mar file.
- torchserve: Starts the model server with the packaged model.
- curl or HTTP client: Sends inference requests to the server.
Basic syntax:
```bash
torch-model-archiver --model-name <name> --version <version> --serialized-file <model_path> --handler <handler_file> --export-path <export_dir>
torchserve --start --model-store <export_dir> --models <name>=<name>.mar
curl -X POST http://127.0.0.1:8080/predictions/<name> -T <input_data>
```
```bash
torch-model-archiver --model-name mymodel --version 1.0 --serialized-file model.pt --handler image_classifier --export-path model_store
torchserve --start --model-store model_store --models mymodel=mymodel.mar
curl -X POST http://127.0.0.1:8080/predictions/mymodel -T input.jpg
```
Example
This example shows how to package a simple PyTorch model, start TorchServe, and send an inference request.
```python
import torch
import torch.nn as nn

# Define a simple model
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 2)

    def forward(self, x):
        return self.linear(x)

# Save the model as TorchScript so torch-model-archiver does not also need
# the model definition; a bare state_dict would additionally require --model-file
model = SimpleModel()
torch.jit.save(torch.jit.script(model), 'model.pt')

# Package the model (run in terminal):
# torch-model-archiver --model-name simplemodel --version 1.0 --serialized-file model.pt --handler image_classifier --export-path model_store

# Start TorchServe (run in terminal):
# torchserve --start --model-store model_store --models simplemodel=simplemodel.mar

# Send inference request (run in terminal):
# curl -X POST http://127.0.0.1:8080/predictions/simplemodel -T input.jpg
```
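Note that the built-in image_classifier handler expects image input, so a vector model like SimpleModel usually calls for a custom handler instead. TorchServe accepts a handler module that exposes a `handle(data, context)` function; the sketch below is a minimal, assumption-laden version (the request format, a JSON array of 10 floats per request body, is an assumption, and the weights are initialized inline rather than loaded from the archive).

```python
# custom_handler.py -- a minimal module-level TorchServe handler (a sketch).
# Assumes each request body is a JSON array of 10 floats; a real handler would
# load its weights from the .mar archive via context.system_properties.
import torch
import torch.nn as nn

class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 2)

    def forward(self, x):
        return self.linear(x)

_model = None

def handle(data, context):
    """TorchServe calls this once per batch of requests."""
    global _model
    if _model is None:
        _model = SimpleModel()  # placeholder weights; load real ones here
        _model.eval()
    if data is None:
        # TorchServe invokes the handler with data=None during initialization
        return None
    inputs = torch.tensor([req["body"] for req in data], dtype=torch.float32)
    with torch.no_grad():
        return _model(inputs).tolist()
```

You would then pass this file to the archiver with `--handler custom_handler.py` in place of the built-in handler name.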
Common Pitfalls
- Not packaging the model correctly with torch-model-archiver causes loading errors.
- Forgetting to start TorchServe before sending requests leads to connection failures.
- Using incompatible handler files or missing dependencies can cause inference errors.
- Incorrect input format in requests results in bad predictions or errors.
Always test your handler and model locally before packaging.
```bash
# Wrong: missing handler file
torch-model-archiver --model-name mymodel --serialized-file model.pt --export-path model_store

# Right: include the handler
torch-model-archiver --model-name mymodel --serialized-file model.pt --handler image_classifier --export-path model_store
```
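Per the tip above, a quick local check catches most packaging errors before they surface inside TorchServe. This sketch (SimpleModel and the file name model_check.pt are stand-ins for your real model and checkpoint) verifies that the checkpoint round-trips and that a forward pass produces the expected shape:

```python
# Local smoke test before packaging (a sketch; SimpleModel and
# "model_check.pt" are stand-ins for your real model and checkpoint).
import torch
import torch.nn as nn

class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 2)

    def forward(self, x):
        return self.linear(x)

# Round-trip the checkpoint the way TorchServe will have to load it
model = SimpleModel()
torch.save(model.state_dict(), "model_check.pt")

restored = SimpleModel()
restored.load_state_dict(torch.load("model_check.pt"))
restored.eval()

# Verify a forward pass on correctly shaped dummy input
with torch.no_grad():
    out = restored(torch.randn(1, 10))
assert out.shape == (1, 2), f"unexpected output shape {out.shape}"
print("checkpoint loads and forward pass works")
```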
Quick Reference
| Command | Purpose | Example |
|---|---|---|
| torch-model-archiver | Package model into .mar file | torch-model-archiver --model-name mymodel --version 1.0 --serialized-file model.pt --handler image_classifier --export-path model_store |
| torchserve | Start model server | torchserve --start --model-store model_store --models mymodel=mymodel.mar |
| curl POST | Send inference request | curl -X POST http://127.0.0.1:8080/predictions/mymodel -T input.jpg |
Key Takeaways
- Package your PyTorch model into a .mar file using torch-model-archiver before serving.
- Start TorchServe with the model store and specify which models to load.
- Send inference requests to the running TorchServe server via its REST API.
- Ensure your handler file matches your model type and input format.
- Test your model and handler locally to avoid common deployment errors.