How to Use TorchServe for Serving PyTorch Models
To use TorchServe, first package your PyTorch model into a .mar archive using torch-model-archiver. Then start the server with torchserve and deploy the model for inference via REST API calls.
Syntax
Using TorchServe involves three main commands:
- torch-model-archiver: Packages your PyTorch model and handler into a .mar file.
- torchserve: Starts the model server with the packaged model.
- curl or HTTP client: Sends inference requests to the server.
Basic syntax:
```bash
torch-model-archiver --model-name <name> --version <version> --serialized-file <model_path> --handler <handler_file> --export-path <export_dir>
torchserve --start --model-store <export_dir> --models <name>=<name>.mar
curl -X POST http://127.0.0.1:8080/predictions/<name> -T <input_data>
```
```bash
torch-model-archiver --model-name mymodel --version 1.0 --serialized-file model.pt --handler image_classifier --export-path model_store
torchserve --start --model-store model_store --models mymodel=mymodel.mar
curl -X POST http://127.0.0.1:8080/predictions/mymodel -T input.jpg
```
Example
This example shows how to package a simple PyTorch model, start TorchServe, and send an inference request.
```python
import torch
import torch.nn as nn

# Define a simple model
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 2)

    def forward(self, x):
        return self.linear(x)

# Save the model as TorchScript so torch-model-archiver does not also need
# the model definition; a bare state_dict would additionally require --model-file
model = SimpleModel()
torch.jit.save(torch.jit.script(model), 'model.pt')

# Package the model (run in terminal):
# torch-model-archiver --model-name simplemodel --version 1.0 --serialized-file model.pt --handler image_classifier --export-path model_store

# Start TorchServe (run in terminal):
# torchserve --start --model-store model_store --models simplemodel=simplemodel.mar

# Send inference request (run in terminal):
# curl -X POST http://127.0.0.1:8080/predictions/simplemodel -T input.jpg
```
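Note that the built-in image_classifier handler expects image input, so a vector model like SimpleModel usually calls for a custom handler instead. TorchServe accepts a handler module that exposes a `handle(data, context)` function; the sketch below is a minimal, assumption-laden version (the request format, a JSON array of 10 floats per request body, is an assumption, and the weights are initialized inline rather than loaded from the archive).

```python
# custom_handler.py -- a minimal module-level TorchServe handler (a sketch).
# Assumes each request body is a JSON array of 10 floats; a real handler would
# load its weights from the .mar archive via context.system_properties.
import torch
import torch.nn as nn

class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 2)

    def forward(self, x):
        return self.linear(x)

_model = None

def handle(data, context):
    """TorchServe calls this once per batch of requests."""
    global _model
    if _model is None:
        _model = SimpleModel()  # placeholder weights; load real ones here
        _model.eval()
    if data is None:
        # TorchServe invokes the handler with data=None during initialization
        return None
    inputs = torch.tensor([req["body"] for req in data], dtype=torch.float32)
    with torch.no_grad():
        return _model(inputs).tolist()
```

You would then pass this file to the archiver with `--handler custom_handler.py` in place of the built-in handler name.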
Common Pitfalls
- Not packaging the model correctly with torch-model-archiver causes loading errors.
- Forgetting to start TorchServe before sending requests leads to connection failures.
- Using incompatible handler files or missing dependencies can cause inference errors.
- Incorrect input format in requests results in bad predictions or errors.
Always test your handler and model locally before packaging.
```bash
# Wrong: missing handler file
torch-model-archiver --model-name mymodel --serialized-file model.pt --export-path model_store

# Right: include the handler
torch-model-archiver --model-name mymodel --serialized-file model.pt --handler image_classifier --export-path model_store
```
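Per the tip above, a quick local check catches most packaging errors before they surface inside TorchServe. This sketch (SimpleModel and the file name model_check.pt are stand-ins for your real model and checkpoint) verifies that the checkpoint round-trips and that a forward pass produces the expected shape:

```python
# Local smoke test before packaging (a sketch; SimpleModel and
# "model_check.pt" are stand-ins for your real model and checkpoint).
import torch
import torch.nn as nn

class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 2)

    def forward(self, x):
        return self.linear(x)

# Round-trip the checkpoint the way TorchServe will have to load it
model = SimpleModel()
torch.save(model.state_dict(), "model_check.pt")

restored = SimpleModel()
restored.load_state_dict(torch.load("model_check.pt"))
restored.eval()

# Verify a forward pass on correctly shaped dummy input
with torch.no_grad():
    out = restored(torch.randn(1, 10))
assert out.shape == (1, 2), f"unexpected output shape {out.shape}"
print("checkpoint loads and forward pass works")
```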
Quick Reference
| Command | Purpose | Example |
|---|---|---|
| torch-model-archiver | Package model into .mar file | torch-model-archiver --model-name mymodel --version 1.0 --serialized-file model.pt --handler image_classifier --export-path model_store |
| torchserve | Start model server | torchserve --start --model-store model_store --models mymodel=mymodel.mar |
| curl POST | Send inference request | curl -X POST http://127.0.0.1:8080/predictions/mymodel -T input.jpg |
Key Takeaways
- Package your PyTorch model into a .mar file using torch-model-archiver before serving.
- Start TorchServe with the model store and specify which models to load.
- Send inference requests to the running TorchServe server via its REST API.
- Ensure your handler file matches your model type and input format.
- Test your model and handler locally to avoid common deployment errors.