TorchServe turns your PyTorch models into web services, making it simple to share and serve models without writing custom serving code.
TorchServe setup in PyTorch
torch-model-archiver --model-name <model_name> --version <version> --serialized-file <model_file> --handler <handler_file> --extra-files <extra_files>
torchserve --start --model-store <model_store_dir> --models <model_name>=<model_name>.mar
The torch-model-archiver command packages your model into a .mar file for TorchServe.
The torchserve command starts the server and loads your packaged model.
torch-model-archiver --model-name resnet18 --version 1.0 --serialized-file resnet18.pth --handler image_classifier --extra-files index_to_name.json
torchserve --start --model-store model_store --models resnet18=resnet18.mar

torch-model-archiver --model-name mymodel --version 2.1 --serialized-file model.pt --handler custom_handler.py
torchserve --start --model-store model_store --models mymodel=mymodel.mar

This code saves a pretrained ResNet18 model, shows how to package and start TorchServe (commands as comments), and runs a local prediction to demonstrate expected output.
import torch
from torchvision import models, transforms
from PIL import Image
import requests
import json

# Step 1: Download and save a pretrained model
model = models.resnet18(pretrained=True)
model.eval()

# Save the model
torch.save(model.state_dict(), 'resnet18.pth')

# Step 2: Create model archive (run in terminal, shown here as comment)
# torch-model-archiver --model-name resnet18 --version 1.0 --serialized-file resnet18.pth --handler image_classifier --extra-files index_to_name.json

# Step 3: Start TorchServe (run in terminal, shown here as comment)
# torchserve --start --model-store model_store --models resnet18=resnet18.mar

# Step 4: Send a test request to the server
# (Make sure TorchServe is running and model is loaded)

# Download a sample image
url = 'https://pytorch.org/assets/images/dog.jpg'
image = Image.open(requests.get(url, stream=True).raw)

# Prepare image for model
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])
input_tensor = preprocess(image)
input_batch = input_tensor.unsqueeze(0)  # create mini-batch

# Load class index to name mapping
with open('index_to_name.json') as f:
    idx_to_label = json.load(f)

# Run prediction locally (for demo, since server call needs network)
with torch.no_grad():
    output = model(input_batch)
probabilities = torch.nn.functional.softmax(output, dim=1)[0]

# Get top 1 prediction
top_prob, top_catid = torch.topk(probabilities, 1)
print(f"Predicted class: {idx_to_label[str(top_catid.item())]} with probability {top_prob.item():.4f}")
Make sure to install TorchServe and torch-model-archiver via pip before starting.
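For reference, a typical install looks like the following (package names as published on PyPI; note that TorchServe also requires a Java 11+ runtime on the machine):

```shell
# Install the serving runtime and the packaging tool
pip install torchserve torch-model-archiver
```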
The handler defines how input data is processed and output is returned; use built-in handlers or write your own.
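To illustrate the handler contract, here is a minimal standalone sketch of the preprocess → inference → postprocess flow. This toy class and its echo logic are hypothetical; a real custom handler would subclass `ts.torch_handler.base_handler.BaseHandler`, load the model in `initialize`, and run it in `inference`:

```python
import json


class EchoHandler:
    """Toy stand-in for a TorchServe handler (hypothetical, for illustration)."""

    def initialize(self, context=None):
        # A real handler would load the model weights here using the context
        self.initialized = True

    def preprocess(self, data):
        # TorchServe passes a list of requests; each carries a payload under 'body'
        return [json.loads(d["body"]) for d in data]

    def inference(self, inputs):
        # Stand-in for calling the model; here we just uppercase the input text
        return [{"prediction": x["text"].upper()} for x in inputs]

    def postprocess(self, outputs):
        # Must return a list with one entry per incoming request
        return outputs

    def handle(self, data, context=None):
        return self.postprocess(self.inference(self.preprocess(data)))


handler = EchoHandler()
handler.initialize()
print(handler.handle([{"body": '{"text": "hello"}'}]))
```

Built-in handlers such as `image_classifier` implement the same flow for common input types, so you only need a custom handler when your pre/post-processing differs from the defaults.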
Model archive files (.mar) are stored in the model store directory for TorchServe to load.
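Setting up the model store is just creating a directory and placing archives in it (directory and file names below are examples):

```shell
# Create the model store directory TorchServe will read from
mkdir -p model_store
# After torch-model-archiver produces the archive, move it in:
# mv resnet18.mar model_store/
```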
TorchServe packages PyTorch models into .mar files for easy deployment.
Use torch-model-archiver to create the model archive and torchserve to start the server.
Once running, you can send data to the server to get predictions from your model.
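As a sketch of such a request, assuming the server is running locally on TorchServe's default inference port 8080 with the resnet18 model loaded (the image file name dog.jpg is an example):

```shell
# POST an image to the inference API; the response is JSON with class predictions
curl http://localhost:8080/predictions/resnet18 -T dog.jpg
```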