
ONNX Runtime inference in PyTorch

Introduction

ONNX Runtime is a high-performance engine for running machine learning models on CPUs, GPUs, and other devices. Because it works from the framework-neutral ONNX format, a model trained in one tool can be run in many others. Typical reasons to use it:

You want to run a PyTorch model faster on a CPU or GPU.
You need to deploy a model to a device that does not support PyTorch directly.
You want to share a model with others who use different frameworks.
You want to run the same model on different platforms like Windows, Linux, or mobile.
You want to compare performance between PyTorch and ONNX Runtime.
Syntax
Python
import onnxruntime

# Load the ONNX model
session = onnxruntime.InferenceSession('model.onnx')

# Prepare input as a dictionary
inputs = {session.get_inputs()[0].name: input_array}

# Run inference
outputs = session.run(None, inputs)

You must export your PyTorch model to ONNX format first.

Input data must be a numpy array matching the model input shape.
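Shape and dtype mismatches are the most common cause of inference errors, so it is worth checking the array before calling session.run. A numpy-only sketch (no model required); note that numpy creates float64 arrays by default, while most ONNX models expect float32:

```python
import numpy as np

# NumPy creates float64 arrays by default, but most ONNX models expect float32.
x = np.random.rand(1, 3, 224, 224)
print(x.dtype)  # float64

# Convert explicitly and verify shape and dtype before inference.
x = x.astype(np.float32)
assert x.shape == (1, 3, 224, 224)
assert x.dtype == np.float32
```

You can read the model's expected shape and type from session.get_inputs()[0].shape and .type to compare against.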

Examples
Run inference on a random image-like input for a model expecting 1x3x224x224 input.
Python
import onnxruntime
import numpy as np

session = onnxruntime.InferenceSession('model.onnx')
input_name = session.get_inputs()[0].name
input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: input_data})
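session.run returns a list of numpy arrays, one per model output. For a classifier, a typical next step is to turn outputs[0] into probabilities and a predicted class. A sketch using a hypothetical stand-in array, since no real model is loaded here:

```python
import numpy as np

# Stand-in for outputs[0] from a 3-class classifier (hypothetical values).
logits = np.array([[0.1, 2.5, -0.3]], dtype=np.float32)

# Softmax with the usual max-subtraction for numerical stability.
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)

# Index of the most likely class.
pred = int(np.argmax(probs, axis=1)[0])
```

With the values above, pred is 1, the position of the largest logit.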
Run inference with your own prepared input data.
Python
import onnxruntime

session = onnxruntime.InferenceSession('model.onnx')
input_name = session.get_inputs()[0].name
input_data = your_numpy_array
outputs = session.run(None, {input_name: input_data})
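A common way to produce input_data in the first place is to preprocess an image into the NCHW float32 layout most vision models expect. A minimal sketch, using a random array in place of a real decoded image:

```python
import numpy as np

# Pretend this is a decoded 224x224 RGB image: (height, width, channels), uint8.
img = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)

x = img.astype(np.float32) / 255.0   # scale pixel values to [0, 1]
x = np.transpose(x, (2, 0, 1))       # HWC -> CHW
x = x[np.newaxis, ...]               # add batch dimension -> NCHW

assert x.shape == (1, 3, 224, 224)
```

Real pipelines often also subtract a per-channel mean and divide by a standard deviation; the exact values depend on how the model was trained.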
Sample Model

This code creates a simple linear model in PyTorch, exports it to ONNX format, then loads and runs it using ONNX Runtime. It prints the input and the model's output.

Python
import torch
import torch.nn as nn
import numpy as np
import onnxruntime

# Define a simple PyTorch model
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)
    def forward(self, x):
        return self.linear(x)

# Create model and dummy input
model = SimpleModel()
model.eval()
dummy_input = torch.randn(1, 4)

# Export to ONNX
onnx_path = 'simple_model.onnx'
torch.onnx.export(model, dummy_input, onnx_path, input_names=['input'], output_names=['output'], opset_version=11)

# Prepare input for ONNX Runtime
input_data = dummy_input.numpy()

# Load ONNX model with ONNX Runtime
session = onnxruntime.InferenceSession(onnx_path)
input_name = session.get_inputs()[0].name

# Run inference
outputs = session.run(None, {input_name: input_data})

print('Input:', input_data)
print('Output:', outputs[0])
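To see what the exported graph actually computes, note that nn.Linear(4, 2) is just a matrix multiply plus a bias. A numpy-only sketch with hypothetical weights (the real values live inside the exported model):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 4)).astype(np.float32)  # hypothetical weight matrix
b = rng.standard_normal(2).astype(np.float32)       # hypothetical bias
x = rng.standard_normal((1, 4)).astype(np.float32)

# The same computation nn.Linear(4, 2) performs: y = x @ W^T + b.
y = x @ W.T + b
assert y.shape == (1, 2)
```

A useful sanity check after exporting is to compare the PyTorch output against the ONNX Runtime output with np.allclose; they should agree up to small floating-point differences.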
Important Notes

ONNX Runtime supports many hardware accelerators, exposed as execution providers such as CUDA, TensorRT, DirectML, and CoreML, for faster inference.
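You select accelerators through the providers argument of InferenceSession, listed in priority order with a CPU fallback last. The session line is commented out here because it needs onnxruntime and a model file on disk:

```python
# Preferred execution providers, in priority order; ONNX Runtime falls back
# to the next entry if one is unavailable on the current machine.
providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]

# session = onnxruntime.InferenceSession("model.onnx", providers=providers)
```

Keeping CPUExecutionProvider last means the same code runs on machines without a GPU.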

Make sure the input data type and shape match the model's expected input.

ONNX Runtime can run models exported from many frameworks, not just PyTorch.

Summary

ONNX Runtime helps run machine learning models fast and on many devices.

You export your PyTorch model to ONNX format, then load it with ONNX Runtime.

Prepare input as a numpy array and run inference with session.run().