ONNX Runtime in Computer Vision - Model Metrics & Evaluation
ONNX Runtime is an engine for running trained machine learning models quickly and efficiently. When it is used for computer vision, the two key metrics are inference speed and model accuracy. Speed matters for real-time tasks such as object detection in video. Accuracy matters because a fast model that makes wrong predictions is not useful. So we measure both how fast the model runs and how correct its predictions are.
Confusion matrix (example for a 2-class image classifier, with Cat as the positive class):
                Predicted Cat   Predicted Dog
  Actual Cat         85              15
  Actual Dog         10              90
Total samples = 85 + 15 + 10 + 90 = 200
True Positives (TP) = 85 (correctly predicted Cat)
False Positives (FP) = 10 (Dog predicted as Cat)
True Negatives (TN) = 90 (correctly predicted Dog)
False Negatives (FN) = 15 (Cat predicted as Dog)
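The four counts can also be derived directly from label arrays. The sketch below is a minimal NumPy example, assuming class index 0 is Cat (the positive class) and 1 is Dog; the label arrays are hypothetical and were built only to reproduce the matrix above.

```python
import numpy as np

# Class indices for this example: 0 = Cat (positive class), 1 = Dog.
CAT, DOG = 0, 1

def confusion_matrix(y_true, y_pred, num_classes=2):
    """Rows are actual classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    # np.add.at increments cm[actual, predicted] once per sample, even for repeated index pairs.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

# Hypothetical labels: 100 cats and 100 dogs, predictions chosen to match the table.
y_true = np.array([CAT] * 100 + [DOG] * 100)
y_pred = np.array([CAT] * 85 + [DOG] * 15 + [CAT] * 10 + [DOG] * 90)

cm = confusion_matrix(y_true, y_pred)
tp, fn = cm[CAT, CAT], cm[CAT, DOG]  # actual cats: found vs missed
fp, tn = cm[DOG, CAT], cm[DOG, DOG]  # actual dogs: called "cat" vs correctly rejected
print(cm)
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
```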
Imagine ONNX Runtime runs a model to detect cats in photos.
- Precision means: Of all photos predicted as cats, how many really are cats? High precision means fewer false alarms.
- Recall means: Of all actual cat photos, how many did the model find? High recall means fewer missed cats.
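Written as formulas on the confusion-matrix counts, precision = TP / (TP + FP), recall = TP / (TP + FN) and accuracy = (TP + TN) / total. A short Python sketch (the function names are illustrative) applies them to the Cat/Dog counts:

```python
def precision(tp, fp):
    # Of all samples predicted as the positive class, the fraction that really are positive.
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    # Of all actual positive samples, the fraction the model found.
    return tp / (tp + fn) if tp + fn else 0.0

def accuracy(tp, fp, tn, fn):
    # Fraction of all samples classified correctly.
    return (tp + tn) / (tp + fp + tn + fn)

# Counts from the Cat/Dog confusion matrix (Cat = positive class).
tp, fp, tn, fn = 85, 10, 90, 15
print(f"accuracy  = {accuracy(tp, fp, tn, fn):.3f}")  # 175 / 200
print(f"precision = {precision(tp, fp):.3f}")         # 85 / 95
print(f"recall    = {recall(tp, fn):.3f}")            # 85 / 100
```

For the example matrix this prints accuracy 0.875, precision 0.895 and recall 0.850. Returning 0.0 when a denominator is zero is a convention chosen for this sketch, not a universal rule.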
If the ONNX Runtime version of the model runs faster but misses many cats (low recall), it is not good for applications like pet monitoring. If it finds many cats but also mistakes dogs for cats (low precision), it causes false alerts.
The goal is therefore to gain speed with ONNX Runtime while keeping precision and recall high.
Good values (rough guidelines; suitable thresholds depend on the task):
- Accuracy above 90% on test images
- Precision and recall both above 85%
- Inference time reduced by 50% compared to the original model
Bad values:
- Accuracy below 70%, meaning many wrong predictions
- Precision or recall below 50%, causing many false alarms or misses
- Inference speed not improved or slower, defeating ONNX Runtime's purpose
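To check the speed criteria, the original model and the ONNX Runtime session should be timed on the same inputs and hardware. A minimal timing helper, assuming each model call is wrapped in a zero-argument function (for example lambda: session.run(None, feed), where feed is a hypothetical input dictionary); the helper names are illustrative:

```python
import statistics
import time

def median_latency_ms(run_once, warmup=5, runs=50):
    """Median wall-clock latency of run_once() in milliseconds.

    Warm-up calls are discarded because the first runs often include
    one-time costs such as memory allocation or lazy initialization.
    """
    for _ in range(warmup):
        run_once()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)

def time_reduction(baseline_ms, optimized_ms):
    """Fraction of inference time saved; 0.5 means the time was halved."""
    return 1.0 - optimized_ms / baseline_ms
```

Using the median rather than the mean makes the result less sensitive to occasional slow runs. A time_reduction of 0.5 corresponds to the 50% target listed under good values (a 2x speedup).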
Common pitfalls:
- Ignoring accuracy drop: Speeding up a model with ONNX Runtime may reduce accuracy if the model conversion is not done and validated carefully.
- Data leakage: Testing on data the model saw during training gives misleadingly high accuracy.
- Overfitting: The model performs well on training images but poorly on new ones, so training-set metrics are misleading.
- Measuring only speed: Fast inference is good but useless if predictions are wrong.
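One way to catch an accuracy drop introduced by conversion is to run the original model and the ONNX Runtime session on the same inputs and compare their outputs. The helper below is a sketch that assumes both models produce a (num_samples, num_classes) array of scores; the default tolerance atol=1e-4 is an arbitrary assumption and should be tuned per model, and the logits in the demo are hypothetical.

```python
import numpy as np

def compare_outputs(reference_logits, onnx_logits, atol=1e-4):
    """Compare original-model outputs with ONNX Runtime outputs for the same inputs.

    Returns (largest absolute difference, fraction of samples whose top-1
    class agrees, whether every value is within atol).
    """
    ref = np.asarray(reference_logits, dtype=np.float64)
    new = np.asarray(onnx_logits, dtype=np.float64)
    max_abs_diff = float(np.max(np.abs(ref - new)))
    top1_agreement = float(np.mean(ref.argmax(axis=1) == new.argmax(axis=1)))
    within_tol = bool(np.allclose(ref, new, atol=atol, rtol=0.0))
    return max_abs_diff, top1_agreement, within_tol

# Hypothetical logits for three images; the last converted row flips the predicted class.
reference = np.array([[2.0, 0.5], [0.1, 1.2], [1.0, 0.9]])
converted = np.array([[2.0, 0.5], [0.1, 1.2], [0.8, 1.1]])
print(compare_outputs(reference, converted))
```

A top-1 agreement below 1.0 means the converted model changes some predictions; those samples are worth inspecting before accuracy is re-measured on a held-out test set.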
Question: Your ONNX Runtime model runs 3 times faster than the original and has 98% accuracy, but only 12% recall on detecting a rare object. Is it good for production? Why or why not?
Answer: No. Because the object is rare, almost every test image is a negative, so a model can reach high accuracy simply by predicting "no object" most of the time. The 12% recall shows that it misses about 88% of the rare objects. For rare-object detection, missed detections are the critical failure, so recall must improve, even at the cost of some speed.
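The effect described in this answer can be reproduced with a hypothetical test set. The counts below are invented for illustration (10,000 images, 100 of which contain the rare object), chosen so that the model has 98% accuracy and 12% recall; a baseline that never predicts the object is included for comparison:

```python
def summarize(tp, fp, tn, fn):
    """Return (accuracy, recall) for the positive (rare-object) class."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total
    rec = tp / (tp + fn) if tp + fn else 0.0
    return acc, rec

# Hypothetical detector: finds 12 of the 100 objects and raises 112 false alarms.
model = summarize(tp=12, fp=112, tn=9788, fn=88)
# Hypothetical baseline that always answers "no object".
never = summarize(tp=0, fp=0, tn=9900, fn=100)

print("model: accuracy=%.2f recall=%.2f" % model)  # accuracy=0.98 recall=0.12
print("never: accuracy=%.2f recall=%.2f" % never)  # accuracy=0.99 recall=0.00
```

The baseline that never detects anything scores even higher accuracy (99%) with zero recall, which is why accuracy alone is misleading on imbalanced data.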
Practice
Question 1: What is ONNX Runtime mainly used for?
Solution:
Step 1: Understand ONNX Runtime's role
ONNX Runtime is designed to run models that are already trained, not to train new ones.
Step 2: Identify the correct purpose
It runs these models efficiently on many kinds of devices, which makes deployment easier.
Final Answer: To run pre-trained machine learning models efficiently on different devices
Quick Check: ONNX Runtime runs trained models; it does not train them.
Common mistakes:
- Confusing ONNX Runtime with training frameworks
- Thinking it is for data visualization
- Assuming it collects or labels data
Question 2: Which code correctly loads an ONNX model with ONNX Runtime?
Solution:
Step 1: Recall how ONNX Runtime loads a model
A model is loaded by creating an InferenceSession with the path to the .onnx file.
Step 2: Check each option
Only the option that calls ort.InferenceSession is valid; the others use methods that do not exist.
Final Answer:
import onnxruntime as ort
session = ort.InferenceSession('model.onnx')
Quick Check: Use InferenceSession to load a model.
Common mistakes:
- Using non-existent methods like load_model or run
- Not importing onnxruntime correctly
- Confusing model loading with running
Question 3: What does the following code output?
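import onnxruntime as ort
import numpy as np
# Load the model (assumed to take one float32 input of shape (1, 3, 224, 224)).
session = ort.InferenceSession('model.onnx')
input_name = session.get_inputs()[0].name
# Random data stands in for a preprocessed RGB image batch.
input_data = np.random.rand(1, 3, 224, 224).astype(np.float32)
# None means "return all model outputs".
outputs = session.run(None, {input_name: input_data})
print(type(outputs))
Solution: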
Step 1: Understand what session.run returns
session.run returns a Python list with one entry per requested output (a NumPy array for each tensor output).
Step 2: Check the print statement
print(type(outputs)) therefore prints <class 'list'>.
Final Answer: <class 'list'>
Quick Check: session.run returns a list.
Common mistakes:
- Assuming outputs is a numpy array directly
- Thinking outputs is a dictionary
- Confusing tuple with list
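Because session.run returns a list, the model's predictions are usually taken from an element of that list. The sketch below assumes, hypothetically, an image classifier whose first output is a (1, num_classes) array of raw logits; it applies a softmax and returns the top-k classes. The function name top_k_from_outputs is illustrative, and session.get_outputs() shows what a real model actually returns.

```python
import numpy as np

def top_k_from_outputs(outputs, k=5):
    """Turn the list returned by session.run into (class_index, probability) pairs.

    Assumes outputs[0] holds logits of shape (1, num_classes).
    """
    logits = np.asarray(outputs[0], dtype=np.float64)[0]  # first output, first image in the batch
    exp = np.exp(logits - logits.max())                   # subtract the max for a numerically stable softmax
    probs = exp / exp.sum()
    top = np.argsort(probs)[::-1][:k]                     # class indices by descending probability
    return [(int(i), float(probs[i])) for i in top]
```

If the model fits that assumption, top_k_from_outputs(outputs, k=3) on the outputs from the code above would return three (class_index, probability) pairs.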
Question 4: What is wrong with the following code?
import onnxruntime as ort
session = ort.InferenceSession('model.onnx')
input_name = session.get_inputs()[0]
input_data = [1.0, 2.0, 3.0]
outputs = session.run(None, {input_name: input_data})
Solution:
Step 1: Check the input_name assignment
session.get_inputs()[0] returns an input description object (NodeArg), but session.run expects the input name string as the dictionary key.
Step 2: Correct usage
Use session.get_inputs()[0].name to get the name string. The input data should also be a NumPy array with the dtype and shape the model expects (for example np.float32), not a plain Python list.
Final Answer: input_name should be the name string, not the input object.
Quick Check: Use input_name = session.get_inputs()[0].name
Common mistakes:
- Using input object instead of input name string
- Passing wrong input data types
- Misunderstanding session.run arguments
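Both mistakes in this question (passing the NodeArg object and passing a plain Python list) can be caught before session.run is called. make_feed is a hypothetical helper in a minimal sketch; it assumes a single float32 input and treats dimensions reported as strings or None in NodeArg.shape as dynamic (not checked).

```python
import numpy as np

def make_feed(input_name, data, expected_shape=None, dtype=np.float32):
    """Build the {name: array} dict that InferenceSession.run takes as its second argument.

    input_name should be session.get_inputs()[0].name (a str), not the NodeArg itself.
    expected_shape may be session.get_inputs()[0].shape; non-int entries are treated as dynamic.
    dtype=float32 assumes the model input is tensor(float).
    """
    if not isinstance(input_name, str):
        raise TypeError("pass the input name string, e.g. session.get_inputs()[0].name")
    array = np.asarray(data, dtype=dtype)
    if expected_shape is not None:
        if array.ndim != len(expected_shape):
            raise ValueError(f"expected {len(expected_shape)} dims, got shape {array.shape}")
        for actual, expected in zip(array.shape, expected_shape):
            # Only fixed (int) dimensions are compared; symbolic ones like "batch" are skipped.
            if isinstance(expected, int) and actual != expected:
                raise ValueError(f"shape {array.shape} does not match {expected_shape}")
    return {input_name: array}
```

Typical use: inp = session.get_inputs()[0], then outputs = session.run(None, make_feed(inp.name, batch, inp.shape)). With the list [1.0, 2.0, 3.0] from the question and an image model expecting four dimensions, make_feed raises a ValueError that names the shape mismatch.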
Question 5: Which code runs an ONNX model on a GPU?
Solution:
Step 1: Recall how to enable the GPU in ONNX Runtime
ONNX Runtime selects hardware through the providers argument; 'CUDAExecutionProvider' runs the model on an NVIDIA GPU. This requires the GPU build of ONNX Runtime (the onnxruntime-gpu package) and a compatible CUDA installation.
Step 2: Check each option
Only the option that passes providers=['CUDAExecutionProvider'] is valid; the others use parameters that do not exist.
Final Answer:
import onnxruntime as ort
session = ort.InferenceSession('model.onnx', providers=['CUDAExecutionProvider'])
Quick Check: Use providers=['CUDAExecutionProvider'] for GPU. Listing 'CPUExecutionProvider' after it, as in providers=['CUDAExecutionProvider', 'CPUExecutionProvider'], states the fallback order explicitly.
Common mistakes:
- Using non-existent parameters like device or use_gpu
- Confusing execution_mode with providers
- Not specifying providers, so the GPU may not be used
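When the same script must run on machines with and without a GPU, the providers list can be built from what the installed ONNX Runtime package supports. A small sketch: pick_providers is a hypothetical helper, while onnxruntime.get_available_providers() is the real function that lists the providers in the installed build.

```python
def pick_providers(available, preferred=("CUDAExecutionProvider", "CPUExecutionProvider")):
    """Return the preferred execution providers that are available, in priority order.

    `available` would normally come from onnxruntime.get_available_providers().
    CPUExecutionProvider is appended as a last resort so a session can always be created.
    """
    chosen = [p for p in preferred if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen
```

For example: session = ort.InferenceSession('model.onnx', providers=pick_providers(ort.get_available_providers())). Afterwards, session.get_providers() reports which providers the session actually uses.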
